Brand new computer language describes organismal traits to create computable species descriptions

Describing traits with Phenoscript is like writing computer code for how an organism looks.

The beetle species Grebennikovius basilewskyi. Numbers next to arrows indicate patterns of phenotype statements explained in the section “Phenoscript: main patterns of phenotype statements”. Arrow numbers from T1 to T5 illustrate individual body parts. See more in the research study.

One of the most beautiful aspects of Nature is the endless variety of shapes, colours and behaviours exhibited by organisms. These traits help organisms survive and find mates, like how a male peacock’s colourful tail attracts females or his wings allow him to fly away from danger. Understanding traits is crucial for biologists, who study them to learn how organisms evolve and adapt to different environments.

To do this, scientists first need to describe these traits in words, like saying a peacock’s tail is “vibrant, iridescent, and ornate”. This approach works for small studies, but when looking at hundreds or even millions of different animals or plants, it’s impossible for the human brain to keep track of everything.

Computers could help, but not even the latest AI technology is able to grasp human language to the extent needed by biologists. This hampers research significantly because, although scientists can handle large volumes of DNA data, linking this information to physical traits is still very difficult.

To solve this problem, researchers from the Finnish Museum of Natural History, Giulio Montanaro and Sergei Tarasov, along with collaborators, have created a special language called Phenoscript. This language is designed to describe traits in a way that both humans and computers can understand. Describing traits with Phenoscript is like writing computer code for how an organism looks.

Phenoscript uses something called semantic technology, which helps computers understand the meaning behind words, much like how modern search engines know the difference between the fruit “apple” and the tech company “Apple” based on the context of your search.

“This language is still being tested, but it shows a lot of promise. As more scientists start using Phenoscript, it will revolutionise biology by making vast amounts of trait data available for large-scale studies, boosting the emerging field of phenomics,”

explains Montanaro.

In their research article, newly published in the open-access, peer-reviewed Biodiversity Data Journal, the researchers make use of the new language for the first time, as they create semantic phenotypes for four species of dung beetles from the genus Grebennikovius. Then, to demonstrate the power of the semantic approach, they apply simple semantic queries to the generated phenotypic descriptions. 
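To give a feel for what a "simple semantic query" over phenotype statements might look like, here is a minimal sketch in Python. It is not Phenoscript syntax and it does not use the ontology terms from the study: the species labels, the relation names (has_part, is_a, has_characteristic) and the trait values are all made up for illustration. The idea is only that once traits are stored as subject-relation-object statements, a computer can answer questions such as "which species have a punctate pronotum?" without reading any prose.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]

# Hypothetical, hand-written statements. These are NOT real Phenoscript
# output and NOT real ontology identifiers; they only mimic the
# subject-relation-object shape of semantic phenotype data.
TRIPLES: list[Triple] = [
    ("species_A", "has_part", "species_A/pronotum"),
    ("species_A/pronotum", "is_a", "pronotum"),
    ("species_A/pronotum", "has_characteristic", "punctate"),
    ("species_B", "has_part", "species_B/pronotum"),
    ("species_B/pronotum", "is_a", "pronotum"),
    ("species_B/pronotum", "has_characteristic", "smooth"),
    ("species_C", "has_part", "species_C/pronotum"),
    ("species_C/pronotum", "is_a", "pronotum"),
    ("species_C/pronotum", "has_characteristic", "punctate"),
    ("species_C", "has_part", "species_C/elytron"),
    ("species_C/elytron", "is_a", "elytron"),
    ("species_C/elytron", "has_characteristic", "punctate"),
]


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that agrees with the given terms; None matches anything."""
    for triple in TRIPLES:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


def species_with(part_type: str, characteristic: str) -> list[str]:
    """Return the species that have a part of the given type with the given trait."""
    found = set()
    for species, _, part in match(p="has_part"):
        is_right_part = any(match(part, "is_a", part_type))
        has_trait = any(match(part, "has_characteristic", characteristic))
        if is_right_part and has_trait:
            found.add(species)
    return sorted(found)


print(species_with("pronotum", "punctate"))  # ['species_A', 'species_C']
```

A real semantic system would add ontology-based reasoning on top of this kind of pattern matching; the sketch only shows the data shape that makes such queries possible.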

Finally, the team looks further ahead to modernising the way scientists work with species information. Their next aim is to integrate semantic species descriptions with the concept of nanopublications, “which encapsulates discrete pieces of information into a comprehensive knowledge graph”. As a result, data that has become part of this graph can be queried directly, thereby ensuring that it remains Findable, Accessible, Interoperable and Reusable (FAIR) through a variety of semantic resources.

***

Research paper:

Montanaro G, Balhoff JP, Girón JC, Söderholm M, Tarasov S (2024) Computable species descriptions and nanopublications: applying ontology-based technologies to dung beetles (Coleoptera, Scarabaeinae). Biodiversity Data Journal 12: e121562. https://doi.org/10.3897/BDJ.12.e121562

***

This study is the latest addition to the special topical collection “Linking FAIR biodiversity data through publications: The BiCIKL approach”, launched and supported by the recently concluded Horizon 2020 project Biodiversity Community Integrated Knowledge Library (BiCIKL). The collection aims to bring together scientific publications that demonstrate the advantages and novel approaches in accessing and (re-)using linked biodiversity data.

***

What expert recommendations did the BiCIKL consortium give to policy makers and research funders to ensure that biodiversity data is FAIR, linked, open and, indeed, future-proof? Find out in the blog post summarising key lessons learnt from the Horizon 2020 project.

***

Follow Biodiversity Data Journal on Facebook and X.

Pensoft collaborates with R Discovery to elevate research discoverability

Pensoft and R Discovery’s innovative connection aims to change the way researchers find academic articles.

Leading scholarly publisher Pensoft has announced a strategic collaboration with R Discovery, the AI-powered research discovery platform by Cactus Communications, a renowned science communications and technology company. This partnership aims to revolutionize the accessibility and discoverability of research articles published by Pensoft, making them more readily available on R Discovery to its over three million researchers across the globe.

R Discovery, acclaimed for its advanced algorithms and an extensive database boasting over 120 million scholarly articles, empowers researchers with intelligent search capabilities and personalized recommendations. Through its innovative Reading Feed feature, R Discovery delivers tailored suggestions in a format reminiscent of social media, identifying articles based on individual research interests. This not only saves time but also keeps researchers updated with the latest and most relevant studies in their field.

One of R Discovery’s standout features is its ability to provide paper summaries, audio readings, and language translation, enabling users to quickly assess a paper’s relevance and enhance their research reading experience significantly.

With over 2.5 million app downloads and upwards of 80 million journal articles featured, the R Discovery database is one of the largest scholarly content repositories.

“At Pensoft, we do realise that Open Science is much more than cost-free access to research outputs. It is also about easier discoverability and reusability, or, in other words, how likely it is for the reader to come across a particular scientific publication and, as a result, cite and build on those findings in their own studies. By feeding the content of our journals into R Discovery, we’re further facilitating the discoverability of the research done and shared by the authors who trust us with their work,” said ARPHA’s and Pensoft’s founder and CEO Prof. Lyubomir Penev.

Abhishek Goel, Co-Founder and CEO of Cactus Communications, commented on the collaboration: “We are delighted to work with Pensoft and offer researchers easy access to the publisher’s high-quality research articles on R Discovery. This is a milestone in our quest to support academia in advancing open science that can help researchers improve the world.”

So far, R Discovery has established partnerships with over 20 publishers, enhancing the platform’s extensive repository of scholarly content. By joining forces with R Discovery, Pensoft solidifies its dedication to making scholarly publications from its open-access, peer-reviewed journal portfolio easily discoverable and accessible.

Beware of scientific scams! Tips to avoid predatory publishing in biological journals

Predatory publishing has been growing exponentially, with severe consequences for society and the environment.

Guest blog post by Cássio Cardoso Pereira, Gabriela França Fernandes, and Walisson Kenedy Siqueira

We are bombarded day and night with slot-machine invitations from journals, books, and events such as congresses and lectures. Predatory publishing has reached alarming levels in biology, which is why we published an editorial in the journal Neotropical Biology and Conservation to alert the community, show the modus operandi of these publishers, and pass on good practices so that researchers, especially beginners, can escape this trap.

Piggybacking on the open access movement, numerous predatory publishers have emerged in search of easy profits. These cybercriminals take advantage of the publish-or-perish culture without providing any information about their peer-review protocols, concerned not with the scientific, bibliographic, or ethical aspects of publishing, but with the money received from authors.

The number of predatory publishers has grown exponentially in recent years and spread across all areas of knowledge, including biology. It is a common practice of these journals, often with an equally fake editorial staff, to send electronic invitations to potential authors to publish articles. These invitations are often facilitated by initial screenings of the emails of corresponding authors available on the internet. The emailed invitations from the supposed editors often stress that the author’s work is sound and, since it has already gone through the scrutiny of the editorial board, requires only the payment of a fee to publish it, with no need for further peer review.

Invitations to join the editorial board of these journals are also frequent, mostly intended to take advantage of the scientists’ prestige. Instead of editing articles, these invited editors are used as poster boys, i.e., they have their names published on the journal’s website, thus attracting unsuspecting authors to submit their manuscripts.

These journals are generally not included in the Directory of Open Access Journals (DOAJ) and are not indexed in the main bibliometric databases, such as Google Scholar, SciELO, Scopus, and Web of Science, for the simple reason that they do not meet their inclusion criteria. The websites of these journals often have little information about the editorial board, have a fake International Standard Serial Number (ISSN), lack transparency regarding their scope, provide no indication of a policy of retraction, have no transparency regarding copyright transfer, and provide very vague contact information, often omitting the address of the journal’s office.

In addition to papers, there are also invitations to publish books and book chapters with fake International Standard Book Numbers and dubious editorial boards. There is also a flood of invitations to predatory meetings, such as online conferences, symposia, workshops, and lectures. These often have websites that are equally confusing and never linked to a university or a postgraduate program. Above all, one should consult advisors, supervisors, or senior colleagues about the invitation and the sender’s academic reputation. In any case, one must pay attention not only to the citation metrics but also, mainly, to their editorial board, ISSN, ISBN, contact information, and relationships with recognized institutions.
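One small, mechanical check that can be done on an ISSN or ISBN is to verify its check digit. The sketch below, which is our illustration and not part of the editorial, implements the standard ISSN (weighted sum modulo 11) and ISBN-13 (alternating weights 1 and 3, modulo 10) checksums. The function names are ours. Keep in mind that this only catches numbers that are malformed or mistyped: a predatory journal can display a checksum-valid number that was never registered to it, so a passing check proves nothing, and the number still needs to be looked up in the official registries.

```python
DIGITS = "0123456789"


def issn_checksum_ok(issn: str) -> bool:
    """Check the ISSN check digit: weights 8..1, 'X' counts as 10, sum divisible by 11."""
    chars = issn.replace("-", "").strip().upper()
    if len(chars) != 8:
        return False
    if any(c not in DIGITS for c in chars[:7]) or chars[7] not in DIGITS + "X":
        return False
    values = [int(c) for c in chars[:7]]
    values.append(10 if chars[7] == "X" else int(chars[7]))
    return sum(w * v for w, v in zip(range(8, 0, -1), values)) % 11 == 0


def isbn13_checksum_ok(isbn: str) -> bool:
    """Check the ISBN-13 check digit: weights alternate 1, 3; sum divisible by 10."""
    chars = isbn.replace("-", "").replace(" ", "")
    if len(chars) != 13 or any(c not in DIGITS for c in chars):
        return False
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(chars))
    return total % 10 == 0


print(issn_checksum_ok("0378-5955"))            # True
print(issn_checksum_ok("0378-5954"))            # False: wrong check digit
print(isbn13_checksum_ok("978-0-306-40615-7"))  # True
```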

When we analyze the impacts of predatory publishing on the scientific community, the worst problems are:

  • the dissemination of erroneous information about scientific problems of interest
  • the facilitation of plagiarism
  • the waste of public resources intended for publication
  • the appointment of researchers at universities and research institutes based on curricula full of doubtful publications, generating negative cascading effects that undermine higher education as a whole.

The damage done to society can be even worse. Governments, large companies, and decision-makers can be misled by false information, resulting in attitudes that undermine responses to major human problems such as climate change, biodiversity loss, and pandemics.

Efforts to fight predatory publishers require collaboration and support at higher levels. Governments need to create regulatory agencies that carefully and systematically evaluate the activities carried out by scientific journals. Science funding agencies should require that publication fees be paid only to publishers that adhere to an internationally recognized set of transparency and ethical rules. We need to discuss our values and incentives in the academic community, so we can start prioritizing quality over quantity. This would provide a reference point for research, help design coherent interventions, and improve information and public policy in favor of society and the environment.

Reference:

Pereira CC, Mello MAR, Negreiros D, Figueiredo JCG, Kenedy-Siqueira W, Maia LR, Fernandes S, Fernandes GFC, Ponce de Leon A, Ashworth L, Oki Y, de Castro GC, Aguilar R, Fearnside PM, Fernandes GW (2023) Beware of scientific scams! Hints to avoid predatory publishing in biological journals. Neotropical Biology and Conservation 18(2): 97-105. https://doi.org/10.3897/neotropical.18.e108887

Interoperable biodiversity data extracted from literature through open-ended queries

OpenBiodiv is a biodiversity database containing knowledge extracted from scientific literature, built as an Open Biodiversity Knowledge Management System. 

The OpenBiodiv contribution to BiCIKL

Apart from coordinating the Horizon 2020-funded project BiCIKL, scholarly publisher and technology provider Pensoft has been the engine behind what is likely to be the first production-stage semantic system to run on top of a reasonably sized biodiversity knowledge graph.

As of February 2023, OpenBiodiv contains 36,308 processed articles; 69,596 taxon treatments; 1,131 institutions; 460,475 taxon names; 87,876 sequences; 247,023 bibliographic references; 341,594 author names; and 2,770,357 article sections and subsections.

In fact, OpenBiodiv is a whole ecosystem of tools and services that extract biodiversity data from the text of articles published in data-minable XML format, as in the journals published by Pensoft (e.g. ZooKeys, PhytoKeys, MycoKeys, Biodiversity Data Journal), and from other taxonomic treatments, available from Plazi and Plazi’s specialised extraction workflow, and convert them into Linked Open Data.

“I believe that OpenBiodiv is a good real-life example of how the outputs and efforts of a research project may and should outlive the duration of the project itself. Something that is – of course – central to our mission at BiCIKL.”

explains Prof Lyubomir Penev, BiCIKL’s Project Coordinator and founder and CEO of Pensoft.

“The basics of what was to become the OpenBiodiv database began to come together back in 2015 within the EU-funded BIG4 PhD project of Victor Senderov, later succeeded by another PhD project by Mariya Dimitrova within IGNITE. It was during those two projects that the backend Ontology-O, the first versions of RDF converters and the basic website functionalities were created,”

he adds.

By the time OpenBiodiv became one of the nine research infrastructures within BiCIKL tasked with the provision of virtual access to open FAIR data, tools and services, it had already evolved into an RDF-based biodiversity knowledge graph, equipped with a fully automated extraction and indexing workflow and user apps.

Currently, Pensoft is working at full speed on new user apps in OpenBiodiv, as the team is continuously bringing into play invaluable feedback and recommendations from end-users and partners at BiCIKL.

As a result, OpenBiodiv is already capable of answering open-ended queries based on the available data. To do this, OpenBiodiv discovers ‘hidden’ links between data classes, i.e. taxon names, taxon treatments, specimens, sequences, persons/authors and collections/institutions. 

Thus, the system generates new knowledge about taxa, scientific articles and their subsections, the examined materials and their metadata, localities and sequences, amongst others. Additionally, it is able to return information with a relevant visual representation about any one or a combination of those major data classes within a certain scope and semantic context.
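One way to picture how 'hidden' links between data classes can surface is as a path search through a graph. The sketch below is a toy illustration under our own assumptions, not OpenBiodiv's implementation: the node labels and connections are invented, and a plain breadth-first search stands in for whatever the real system does. It shows how an author and an institution that are never mentioned together can still be connected through an article, a taxon treatment and a specimen.

```python
from collections import deque
from typing import Optional

# Hypothetical records and connections, invented for illustration only.
EDGES = [
    ("author:A. Author", "article:1"),
    ("article:1", "treatment:1"),
    ("treatment:1", "taxon:Genus species"),
    ("treatment:1", "specimen:1"),
    ("specimen:1", "institution:Museum X"),
    ("article:2", "taxon:Genus species"),
    ("article:2", "sequence:1"),
]


def build_graph(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Turn a list of undirected connections into an adjacency map."""
    graph: dict[str, set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def shortest_link(graph: dict[str, set[str]], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search: return the shortest chain of records from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # the two records are not connected


graph = build_graph(EDGES)
print(" -> ".join(shortest_link(graph, "author:A. Author", "institution:Museum X")))
```

In this toy graph the same search also connects the author to a sequence published in a different article, because both articles mention the same taxon name.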

Users can explore the database in three ways: by typing any term (even if misspelt!) into the search engine available from the OpenBiodiv homepage; by integrating an Application Programming Interface (API); or by using SPARQL queries.

On the OpenBiodiv website, there is also a list of predefined SPARQL queries, which is continuously being expanded.

Sample of predefined SPARQL queries at OpenBiodiv.
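For readers who have not sent a SPARQL query from code before, here is a minimal Python sketch of what that involves under the standard SPARQL 1.1 Protocol: the query travels as a `query` URL parameter, and the JSON results come back as a list of "bindings". The endpoint URL below is a placeholder, not OpenBiodiv's real address, and the query is deliberately generic because we do not want to guess at OpenBiodiv's vocabulary; take both from the OpenBiodiv website before trying this for real. The helper names are ours.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint: replace with the address given on the OpenBiodiv website.
ENDPOINT = "https://example.org/sparql"

# A generic query that works on any RDF store: return five arbitrary statements.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Prepare (but do not send) a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(result: dict) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON result into plain {variable: value} dictionaries."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)

# Hand-written sample in the standard SPARQL JSON results format (not real data).
sample = json.loads("""
{"head": {"vars": ["s", "p", "o"]},
 "results": {"bindings": [
   {"s": {"type": "uri", "value": "http://example.org/a"},
    "p": {"type": "uri", "value": "http://example.org/rel"},
    "o": {"type": "literal", "value": "hello"}}]}}
""")
print(rows(sample))
```

To actually run the query, the prepared request can be passed to `urllib.request.urlopen` and the response body parsed with `json.load`.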

“OpenBiodiv is an ambitious project of ours, and it’s surely one close to Pensoft’s heart, given our decades-long dedication to biodiversity science and knowledge sharing. Our previous fruitful partnerships with Plazi, BIG4 and IGNITE, as well as the current exciting and inspirational network of BiCIKL are wonderful examples of how far we can go with the right collaborators,”

concludes Prof Lyubomir Penev.

***

Follow BiCIKL on Twitter and Facebook. Join the conversation on Twitter at #BiCIKL_H2020.

You can also follow Pensoft on Twitter, Facebook and Linkedin and use #OpenBiodiv on Twitter.

Journal publishing platform ARPHA partners with content recommendation engine TrendMD

Thanks to the new collaboration between content recommendation engine TrendMD and journal publishing platform ARPHA, readers of all journals under Pensoft’s imprint, as well as those using the white-label publishing solution provided by the platform, will be given a useful list of recommended articles related to the study they are reading. The new widget will save users a great amount of time by simply pointing them to the most relevant papers on the topic from across a constantly expanding network of peer-reviewed articles and research news.

With nearly 8,000 new scholarly articles published each day, it is practically impossible to stay up to date with the news from a single scientific field, let alone do cross-disciplinary research. Furthermore, sifting out the quality literature is another painstaking activity no academic looks forward to. Hence, TrendMD comes as a sensible solution to help a reader find the most relevant, high-quality studies on a particular topic. The widget’s recommendations are based on the topic a user is currently reading, what papers they have read in the past, and the articles others with similar interests have sought out – all available from the most authoritative and quality journals in the world.

“TrendMD is excited to welcome Pensoft, a highly innovative, open access, online publishing platform, to the TrendMD network! This partnership will bring over 5,000 open access articles and books in the field of natural history, predominantly taxonomy and organismal biology, to TrendMD’s ever expanding network,” says Paul Kudlow, CEO and co-founder of TrendMD.

“In our continuing effort to develop and implement the most novel tools and workflows in academic publishing, at Pensoft we are pleased to have integrated our journal publishing platform ARPHA with the new-age scholarly innovation that is TrendMD’s tool, so that our readers have easy and constant access to the most relevant and best-quality research,” says Pensoft’s CEO and founder Prof. Lyubomir Penev.