A new Research Idea, published in RIO Journal, presents a promising machine-learning ecosystem to unite experts around the world and make up for the lack of taxonomic expertise.
In their Research Idea, published in Research Ideas and Outcomes (RIO Journal), a Swiss-Dutch research team presents a promising machine-learning ecosystem to unite experts around the world and make up for the shortage of expert staff.
Guest blog post by Luc Willemse, Senior collection manager at Naturalis Biodiversity Centre (Leiden, Netherlands)
Imagine the workday of a curator in a national natural history museum. Having spent several decades learning about a specific subgroup of grasshoppers, that person is now busy working on the identification and organisation of the holdings of the institution. To do this, the curator needs to study in detail a huge number of undescribed grasshoppers collected from all sorts of habitats around the world.
The problem here, however, is that a curator at a smaller natural history institution is usually responsible for all insects kept at the museum, ranging from butterflies to beetles, flies and so on. In total, we know of around 1 million described insect species worldwide. Meanwhile, another 3,000 are being added each year, while many more are redescribed, as a result of further study and new discoveries. Becoming a specialist for grasshoppers was already a laborious activity that took decades. How about knowing all insects of the world? That’s simply impossible.
Then, how could we expect one person to sort and update all collections at a museum: an activity that is the cornerstone of biodiversity research? A part of the solution, hiring and training additional staff, is costly and time-consuming, especially when we know that experts on certain species groups are already scarce on a global scale.
We believe that automated image recognition holds the key to reliable and sustainable practices at natural history institutions.
Today, image recognition tools integrated in mobile apps are already being used even by citizen scientists to identify plants and animals in the field. Based on an image taken by a smartphone, those tools identify specimens on the fly and estimate the accuracy of their results. What’s more, those identifications have proven to be almost as accurate as those done by humans. This gives us hope that we could help curators at museums worldwide take better and more timely care of the collections they are responsible for.
However, specimen identification for the use of natural history institutions is still much more complex than identification in the field. After all, the information they store and should be able to provide is meant to serve as a knowledge hub for educational and reference purposes for present and future generations of researchers around the globe.
This is why we propose a sustainable system where images, knowledge, trained recognition models and tools are exchanged between institutes, and where international collaboration between museums of all sizes is crucial. The aim is to have a system that will benefit the entire community of natural history collections in providing further access to their invaluable collections.
We propose four elements to this system:
A central library of already trained image recognition models (algorithms) needs to be created. It will be openly accessible, so any other institute can profit from models trained by others.
Mock-up of a Central Library of Algorithms.
A central library of datasets accessing images of collection specimens that have recently been identified by experts. This will provide an indispensable source of images for training new algorithms.
Mock-up of a Central Library of Datasets.
A digital workbench that provides an easy-to-use interface for inexperienced users to customise the algorithms and datasets to the particular needs in their own collections.
As the entire system depends on international collaboration as well as sharing of algorithms and datasets, a user forum is essential to discuss issues, coordinate, evaluate, test or implement novel technologies.
How would this work on a daily basis for curators? We provide two examples of use cases.
First, let’s zoom in on a case where a curator needs to identify a box of insects, for example bush crickets, to a lower taxonomic level. Here, they would take an image of the box and split it into segments of individual specimens. Then, image recognition will identify the bush crickets to a lower taxonomic level. The result, which we present in the table below, will be used to update object-level registration or to physically rearrange specimens into more accurate boxes. This entire step can also be done by non-specialist staff.
Mock-up of box with grasshoppers mentioned in the above table
Results of automated image recognition identify specimens to a lower taxonomic level.
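The box workflow described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `Identification` record, the `plan_rearrangement` helper, the 0.8 confidence threshold, and the specimen IDs and scores are all made up for the example. The sketch groups per-specimen predictions by taxon and sets aside low-confidence ones for an expert to check.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Identification:
    """One model prediction for one segmented specimen (hypothetical record)."""
    specimen_id: str
    taxon: str
    confidence: float  # the model's own score between 0 and 1


def plan_rearrangement(identifications, threshold=0.8):
    """Group confident predictions by taxon; return the rest for expert review.

    The threshold is an assumed value, not one taken from the Research Idea.
    """
    by_taxon = defaultdict(list)
    needs_review = []
    for ident in identifications:
        if ident.confidence >= threshold:
            by_taxon[ident.taxon].append(ident.specimen_id)
        else:
            needs_review.append(ident.specimen_id)
    return dict(by_taxon), needs_review


# Made-up predictions for four specimens cropped from one box image.
predictions = [
    Identification("box12-01", "Tettigonia viridissima", 0.97),
    Identification("box12-02", "Tettigonia cantans", 0.91),
    Identification("box12-03", "Tettigonia viridissima", 0.88),
    Identification("box12-04", "Tettigonia cantans", 0.55),
]

boxes, review = plan_rearrangement(predictions)
for taxon, specimens in boxes.items():
    print(taxon, specimens)
print("needs expert review:", review)
```

Keeping a separate review list reflects the idea that non-specialist staff can handle the routine cases while scarce expert time goes to the uncertain ones.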
Another example is to incorporate image recognition tools into digitisation processes that include imaging specimens. In this case, image recognition tools can be used on the fly to check or confirm the identifications and thus improve data quality.
Mock-up of an interface for automated taxon identification.
Using image recognition tools to identify specimens in museum collections is likely to become common practice in the future. It is a technical tool that will enable the community to share available taxonomic expertise.
Using image recognition tools creates the possibility to identify species groups for which there is very limited or no in-house expertise. Such practices would substantially reduce costs and time spent per treated item.
Image recognition applications carry metadata like version numbers and/or datasets used for training. Additionally, such an approach would make identification more transparent than the one carried out by humans whose expertise is, by design, in no way standardised or transparent.
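To make the transparency argument concrete, an identification could be stored together with the metadata of the model that produced it. The sketch below is a hypothetical record layout: the field names, the model name and version, and the dataset label are assumptions for illustration and do not come from the Research Idea.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class IdentificationRecord:
    """An automated identification plus the provenance needed to audit it."""
    specimen_id: str
    taxon: str
    confidence: float
    model_name: str        # hypothetical model identifier
    model_version: str     # version of the trained model that made the call
    training_dataset: str  # dataset (and its version) the model was trained on


record = IdentificationRecord(
    specimen_id="box12-01",
    taxon="Tettigonia viridissima",
    confidence=0.97,
    model_name="bush-cricket-classifier",
    model_version="1.2.0",
    training_dataset="example-orthoptera-images-2022-01",
)

# Serialising the record keeps the provenance attached wherever the data travels.
print(json.dumps(asdict(record), indent=2))
```

With the model version and training dataset stored alongside each result, an identification can later be re-checked or re-run when a better model becomes available.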
Greeff M, Caspers M, Kalkman V, Willemse L, Sunderland BD, Bánki O, Hogeweg L (2022) Sharing taxonomic expertise between natural history collections using image recognition. Research Ideas and Outcomes 8: e79187. https://doi.org/10.3897/rio.8.e79187
In times of exacerbating biodiversity loss, reliable data on species occurrence are essential. Environmental DNA (eDNA) – DNA released from organisms into the water – is increasingly used to detect fishes in biodiversity monitoring campaigns. However, eDNA turns out to be capable of providing much more than fish occurrence data, including information on other vertebrates. A study, published in the open-access journal Metabarcoding and Metagenomics, demonstrates how comprehensively vertebrate diversity can be assessed at no additional cost.
Revolutionary environmental DNA analysis holds great potential for the future of biodiversity monitoring, concludes a new study
Collection of water samples for eDNA metabarcoding bioassessment. Photo by Till-Hendrik Macher.
In times of exacerbating biodiversity loss, reliable data on species occurrence are essential, in order for prompt and adequate conservation actions to be initiated. This is especially true for freshwater ecosystems, which are particularly vulnerable and threatened by anthropogenic impacts. Their ecological status has already been highlighted as a top priority by multiple national and international directives, such as the European Water Framework Directive.
However, traditional monitoring methods, such as electrofishing, trapping methods, or observation-based assessments, which are the current status quo in fish monitoring, are often time-consuming and costly. As a result, over the last decade, scientists have increasingly agreed that we need a more comprehensive and holistic method to assess freshwater biodiversity.
Meanwhile, recent studies have repeatedly demonstrated that eDNA metabarcoding analysis, where DNA traces found in the water are used to identify what organisms live there, is an efficient method to capture aquatic biodiversity in a fast, reliable, non-invasive and relatively low-cost manner. In such metabarcoding studies, scientists collect samples and sequence the DNA they contain, so that they can compare it with existing databases and identify the source organisms.
Furthermore, as eDNA metabarcoding assessments use water samples, often from streams, which lie at the lowest point of the landscape, one such sample usually contains not only traces of species that come into direct contact with water, for example, by swimming or drinking, but also traces of terrestrial species carried in indirectly via rainfall, snowmelt, groundwater, etc.
In standard fish eDNA metabarcoding assessments, these ‘bycatch data’ are typically left aside. Yet, from a viewpoint of a more holistic biodiversity monitoring, they hold immense potential to also detect the presence of terrestrial and semi-terrestrial species in the catchment.
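As a rough illustration of what keeping the bycatch could look like in practice, the sketch below splits a list of detected species into target fish and bycatch by taxonomic class. It is not the study's pipeline: the `split_target_and_bycatch` helper, the choice of Actinopterygii as the only target class, and the example detections are assumptions made for the example.

```python
# Assumption: ray-finned fishes are the only "target" class in this toy example.
TARGET_CLASSES = {"Actinopterygii"}


def split_target_and_bycatch(detections, target_classes=TARGET_CLASSES):
    """Split {species: taxonomic class} into target taxa and bycatch."""
    target, bycatch = {}, {}
    for species, taxon_class in detections.items():
        if taxon_class in target_classes:
            target[species] = taxon_class
        else:
            bycatch[species] = taxon_class
    return target, bycatch


# Made-up detections from one water sample (species -> class).
detections = {
    "Salmo trutta": "Actinopterygii",
    "Castor fiber": "Mammalia",
    "Alcedo atthis": "Aves",
    "Esox lucius": "Actinopterygii",
}

target, bycatch = split_target_and_bycatch(detections)
print("fish:", sorted(target))
print("bycatch kept for other monitoring programmes:", sorted(bycatch))
```

The point of the sketch is simply that the non-fish records are retained and labelled rather than discarded, so they can be passed on to other monitoring programmes.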
In their new study, reported in the open-access scholarly journal Metabarcoding and Metagenomics, German researchers from the University of Duisburg-Essen and the German Environment Agency successfully detected an astonishing share of the mammals and birds native to the state of Saxony-Anhalt by collecting as much as 18 litres of water from across a two-kilometre stretch along the river Mulde.
After water filtration the eDNA filter is preserved in ethanol until further processing in the lab. Photo by Till-Hendrik Macher.
In fact, it took only one day for the team, led by Till-Hendrik Macher, PhD student in the German Federal Environmental Agency-funded GeDNA project, to collect the samples. Using metabarcoding to analyse the DNA from the samples, the researchers identified as much as 50% of the fishes, 22% of the mammal species, and 7.4% of the breeding bird species in the region.
However, the team also concluded that while it would normally take only 10 litres of water to assess the aquatic and semi-terrestrial fauna, terrestrial species required significantly more sampling.
Unlocking data from the increasingly available fish eDNA metabarcoding information enables synergies among terrestrial and aquatic biodiversity monitoring programs, adding further important information on species diversity in space and time.
“We thus encourage to exploit fish eDNA metabarcoding biodiversity monitoring data to inform other conservation programs,”
says lead author Till-Hendrik Macher.
“For that purpose, however, it is essential that eDNA data is jointly stored and accessible for different biodiversity monitoring and biodiversity assessment campaigns, either at state, federal, or international level,”
concludes Florian Leese, who coordinates the project.
Original source:
Macher T-H, Schütz R, Arle J, Beermann AJ, Koschorreck J, Leese F (2021) Beyond fish eDNA metabarcoding: Field replicates disproportionately improve the detection of stream associated vertebrate species. Metabarcoding and Metagenomics 5: e66557. https://doi.org/10.3897/mbmg.5.66557
A new species of tiny cave snail that glistens in the light and has a muffin-top-like bulge was discovered by Marina Ferrand of the French Club Etude et Exploration des Gouffres et Carrières (EEGC), during the Phouhin Namno caving expedition in Tham Houey Yè cave in Laos in March 2019. The new species, named Laoennea renouardi, was described in the open-access, peer-reviewed journal Subterranean Biology.
Tham Houey Yè cave (Vientiane Province, Laos), inhabited by the newly discovered “muffin-topped” snail species Laoennea renouardi. Photo by Jean-Francois Fabriol.
A new species of tiny cave snail that glistens in the light and has a muffin-top-like bulge was discovered by Marina Ferrand of the French Club Etude et Exploration des Gouffres et Carrières (EEGC), during the Phouhin Namno caving expedition in Tham Houey Yè cave in Laos in March 2019. The new species, Laoennea renouardi, is 1.80 mm tall and is named after the French caver, Louis Renouard, who explored and mapped the only two caves in Laos known to harbor this group of tiny snails. Only two species of Laoennea snail are known so far, L. carychioides and now, L. renouardi.
The new transparent “muffin-topped” snail, Laoennea renouardi. Photo by Estée Bochud.
“The discovery and description of biodiversity before it disappears is a major priority for biologists worldwide. The caves in Laos are still largely underexplored and the snails known from them remain few in number,”
points out Dr. Jochum.
The fact that two species of tiny cave snails of the same group were found in two caves located in two independent karstic networks 3.4 km apart caused the authors to question evolutionary processes in these underground hotspots of biodiversity. The authors hypothesise that the two caves might have been connected during the Quaternary, around 100–200 thousand years ago. In time, the river Yè might have formed a barrier, thus disconnecting the cave systems and separating the populations. As a result, the snails evolved into two different species.
Map of the two caves on opposite sides of the River Yè, Vientiane Province, Laos. Image by Louis Renouard.
***
Original Source:
Jochum A, Bochud E, Favre A, Ferrand M, Wackenheim Q (2020) A new species of Laoennea microsnail (Stylommatophora, Diapheridae) from a cave in Laos. Subterranean Biology 36: 1-9. https://doi.org/10.3897/subtbiol.36.58977
by Mariya Dimitrova, Jorrit Poelen, Georgi Zhelezov, Teodor Georgiev, Lyubomir Penev
Fig. 1. Pensoft-GloBI workflow for indexing biotic interactions from scholarly literature
Tables published in scholarly literature are a rich source of primary biodiversity data. They are often used for communicating species occurrence data, morphological characteristics of specimens, links of species or specimens to particular genes, ecology data and biotic interactions between species, etc. Tables provide a structured format for sharing numerous facts about biodiversity in a concise and clear way.
Inspired by the potential use of semantically-enhanced tables for text and data mining, Pensoft and Global Biotic Interactions (GloBI) developed a workflow for extracting and indexing biotic interactions from tables published in scholarly literature. GloBI is an open infrastructure enabling the discovery and sharing of species interaction data. GloBI ingests and accumulates individual datasets containing biotic interactions and standardises them by mapping them to community-accepted ontologies, vocabularies and taxonomies. Data integrated by GloBI is accessible through an application programming interface (API) and as archives in different formats (e.g. n-quads). GloBI has indexed millions of species interactions from hundreds of existing datasets spanning over a hundred thousand taxa.
The workflow
First, all tables extracted from Pensoft publications and stored in the OpenBiodiv triple store were automatically retrieved (Step 1 in Fig. 1). There were 6,993 tables from 21 different journals. To identify only the tables containing biotic interactions, we used an ontology annotator, currently being developed by Pensoft, with terms from the OBO Relation Ontology (RO). The Pensoft Annotator analyses free text and finds words and phrases matching ontology term labels.
We used the RO to create a custom ontology, or list of terms, describing different biotic interactions (e.g. ‘host of’, ‘parasite of’, ‘pollinates’) (Step 2 in Fig. 1). We used all subproperties of the RO term labeled ‘biotically interacts with’ and expanded the list of terms with additional word spellings and variations (e.g. ‘hostof’, ‘host’) which were added to the custom ontology as synonyms of already existing terms using the property oboInOwl:hasExactSynonym.
This custom ontology was used to perform annotation of all tables via the Pensoft Annotator (Step 3 in Fig. 1). Tables were split into rows and columns and accompanying table metadata (captions). Each of these elements was then processed through the Pensoft Annotator and if a match from the custom ontology was found, the resulting annotation was written to a MongoDB database, together with the article metadata. The original table in XML format, containing marked-up taxa, was also stored in the records.
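The annotation step can be pictured with a small sketch. It is not the Pensoft Annotator, whose internals are not described here. It assumes plain case-insensitive phrase matching, a hypothetical three-term miniature of the custom ontology (only the ‘host of’ synonyms come from the text above; the others are invented), and a made-up list of table elements.

```python
import re

# Hypothetical miniature of the custom ontology: preferred label -> synonyms.
# Only the 'host of' synonyms are taken from the text; the rest are assumed.
CUSTOM_ONTOLOGY = {
    "host of": ["hostof", "host"],
    "parasite of": ["parasite"],
    "pollinates": ["pollinator of"],
}


def build_patterns(ontology):
    """Compile one whole-word, case-insensitive regex per label and synonym."""
    patterns = []
    for label, synonyms in ontology.items():
        for phrase in [label, *synonyms]:
            regex = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
            patterns.append((label, regex))
    return patterns


def annotate(elements, patterns):
    """Return (element_index, preferred_label) for every element that matches."""
    annotations = []
    for index, text in enumerate(elements):
        matched = {label for label, regex in patterns if regex.search(text)}
        annotations.extend((index, label) for label in sorted(matched))
    return annotations


# Made-up table elements: a caption followed by three column headers.
table_elements = [
    "Table 2. Host plants recorded for each aphid species",
    "Species",
    "Host",
    "Locality",
]

patterns = build_patterns(CUSTOM_ONTOLOGY)
for index, label in annotate(table_elements, patterns):
    print(index, repr(table_elements[index]), "->", label)
```

Mapping every synonym back to its preferred label is what lets differently worded headers (‘Host’, ‘hostof’) count as the same interaction term.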
Thus, we detected 233 tables which contain biotic interactions, constituting about 3.3% of all examined tables. The scripts used for parsing the tables and annotating them, together with the custom ontology, are open source and available on GitHub. The database records were exported as JSON to a GitHub repository, from where they could be accessed by GloBI.
GloBI processed the tables further, involving the generation of a table citation from the article metadata and the extraction of interactions between species from the table rows (Step 4 in Fig. 1). Table citations were generated by querying the OpenBiodiv database with the DOI of the article containing each table to obtain the author list, article title, journal name and publication year. The extraction of table contents was not a straightforward process because tables do not follow a single schema and can contain both merged rows and columns (signified using the ‘rowspan’ and ‘colspan’ attributes in the XML). GloBI were able to index such tables by duplicating rows and columns where needed to be able to extract the biotic interactions within them. Taxonomic name markup allowed GloBI to identify the taxonomic names of species participating in the interactions. However, the underlying interaction could not be established for each table without introducing false positives due to the complicated table structures which do not specify the directionality of the interaction. Hence, for now, interactions are only of the type ‘biotically interacts with’ (Fig. 2) because it is a bi-directional one (e.g. ‘Species A interacts with Species B’ is equivalent to ‘Species B interacts with Species A’).
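The idea of duplicating merged cells can be illustrated with a short sketch. It is not GloBI's code: it assumes HTML-like `tr`, `td` and `th` element names (the actual article XML may use a different schema), a made-up two-column table, and a hypothetical `undirected_interactions` helper that stores each pair as an unordered set to mirror the bi-directional ‘biotically interacts with’ relation.

```python
import xml.etree.ElementTree as ET


def expand_table(table_xml):
    """Turn a table with merged cells into a rectangular grid by copying values."""
    root = ET.fromstring(table_xml)
    grid = {}  # (row, column) -> cell text
    for r, tr in enumerate(root.iter("tr")):
        c = 0
        for cell in tr:
            if cell.tag not in ("td", "th"):
                continue
            while (r, c) in grid:  # slot already filled by a rowspan from above
                c += 1
            text = "".join(cell.itertext()).strip()
            rowspan = int(cell.get("rowspan", 1))
            colspan = int(cell.get("colspan", 1))
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text  # duplicate the merged value
            c += colspan
    n_rows = max(row for row, _ in grid) + 1
    n_cols = max(col for _, col in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


def undirected_interactions(rows):
    """Store each two-taxon row as an unordered pair (A-B equals B-A)."""
    return {frozenset((a, b)) for a, b in rows if a and b}


# Made-up table: one parasitoid merged across two host rows.
TABLE = """
<table>
  <tr><th>Parasitoid</th><th>Host</th></tr>
  <tr><td rowspan="2">Aphidius ervi</td><td>Acyrthosiphon pisum</td></tr>
  <tr><td>Sitobion avenae</td></tr>
</table>
"""

rows = expand_table(TABLE)
interactions = undirected_interactions(rows[1:])  # skip the header row
for row in rows:
    print(row)
```

After expansion, every data row carries both taxon names, so each row can be read as one interaction without looking back at the merged cell above it.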
Fig. 2. Example of a biotic interaction indexed by GloBI.
Examples of species interactions provided by OpenBiodiv and indexed by GloBI are available on GloBI’s website.
In the future we plan to expand the capacity of the workflow to recognise interaction types in more detail. This could be implemented by applying part of speech tagging to establish the subject and object of an interaction.
In addition to being accessible via an API and as archives, biotic interactions indexed by GloBI are available as Linked Open Data and can be accessed via a SPARQL endpoint. Hence, we plan on creating a user-friendly service for federated querying of GloBI and OpenBiodiv biodiversity data.
This collaborative project is an example of the benefits of open and FAIR data, enabling the enhancement of biodiversity data through the integration between Pensoft and GloBI. Transformation of knowledge contained in existing scholarly works into giant, searchable knowledge graphs increases the visibility and attributed re-use of scientific publications.
References
Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005.
Additional Information
The work has been partially supported by the International Training Network (ITN) IGNITE funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 764840.
The researchers compared wild bee communities in the tropical dry forest of Mexico living in three habitat conditions: preserved vegetation, agricultural sites and urbanised areas
Changes in land use negatively affect bee species richness and diversity, and cause major shifts in species composition, reports a recent study of native wild bees, conducted at the Sierra de Quila Flora and Fauna Protection Area and its influence zone in Mexico.
Having registered a total of 14,054 individual bees representing 160 species, 52 genera, and five families over the span of a year, the scientists conclude that the studied preserved areas demonstrated “significantly greater” richness and diversity.
In their paper, published in the open-access Journal of Hymenoptera Research, a research team from the University of Guadalajara, Mexico, led by Alejandro Muñoz-Urias, compare three conditions within the tropical dry forest study site: preserved vegetation, an agricultural area with crops and livestock, and an urbanised area.
This bee species (Aztecanthidium xochipillium) is known exclusively from Mexico.
The researchers confirm earlier information that an increase in anthropogenic disturbances leads to a decrease in bee richness and diversity. While availability of food and nesting sites are the key factors for bee communities, changes in land use negatively impact flower richness and floral diversity. Thereby, turning habitats into urbanised or agricultural sites significantly diminishes the populations of the bees which rely on specific plants for nectar and pollen. These are the species whose populations are threatened with severe declines up to the point of local extinction.
According to their data, about half of the bees recorded were Western honey bees (49.9%), whereas polyester bees turned out to be the least abundant (1.2%).
On the other hand, some generalist bees, which feed on a wide range of plants, seem to thrive in urbanised areas, as they take advantage of people watering wild and ornamental plants at times when droughts might be eradicating native vegetation.
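To make the terms “richness” and “diversity” concrete, the sketch below computes species richness and the Shannon diversity index for two invented communities. The Shannon index is used here only as a common example of a diversity measure, so it is an assumption that it resembles what the study calculated. The species labels and counts are entirely made up.

```python
import math


def richness(counts):
    """Species richness: the number of species with at least one individual."""
    return sum(1 for n in counts.values() if n > 0)


def shannon_diversity(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over the species proportions p_i."""
    total = sum(counts.values())
    return -sum(
        (n / total) * math.log(n / total) for n in counts.values() if n > 0
    )


# Invented abundance data (individuals per species) for two habitat conditions.
preserved = {"sp_a": 40, "sp_b": 30, "sp_c": 20, "sp_d": 10}
urbanised = {"sp_a": 90, "sp_b": 10}

for name, community in [("preserved", preserved), ("urbanised", urbanised)]:
    print(
        f"{name}: richness={richness(community)}, "
        f"Shannon H'={shannon_diversity(community):.2f}"
    )
```

A community dominated by a single abundant species scores low on such an index even when many individuals are present, which is why abundance alone says little about diversity.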
“That is the reason why bees that can use a wide variety of resources are often able to compensate when circumstances change, although some species disappear due to land use changes,” explain the scientists.
This is a tropical dry forest in the dry (left) and rainy season (right).
In conclusion, the authors recommend that the tropical dry forests of both the study area and Mexico in general need to be protected in order for these essential pollinators to be conserved.
“Pollinators are a key component for global biodiversity, because they assist in the sexual reproduction of many plant species and play a crucial role in maintaining terrestrial ecosystems and food security for human beings,” they remind.
###
Original source:
Razo-León AE, Vásquez-Bolaños M, Muñoz-Urias A, Huerta-Martínez FM (2018) Changes in bee community structure (Hymenoptera, Apoidea) under three different land-use conditions. Journal of Hymenoptera Research 66: 23-38. https://doi.org/10.3897/jhr.66.27367