Image recognition to the rescue of natural history museums by enabling curators to identify specimens on the fly

A new Research Idea by a Swiss-Dutch research team, published in Research Ideas and Outcomes (RIO Journal), presents a promising machine-learning ecosystem to unite experts around the world and make up for the shortage of taxonomic expertise and expert staff.

Guest blog post by Luc Willemse, Senior collection manager at Naturalis Biodiversity Centre (Leiden, Netherlands)

Imagine the workday of a curator in a national natural history museum. Having spent several decades learning about a specific subgroup of grasshoppers, that person is now busy working on the identification and organisation of the holdings of the institution. To do this, the curator needs to study in detail a huge number of undescribed grasshoppers collected from all sorts of habitats around the world. 

The problem, however, is that a curator at a smaller natural history institution is usually responsible for all the insects kept at the museum, from butterflies to beetles, flies and so on. In total, we know of around 1 million described insect species worldwide. Meanwhile, another 3,000 are added each year, while many more are redescribed as a result of further study and new discoveries. Becoming a grasshopper specialist is already laborious work that takes decades; how about knowing all the insects of the world? That’s simply impossible. 

How, then, could we expect one person to sort and update all the collections at a museum, an activity that is the cornerstone of biodiversity research? One part of the solution, hiring and training additional staff, is costly and time-consuming, especially as experts on certain species groups are already scarce on a global scale. 

We believe that automated image recognition holds the key to reliable and sustainable practices at natural history institutions. 

Today, image recognition tools integrated into mobile apps are already used even by citizen scientists to identify plants and animals in the field. Based on an image taken with a smartphone, these tools identify specimens on the fly and estimate the accuracy of their results. What’s more, these identifications have proven almost as accurate as those made by humans. This gives us hope that we could help curators at museums worldwide take better and more timely care of the collections they are responsible for. 

However, specimen identification at natural history institutions is much more complex than what the tools used in the field provide. After all, the information these institutions store, and should be able to provide, is meant to serve as a knowledge hub for educational and reference purposes for present and future generations of researchers around the globe.

This is why we propose a sustainable system in which images, knowledge, trained recognition models and tools are exchanged between institutes, and in which international collaboration between museums of all sizes is crucial. The aim is a system that benefits the entire community of natural history collections by providing wider access to their invaluable collections. 

We propose four elements to this system: 

  1. A central library of already trained image recognition models (algorithms) needs to be created. It will be openly accessible, so any other institute can benefit from models trained by others.
Mock-up of a Central Library of Algorithms.
  2. A central library of datasets giving access to images of collection specimens that have recently been identified by experts. This will provide an indispensable source of images for training new algorithms.
Mock-up of a Central Library of Datasets.
  3. A digital workbench that provides an easy-to-use interface for inexperienced users to customise the algorithms and datasets to the particular needs of their own collections. 
  4. As the entire system depends on international collaboration and on the sharing of algorithms and datasets, a user forum is essential for discussing issues and for coordinating, evaluating, testing or implementing novel technologies.

How would this work on a daily basis for curators? We provide two examples of use cases.

First, let’s zoom in on a case where a curator needs to identify a box of insects, for example bush crickets, to a lower taxonomic level. Here, the curator would take an image of the box and split it into segments, each showing an individual specimen. Image recognition then identifies the bush crickets to a lower taxonomic level. The result, presented in the table below, will be used to update object-level registration or to physically rearrange specimens into more accurate boxes. This entire step can also be done by non-specialist staff. 

Mock-up of a box with the grasshoppers mentioned in the above table

Results of automated image recognition identify specimens to a lower taxonomic level.
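The box workflow described above (image the box, split it into single-specimen segments, identify each segment, tabulate the results) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes specimens are pinned in a regular grid so that naive grid slicing can stand in for real segmentation, `toy_classifier` is a hypothetical stand-in for a trained model, and the 0.9 review threshold is arbitrary.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Image = List[List[int]]  # greyscale image as rows of pixel values (simplification)


@dataclass
class Identification:
    specimen: int       # position of the specimen in the box, counted row by row
    taxon: str
    confidence: float
    needs_review: bool  # True when the prediction should be checked by an expert


def segment_box(box: Image, rows: int, cols: int) -> List[Image]:
    """Naive stand-in for segmentation: assumes one specimen per cell of a
    regular rows x cols grid and slices the box image accordingly."""
    cell_h, cell_w = len(box) // rows, len(box[0]) // cols
    return [
        [row[c * cell_w:(c + 1) * cell_w] for row in box[r * cell_h:(r + 1) * cell_h]]
        for r in range(rows)
        for c in range(cols)
    ]


def identify_box(box: Image, rows: int, cols: int,
                 classify: Callable[[Image], Tuple[str, float]],
                 threshold: float = 0.9) -> List[Identification]:
    """Classify every specimen crop and build one table row per specimen."""
    table = []
    for i, crop in enumerate(segment_box(box, rows, cols), start=1):
        taxon, confidence = classify(crop)
        table.append(Identification(i, taxon, confidence, confidence < threshold))
    return table


# Hypothetical classifier: a real system would call a trained model here.
def toy_classifier(crop: Image) -> Tuple[str, float]:
    mean = sum(map(sum, crop)) / (len(crop) * len(crop[0]))
    return ("taxon A", 0.95) if mean > 100 else ("taxon B", 0.60)


box = [[200, 200, 10, 10],
       [200, 200, 10, 10],
       [10, 10, 10, 10],
       [10, 10, 10, 10]]
table = identify_box(box, rows=2, cols=2, classify=toy_classifier)
for row in table:
    print(row.specimen, row.taxon, row.confidence, "review" if row.needs_review else "ok")
```

Flagging low-confidence rows keeps an expert in the loop, which seems sensible if, as suggested above, non-specialist staff run this step.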

Another example is to incorporate image recognition tools into digitisation processes that include imaging specimens. In this case, image recognition tools can be used on the fly to check or confirm the identifications and thus improve data quality.
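One way to picture the check-or-confirm step is a small decision rule that compares the identification recorded during digitisation with the model's prediction. The sketch below is hypothetical: the four outcome labels and the 0.8 confidence threshold are assumptions for illustration, not part of the proposal.

```python
from typing import Optional


def check_identification(recorded: Optional[str], predicted: str,
                         confidence: float, threshold: float = 0.8) -> str:
    """Compare a recorded identification with a model prediction.

    Returns one of: "uncertain", "suggest", "confirmed", "conflict".
    """
    if confidence < threshold:
        return "uncertain"   # model not confident enough to support or contradict
    if recorded is None:
        return "suggest"     # specimen not yet identified: propose the prediction
    if recorded.strip() == predicted.strip():
        return "confirmed"   # model agrees with the recorded name
    return "conflict"        # confident disagreement: flag for an expert


print(check_identification("Taxon A", "Taxon A", 0.93))
print(check_identification("Taxon A", "Taxon B", 0.91))
```

In this sketch only confident disagreements are escalated, so an expert would review the records most likely to contain errors rather than every imaged specimen.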

Mock-up of an interface for automated taxon identification. 

Using image recognition tools to identify specimens in museum collections is likely to become common practice in the future. It is a technical tool that will enable the community to share available taxonomic expertise. 

Image recognition tools make it possible to identify species groups for which there is very limited or no in-house expertise. Such practices would substantially reduce the cost and time spent per item processed. 

Image recognition applications carry metadata, such as version numbers and/or the datasets used for training. As a result, this approach would make identification more transparent than identification carried out by humans, whose expertise is, by design, neither standardised nor transparent.
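To illustrate what such metadata might look like, here is a hypothetical record that attaches the model name, model version and training dataset to each machine identification. The field names and identifier values are assumptions for illustration; the proposal does not define a schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ModelInfo:
    name: str              # hypothetical model identifier in a central model library
    version: str           # version of the trained model
    training_dataset: str  # identifier of the dataset used for training


@dataclass(frozen=True)
class MachineIdentification:
    specimen_id: str
    taxon: str
    confidence: float
    model: ModelInfo       # provenance travels with every identification


record = MachineIdentification(
    specimen_id="specimen-0001",
    taxon="Taxon A",
    confidence=0.93,
    model=ModelInfo(name="orthoptera-classifier",
                    version="1.2.0",
                    training_dataset="dataset-2022-01"),
)

# asdict() converts nested dataclasses too, so the whole record serialises to JSON
print(json.dumps(asdict(record), indent=2))
```

Because each record names the exact model version and training data, an identification can be traced back and re-examined later, which is the kind of transparency the paragraph above describes.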


Research publication:

Greeff M, Caspers M, Kalkman V, Willemse L, Sunderland BD, Bánki O, Hogeweg L (2022) Sharing taxonomic expertise between natural history collections using image recognition. Research Ideas and Outcomes 8: e79187. https://doi.org/10.3897/rio.8.e79187

Austrian-Danish research team discover as many as 22 new moth species from across Europe

The last time so many previously unknown moths had been discovered at once on the best-studied continent was in 1887

One of the newly discovered moths, Megacraspedus faunierensis, in its natural habitat in the Alps.

Following a years-long study of the twirler moth family, an Austrian-Danish research team discovered a startling total of 44 new species, including as many as 22 species inhabiting various regions throughout Europe.

Given that Europe is the most thoroughly researched continent, their findings, published in the open access journal ZooKeys, raise fundamental questions about our knowledge of biodiversity. Such a wealth of European moths new to science has not been published in a single research article since 1887.

“The scale of newly discovered moths in one of the Earth’s most studied regions is both sensational and completely unexpected,” say authors Dr Peter Huemer, Tyrolean State Museum, and Ole Karsholt of the University of Copenhagen’s Zoological Museum. To them, the new species come as proof that, “despite dramatic declines in many insect populations, our fundamental investigations into species diversity are still far from complete”.

 

The challenge of taxonomy

Type locality of the new moth species Megacraspedus faunierensis, Cottian Alps, Italy.

For the authors, it all began when they spotted what seemed like an unclassifiable species of twirler moth in the South Tyrolean Alps. In order to confirm it as a new species, the team conducted a 5-year study of the type specimens of all related species, spread across museum collections in Paris, London, Budapest and many other places.

To confirm the status of all the new species, the scientists not only looked for characteristic colouration, markings and anatomical features, but also used the latest DNA methods to create unique genetic fingerprints, in the form of DNA barcodes, for most of the species.

 

What’s in a name?

A particular challenge for the researchers was choosing as many as 44 names for the new species. Eventually, they named one species after the daughter of one of the authors, others after colleagues, and many more after the regions associated with the particular species. Megacraspedus teriolensis, for example, translates as “Tyrolean twirler moth”.

Among the others, there is one which the scientists named Megacraspedus feminensis because they could only find the female, while another, Megacraspedus pacificus, discovered in Afghanistan, was dubbed “an ambassador of peace”.

 

Mysterious large twirler moths


All the new moths belong to the genus of large twirler moths (Megacraspedus), placed in the family of twirler moths (Gelechiidae), the common name referring to their protruding modified mouthparts (labial palps).

The large twirler moths are an especially interesting group because of their relatively short wings: their wingspan ranges between 8 and 26 millimetres, and the females are often flightless. While it remains unknown why exactly their wings are so reduced, the scientists assume that it is most likely an adaptation to the turbulent winds in their high-elevation habitats, since the species prefer mountain areas at up to 3,000 metres above sea level.

Out of the 85 documented species, however, both sexes are known in only 35 cases.

The scientists suspect that many of the flightless females are hard to spot on the ground. Similarly, caterpillars of only three species have been observed to date.

While one of the few things we currently know about the large twirler moths is that all species live on different grasses, Huemer and Karsholt believe that it is of urgent importance to conduct further research into the biology of these insects, in order to identify their conservation status and take adequate measures towards their preservation.

###

Original source:

Huemer P, Karsholt O (2018) Revision of the genus Megacraspedus Zeller, 1839, a challenging taxonomic tightrope of species delimitation (Lepidoptera, Gelechiidae). ZooKeys 800: 1-278. https://doi.org/10.3897/zookeys.800.26292

Advanced computer technology & software turn species identification interactive

An important group of biocontrol wasps from Central Europe is used to demonstrate the advantages of modern, free-to-use software

Although it represents a group of successful biocontrol agents against various pest fruit flies, a parasitic wasp genus remains largely overlooked. Its most recent identification key dates back to 1969, and many new species have been added since then. To make matters worse, this group of visually identical species most likely contains many species yet to be described as new to science.

Having recently studied a species group of these wasps in Central Europe, scientists Fabian Klimmek and Hannes Baur of the Natural History Museum Bern, Switzerland, not only demonstrate the need for a knowledge update, but also showcase the advantages of modern taxonomic software able to analyse large amounts of descriptive and quantitative data.

Published in the open access Biodiversity Data Journal, the team’s taxonomic paper describes a new species – Pteromalus capito – and presents a discussion on the free-to-use Xper3, developed by the Laboratory of Informatics and Systematics of Pierre-and-Marie-Curie University. The software was used to create an openly available updated key for the species group Pteromalus albipennis.

The fully illustrated interactive database covers 27 species in the group and 18 related species, in addition to a complete diagnosis, a large set of body measurements and a total of 585 images, displaying most of the characteristic features for each species.

“Nowadays, advanced computer technology, measurement procedures and equipment allow more sophisticated ways to include quantitative characters, which greatly enhance the delimitation of cryptic species,” explain the scientists.

“Recently developed software for the creation of biological identification keys like Xper3, Lucid or Delta could have the potential to replace traditional paper-based keys.”

To put the statement into context, the authors give an example with one of the studied wasp species, whose identification would take 16 steps if the previously available identification key were used, whereas only 6 steps were needed with the interactive alternative.

One of the reasons tools like Xper3 are so fast and efficient is that the key’s author can list all descriptive characters in a specific order and give them different weights in species delimitation. Thus, whenever an entomologist tries to identify a wasp specimen, the software first runs a check against the descriptors at the top, so that it can exclude non-matching taxa and provide a list of the remaining names. Whenever multiple names remain, a check against descriptors further down the list is performed, until a single name is left, which ought to be the one corresponding to the specimen. At any point, the researcher can access the chronology of the identification to check for potential mismatches without interrupting the process.
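The elimination procedure described above can be sketched as a loop over descriptors sorted by the author's weighting. This is a simplified illustration, not Xper3's actual code: the species, descriptors and states are hypothetical, and keeping a taxon that has no recorded state for a descriptor (missing data does not exclude) is an assumption.

```python
from typing import Dict, List, Set, Tuple

# taxon -> descriptor -> set of states recorded for that taxon (hypothetical data)
Matrix = Dict[str, Dict[str, Set[str]]]
Step = Tuple[str, str, List[str]]  # (descriptor, observed state, taxa remaining)


def identify(matrix: Matrix, ordered_descriptors: List[str],
             specimen: Dict[str, str]) -> Tuple[List[str], List[Step]]:
    """Check descriptors in the author's order, excluding non-matching taxa,
    until at most one name remains. Also returns the chronology of steps."""
    remaining = sorted(matrix)
    chronology: List[Step] = []
    for descriptor in ordered_descriptors:
        if len(remaining) <= 1:
            break                      # identification reached (or nothing matches)
        state = specimen.get(descriptor)
        if state is None:
            continue                   # character not scored on this specimen: skip
        remaining = [t for t in remaining
                     if descriptor not in matrix[t]          # no data: keep taxon
                     or state in matrix[t][descriptor]]
        chronology.append((descriptor, state, list(remaining)))
    return remaining, chronology


matrix = {
    "species A": {"antenna colour": {"yellow"}, "head shape": {"broad"}},
    "species B": {"antenna colour": {"yellow", "brown"}, "head shape": {"narrow"}},
    "species C": {"antenna colour": {"brown"}, "head shape": {"broad"}},
}
names, steps = identify(matrix, ["antenna colour", "head shape"],
                        {"antenna colour": "yellow", "head shape": "broad"})
print(names)  # ['species A']
for step in steps:
    print(step)
```

Here the first descriptor narrows three candidates to two and the second settles the identification; the returned chronology plays the role of the step history a user can revisit.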

Because they are digital, interactive identification keys are not only easy, quick and inexpensive to publish, but also simple to edit and build on collaboratively. Experts from all around the world could update the key, as long as the author grants them specific user rights. Regardless of how many times the database is updated, a permanent URL will always provide access to the latest version.

To future-proof their key and its underlying data, the scientists have deposited all raw data files, R scripts, photographs and files listing the prepared specimens in the research data repository Zenodo, created by OpenAIRE and CERN.

###

Original source:

Klimmek F, Baur H (2018) An interactive key to Central European species of the Pteromalus albipennis species group and other species of the genus (Hymenoptera: Chalcidoidea: Pteromalidae), with the description of a new species. Biodiversity Data Journal 6: e27722. https://doi.org/10.3897/BDJ.6.e27722

Audit finds biodiversity data aggregators ‘lose and confuse’ data

In an effort to improve the quality of biodiversity records, the Atlas of Living Australia (ALA) and the Global Biodiversity Information Facility (GBIF) use automated data processing to check individual data items. The records are provided to the ALA and GBIF by museums, herbaria and other biodiversity data sources.

However, an independent analysis of such records reports that ALA and GBIF data processing also leads to data loss and unjustified changes in scientific names.

The study was carried out by Dr Robert Mesibov, an Australian millipede specialist who also works as a data auditor. Dr Mesibov checked around 800,000 records retrieved from the Australian Museum, Museums Victoria and the New Zealand Arthropod Collection. His results are published in the open access journal ZooKeys, and also archived in a public data repository.

“I was mainly interested in changes made by the aggregators to the genus and species names in the records,” said Dr Mesibov.

“I found that names in up to 1 in 5 records were changed, often because the aggregator couldn’t find the name in the look-up table it used.”

Another worrying result concerned type specimens – the reference specimens upon which scientific names are based. On a number of occasions, the aggregators were found to have replaced the name of a type specimen with a name tied to an entirely different type specimen.

The biggest surprise, according to Dr Mesibov, was the major disagreement on names between aggregators.

“There was very little agreement,” he explained. “One aggregator would change a name and the other wouldn’t, or would change it in a different way.”

Furthermore, dates, names and locality information were sometimes lost from records, mainly due to programming errors in the software used by aggregators to check data items. In some data fields the loss reached 100%, with no original data items surviving the processing.

“The lesson from this audit is that biodiversity data aggregation isn’t harmless,” said Dr Mesibov. “It can lose and confuse perfectly good data.”

“Users of aggregated data should always download both original and processed data items, and should check for data loss or modification, and for replacement of names,” he concluded.
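Dr Mesibov's advice can be turned into a simple field-by-field comparison. The Python sketch below is a hypothetical illustration, not the audit's own method: it assumes both versions of each record share a record identifier, uses field names borrowed from Darwin Core (`scientificName`, `eventDate`), and the records are made up. For every field it reports the share of non-empty original values that were lost or changed during processing, the two effects the audit describes.

```python
from collections import Counter
from typing import Dict, List

Record = Dict[str, str]


def audit(original: Dict[str, Record], processed: Dict[str, Record],
          fields: List[str]) -> Dict[str, Dict[str, float]]:
    """Per field, the percentage of non-empty original values that were lost
    (empty or missing after processing) or changed (replaced by another value)."""
    report = {}
    for field in fields:
        counts = Counter()
        for record_id, before_record in original.items():
            before = (before_record.get(field) or "").strip()
            if not before:
                continue  # nothing in the original that could be lost or changed
            after = (processed.get(record_id, {}).get(field) or "").strip()
            counts["with_data"] += 1
            if not after:
                counts["lost"] += 1
            elif after != before:
                counts["changed"] += 1
        n = counts["with_data"]
        report[field] = {
            "records": n,
            "lost_pct": 100 * counts["lost"] / n if n else 0.0,
            "changed_pct": 100 * counts["changed"] / n if n else 0.0,
        }
    return report


original = {
    "r1": {"scientificName": "Aus bus", "eventDate": "1998-03-02"},
    "r2": {"scientificName": "Aus cus", "eventDate": "2001-11-20"},
}
processed = {
    "r1": {"scientificName": "Aus bus", "eventDate": ""},
    "r2": {"scientificName": "Xus cus", "eventDate": ""},
}
for field, stats in audit(original, processed, ["scientificName", "eventDate"]).items():
    print(field, stats)
```

A record missing entirely from the processed set counts as lost here. A name change such as `Aus cus` to `Xus cus` is not necessarily wrong, but in the spirit of the audit it is something to check against the original rather than accept.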

###

Original source:

Mesibov R (2018) An audit of some filtering effects in aggregated occurrence records. ZooKeys 751: 129-146. https://doi.org/10.3897/zookeys.751.24791