Image recognition to the rescue of natural history museums by enabling curators to identify specimens on the fly

A new Research Idea published in Research Ideas and Outcomes (RIO Journal) by a Swiss-Dutch research team presents a promising machine-learning ecosystem that would unite experts around the world and make up for the shortage of taxonomic expertise.

Guest blog post by Luc Willemse, Senior collection manager at Naturalis Biodiversity Centre (Leiden, Netherlands)

Imagine the workday of a curator in a national natural history museum. Having spent several decades learning about a specific subgroup of grasshoppers, that person is now busy working on the identification and organisation of the holdings of the institution. To do this, the curator needs to study in detail a huge number of undescribed grasshoppers collected from all sorts of habitats around the world. 

The problem, however, is that a curator at a smaller natural history institution is usually responsible for all insects kept at the museum, ranging from butterflies to beetles, flies and so on. In total, we know of around 1 million described insect species worldwide. Meanwhile, another 3,000 are added each year, and many more are redescribed as a result of further study and new discoveries. Becoming a specialist in grasshoppers alone is a laborious undertaking that takes decades; how about knowing all the insects of the world? That’s simply impossible.

How, then, could we expect one person to sort and update all the collections at a museum, an activity that is the cornerstone of biodiversity research? Part of the solution, hiring and training additional staff, is costly and time-consuming, especially since experts on certain species groups are already scarce on a global scale.

We believe that automated image recognition holds the key to reliable and sustainable practices at natural history institutions.

Today, image recognition tools integrated into mobile apps are already used, even by citizen scientists, to identify plants and animals in the field. Based on an image taken with a smartphone, these tools identify specimens on the fly and estimate how reliable their results are. What’s more, those identifications have proven to be almost as accurate as those made by humans. This gives us hope that we could help curators at museums worldwide take better and more timely care of the collections they are responsible for.
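Field identification tools of this kind typically report a ranked list of candidate taxa, each with a confidence estimate. The sketch below is a minimal illustration, assuming the recognition model outputs one raw score (logit) per taxon; a softmax turns those scores into probabilities that can be reported with the top candidates. The taxon names are real bush crickets, but the scores are hypothetical and no particular app is being reproduced.

```python
import math


def softmax(scores):
    """Convert raw model scores (logits) into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_identifications(labels, scores, k=3):
    """Return the k most likely taxa, each paired with its estimated probability."""
    probs = softmax(scores)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical classifier scores for one photographed specimen
labels = ["Tettigonia viridissima", "Tettigonia cantans", "Decticus verrucivorus"]
scores = [4.1, 2.3, 0.5]
for name, p in top_k_identifications(labels, scores, k=2):
    print(f"{name}: {p:.1%}")
```

Reporting a probability with each name is what lets a user, or a curator, decide when a result is trustworthy enough to act on.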

However, specimen identification for natural history institutions is much more complex than what the field tools do. After all, the information these institutions store, and should be able to provide, is meant to serve as a knowledge hub for education and reference for present and future generations of researchers around the globe.

This is why we propose a sustainable system in which images, knowledge, trained recognition models and tools are exchanged between institutes, and in which international collaboration between museums of all sizes is crucial. The aim is a system that benefits the entire natural history collections community by providing wider access to their invaluable collections.

We propose four elements to this system: 

  1. A central library of already trained image recognition models (algorithms) needs to be created. It will be openly accessible, so any other institute can profit from models trained by others.
     Mock-up of a Central Library of Algorithms.
  2. A central library of datasets giving access to images of collection specimens that have recently been identified by experts. This will provide an indispensable source of images for training new algorithms.
     Mock-up of a Central Library of Datasets.
  3. A digital workbench that provides an easy-to-use interface for inexperienced users to customise the algorithms and datasets to the particular needs of their own collections.
  4. As the entire system depends on international collaboration and on the sharing of algorithms and datasets, a user forum is essential to discuss issues, coordinate, and evaluate, test or implement novel technologies.
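The central library of models in the first element is, at its core, a searchable catalogue in which each shared model carries enough metadata to be found and reused. The sketch below is a minimal in-memory illustration of that idea; the field names, model names, dataset identifiers and institute names are all hypothetical, and a real library would sit behind a shared web service rather than a Python list.

```python
from dataclasses import dataclass, field


def version_key(version: str) -> tuple:
    """Turn '1.10.0' into (1, 10, 0) so versions sort numerically, not alphabetically."""
    return tuple(int(part) for part in version.split("."))


@dataclass
class ModelEntry:
    """One trained recognition model shared in the library (hypothetical schema)."""
    name: str
    version: str
    taxon_group: str        # group of organisms the model was trained to identify
    training_dataset: str   # identifier of a dataset in the central dataset library
    contributed_by: str     # institute that trained and shared the model


@dataclass
class ModelLibrary:
    """In-memory stand-in for an openly accessible library of trained models."""
    entries: list = field(default_factory=list)

    def register(self, entry: ModelEntry) -> None:
        self.entries.append(entry)

    def find(self, taxon_group: str) -> list:
        """Return every shared model covering a taxon group, newest version first."""
        matches = [e for e in self.entries if e.taxon_group == taxon_group]
        return sorted(matches, key=lambda e: version_key(e.version), reverse=True)


library = ModelLibrary()
library.register(ModelEntry("bushcricket-cnn", "1.9.0", "Tettigoniidae", "dataset-A", "Institute X"))
library.register(ModelEntry("bushcricket-cnn", "1.10.0", "Tettigoniidae", "dataset-B", "Institute Y"))
library.register(ModelEntry("beetle-cnn", "2.0.0", "Carabidae", "dataset-C", "Institute Z"))

best = library.find("Tettigoniidae")[0]
print(best.name, best.version, best.training_dataset)
```

Linking each model to the dataset it was trained on is what ties the first two elements together: an institute that finds a model can also inspect, and extend, the images behind it.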

How would this work on a daily basis for curators? We provide two examples of use cases.

First, let’s zoom in on a case where a curator needs to identify a box of insects, for example bush crickets, to a lower taxonomic level. Here, he or she would take an image of the box and split it into segments, each showing an individual specimen. Image recognition then identifies the bush crickets to a lower taxonomic level. The result, which we present in the table below, will be used to update object-level registration or to physically rearrange specimens into more accurate boxes. This entire step can also be done by non-specialist staff.

Mock-up of box with grasshoppers mentioned in the above table

Results of automated image recognition, identifying specimens to a lower taxonomic level.
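The box workflow above (photograph, segment, identify, tabulate) can be sketched as a short pipeline. This is an illustration under stated assumptions: the segmentation step is taken as already done, so each specimen is given as a hypothetical bounding box (in practice an object detector would produce these), the "photo" is a tiny greyscale grid, and `dummy_classifier` is a hypothetical stand-in that decides by brightness instead of running a trained model.

```python
from collections import Counter


def crop(image, box):
    """Cut one specimen out of the box photo; box = (top, left, bottom, right) in pixels."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def identify_box(image, boxes, classify):
    """Identify every segmented specimen and return (specimen number, taxon) rows."""
    rows = []
    for number, box in enumerate(boxes, start=1):
        rows.append((number, classify(crop(image, box))))
    return rows


def dummy_classifier(patch):
    """Hypothetical stand-in for a trained model: labels a crop by its mean brightness."""
    pixels = [p for row in patch for p in row]
    mean = sum(pixels) / len(pixels)
    return "Tettigonia cantans" if mean < 100 else "Tettigonia viridissima"


# Hypothetical 4x6 greyscale "photo" of a box holding two specimens side by side
image = [[10, 10, 10, 200, 200, 200] for _ in range(4)]
boxes = [(0, 0, 4, 3), (0, 3, 4, 6)]

table = identify_box(image, boxes, dummy_classifier)
for number, taxon in table:
    print(number, taxon)
print(Counter(taxon for _, taxon in table))  # specimens per taxon, e.g. for re-boxing
```

Because the classifier is passed in as a parameter, any model from a shared library could be plugged into the same pipeline.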

Another example is to incorporate image recognition tools into digitisation processes that involve imaging specimens. In this case, the tools can be used on the fly to check or confirm existing identifications and thus improve data quality.

Mock-up of an interface for automated taxon identification. 
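One way such an on-the-fly check could work is to compare each record's existing name with the model's prediction and route disagreements to an expert. The sketch below assumes the model returns a single best taxon with a confidence between 0 and 1; the 0.9 threshold, catalogue numbers and predictions are hypothetical choices for illustration.

```python
def check_identification(recorded_taxon, predicted_taxon, confidence, threshold=0.9):
    """Compare the name on an existing record with the model's prediction."""
    if confidence < threshold:
        return "inconclusive"            # model is unsure: leave the record as it is
    if predicted_taxon == recorded_taxon:
        return "confirmed"               # model agrees with the existing identification
    return "flag for expert review"      # model confidently disagrees


# Hypothetical records coming off an imaging line:
# (catalogue number, recorded name, predicted name, model confidence)
batch = [
    ("SPECIMEN-0101", "Tettigonia viridissima", "Tettigonia viridissima", 0.97),
    ("SPECIMEN-0102", "Tettigonia cantans", "Tettigonia viridissima", 0.95),
    ("SPECIMEN-0103", "Decticus verrucivorus", "Tettigonia cantans", 0.41),
]
for catalogue_number, recorded, predicted, confidence in batch:
    print(catalogue_number, check_identification(recorded, predicted, confidence))
```

The model never overwrites a record on its own here; it only confirms or flags, which keeps expert judgement in the loop.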

Using image recognition tools to identify specimens in museum collections is likely to become common practice in the future. It is a technical tool that will enable the community to share available taxonomic expertise. 

Using image recognition tools makes it possible to identify species groups for which there is very limited or no in-house expertise. Such practices would substantially reduce the cost and time spent per item processed.

Image recognition applications carry metadata such as version numbers and the datasets used for training. Such an approach would therefore make identification more transparent than identification by humans, whose expertise is, by design, in no way standardised or transparent.
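The transparency argument rests on storing that metadata alongside every identification. A minimal sketch, assuming a simple record layout of our own invention (all field names and values are hypothetical), shows how an identification could travel with its provenance as JSON:

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Identification:
    """An identification stored together with the provenance that makes it traceable."""
    specimen_id: str
    taxon: str
    confidence: float
    model_name: str
    model_version: str
    training_dataset: str


record = Identification(
    specimen_id="SPECIMEN-0001",          # hypothetical catalogue number
    taxon="Tettigonia viridissima",
    confidence=0.94,
    model_name="bushcricket-cnn",         # hypothetical shared model
    model_version="1.10.0",
    training_dataset="dataset-B",         # hypothetical dataset identifier
)
print(json.dumps(asdict(record), indent=2))
```

Anyone reviewing the record later can see exactly which model version, trained on which data, produced the name, something a handwritten determination label rarely records.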

*

Follow RIO Journal on Twitter and Facebook.

*

Research publication:

Greeff M, Caspers M, Kalkman V, Willemse L, Sunderland BD, Bánki O, Hogeweg L (2022) Sharing taxonomic expertise between natural history collections using image recognition. Research Ideas and Outcomes 8: e79187. https://doi.org/10.3897/rio.8.e79187

Natural History Museum of Berlin’s journal Fossil Record started publishing on ARPHA Platform

Fossil Record, the paleontological scholarly journal of the Natural History Museum of Berlin (Museum für Naturkunde Berlin), published its first articles after moving to the academic publisher Pensoft and its publishing platform ARPHA Platform in late 2021. The renowned scientific outlet, launched in 1998, joined two other historical journals owned by the Museum, Deutsche Entomologische Zeitschrift and Zoosystematics and Evolution, which moved to Pensoft back in 2014.

Published in two issues a year, the open-access scientific outlet covers research from all areas of palaeontology, including the taxonomy and systematics of fossil organisms, biostratigraphy, palaeoecology, and evolution. It deals with all taxonomic groups, including invertebrates, microfossils, plants, and vertebrates.

As a result of the move to ARPHA, Fossil Record utilises the whole package of ARPHA Platform’s services, including its fast-track, end-to-end publishing module, designed to appeal to readers, authors, reviewers and editors alike. A major advantage is that the whole editorial process, starting from the submission of a manuscript and continuing into peer review, editing, publication, dissemination, archiving and hosting, happens within the online ecosystem of ARPHA. 

As soon as they are published, the articles in Fossil Record are available in three formats: PDF, machine-readable JATS XML, and semantically enriched HTML for a better, mobile-friendly reading experience.

The publications are equipped with real-time metrics at both article and sub-article level, giving easy access to the number of visitors, views and downloads for every article and each of its figures, tables or supplementary materials. In turn, the semantic enhancements not only allow for easy navigation throughout the text and quick access to cited literature and the article’s own citations, but also tag each taxon that appears in the paper and link it to further information on its occurrences, genomics, nomenclature, treatments and more, as available from various databases.
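Taxon tagging of this kind can be illustrated with a simple dictionary-based tagger that wraps each known name in a link. This is only a sketch of the general idea, not ARPHA's actual implementation: it assumes a ready-made list of names to look for, and the base URL is a hypothetical placeholder rather than a real database endpoint.

```python
import re


def tag_taxa(text, known_taxa, base_url):
    """Wrap each occurrence of a known taxon name in an HTML link to further information."""
    # Try longer names first so a genus name never pre-empts its own binomial
    names = sorted(known_taxa, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")

    def link(match):
        name = match.group(0)
        slug = name.replace(" ", "+")
        return f'<a href="{base_url}{slug}">{name}</a>'

    return pattern.sub(link, text)


text = "Specimens of Archaeopteryx lithographica and other Archaeopteryx material are compared."
result = tag_taxa(text, ["Archaeopteryx", "Archaeopteryx lithographica"], "https://example.org/taxon/")
print(result)
```

Production systems also have to cope with abbreviated genera, misspellings and names not yet in any list, which is why real taxon-name finders are considerably more elaborate than a fixed dictionary.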

The first five papers – now available on the brand new journal website powered by ARPHA – already demonstrate the breadth of topics covered by Fossil Record, including systematics, paleobiogeography, palaeodiversity and morphology, as well as the international appeal of the scholarly outlet. The articles are co-authored by collaborative research teams representing ten countries and spanning three continents: Europe, Asia and Africa.

***

About the Natural History Museum of Berlin:

The “Museum für Naturkunde – Leibniz Institute for Evolution and Biodiversity Science” is an integrated research museum within the Leibniz Association. It is one of the most important research institutions worldwide in the areas of biological and geological evolution and biodiversity.

The Museum’s mission is to discover and describe life and earth – with people, through dialogue. As an excellent research museum and innovative communication platform, it wants to engage with and influence the scientific and societal discourse about the future of our planet, worldwide. Its vision, strategy and structure make the museum an excellent research museum. The Natural History Museum of Berlin has research partners in Berlin, Germany and approximately 60 other countries. Over 700,000 visitors per year as well as steadily increasing participation in educational and other events show that the Museum has become an innovative communication centre that helps shape the scientific and social dialogue about the future of our earth. 

Digitising the Natural History Museum London’s entire collection could contribute over £2 billion to the global economy

In a world first, the Natural History Museum, London, has collaborated with economic consultants, Frontier Economics Ltd, to explore the economic and societal value of digitising natural history collections and concluded that digitisation has the potential to see a seven to tenfold return on investment. Whilst significant progress is already being made at the Museum, additional investment is needed in order to unlock the full potential of the Museum’s vast collections of more than 80 million objects. The project’s report is published in the open-science journal Research Ideas and Outcomes (RIO Journal).

One of the Museum’s digitisers imaging a butterfly to join the 4.93 million specimens already available online. 
© The Trustees of the Natural History Museum, London

The societal benefits of digitising natural history collections extend to global advancements in food security, biodiversity conservation, medicine discovery, minerals exploration, and beyond. A new, rigorous economic report predicts that investing in digitising natural history museum collections could also yield a tenfold return. The Natural History Museum, London, has so far made over 4.9 million digitised specimens freely available online, and over the past six years more than 28 billion records have been downloaded in over 429,000 download events.

Digitisation at the Natural History Museum, London 

Digitisation is the process of creating and sharing the data associated with Museum specimens. To digitise a specimen, all its related information is added to an online database. This typically includes where and when it was collected and who found it, and can include photographs, scans and other molecular data where available. Natural history collections are a unique record of biodiversity dating back hundreds of years, and of geodiversity dating back millennia. Creating and sharing data in this way enables science that would otherwise have been impossible and accelerates the rate at which important discoveries are made from our collections.

The Natural History Museum’s collection of 80 million items is one of the largest and most historically and geographically diverse in the world. By unlocking the collection online, the Museum provides free and open access for global researchers, scientists, artists and more. Since 2015, the Museum has made 4.9 million specimens available on its Data Portal, and more than 28 billion records have been downloaded across 427,000 download events.

This means the Museum has digitised about 6% of its collection to date. Because digitisation is expensive, costing tens of millions of pounds, it is difficult to make a case for further investment without a better understanding of the value of digitisation and its benefits.
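The 6% figure follows directly from the numbers quoted above, and the same figures give the average size of a download. The short calculation below uses only figures from this article; it takes 427,000 download events, and using the 429,000 quoted earlier changes the average by less than 1%.

```python
# Figures quoted in the text
collection_size = 80_000_000          # objects in the Museum's collection
digitised = 4_900_000                 # specimens available on the Data Portal
records_downloaded = 28_000_000_000   # records downloaded since 2015
download_events = 427_000             # 429,000 is also quoted in the article

share_digitised = digitised / collection_size
records_per_event = records_downloaded / download_events

print(f"Share digitised: {share_digitised:.1%}")
print(f"Average records per download event: {records_per_event:,.0f}")
```

The result, roughly 6.1% digitised, is consistent with the "about 6%" stated, and each download event moved on average some 65,000 records, which shows that researchers typically retrieve data in bulk rather than one specimen at a time.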

In 2021, the Museum decided to explore the economic impacts of collections data in more depth and commissioned Frontier Economics to undertake the modelling. The resulting project report, now publicly available in the open-science journal Research Ideas and Outcomes (RIO Journal), confirms benefits in excess of £2 billion over 30 years. While the methods in the report are relevant to collections globally, the modelling focuses on benefits to the UK. It is intended to support the Museum’s own digitisation work, as well as a current scoping study, funded by the Arts & Humanities Research Council, on the case for digitising all UK natural science collections as a research infrastructure.

Sharing data from our collections can transform scientific research and help find solutions for nature and from nature. Our digitised collections have helped establish the baseline plant biodiversity in the Amazon, find wheat crops that are more resilient to climate change and support research into potential zoonotic origins of Covid-19. The research that comes from sharing our specimens has immense potential to transform our world and help both people and the planet thrive,

says Helen Hardy, Science Digital Programme Manager at the Natural History Museum.

How does digitisation impact scientific research?

The data from museum collections accelerates scientific research, which in turn creates benefits for society and the economy across a wide range of sectors. Frontier Economics Ltd have looked at the impact of collections data in five of these sectors: biodiversity conservation, invasive species, medicines discovery, agricultural research and development and mineral exploration. 

The Natural History Museum’s collection is a real treasure trove which, if made easily accessible to scientists all over the world through digitisation, has the potential to unlock ground-breaking research in any number of areas. Predicting exactly how the data will be used in future is clearly very uncertain. We have looked at the potential value that new research could create in just five areas, focussing on a relatively narrow set of outcomes. We find that the value at stake is extremely large, running into billions,

says Dan Popov, Economist at Frontier Economics Ltd.

The new analyses attempt to estimate the economic value of these benefits using a range of approaches, with the results in broad agreement that the benefits of digitisation are at least ten times greater than the costs. This represents a compelling case for investment in museum digital infrastructure without which the many benefits will not be realised.
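A benefit-cost ratio of this kind compares the present value of future benefits with the present value of costs. The sketch below shows only the general calculation, not the approach used in the Frontier Economics report: the yearly amounts and the 3.5% discount rate are hypothetical placeholders chosen for illustration.

```python
def present_value(flows, rate):
    """Discount a list of yearly amounts (year 0 first) back to today's value."""
    return sum(amount / (1 + rate) ** year for year, amount in enumerate(flows))


def benefit_cost_ratio(benefits, costs, rate):
    """Ratio of discounted benefits to discounted costs."""
    return present_value(benefits, rate) / present_value(costs, rate)


# Hypothetical 30-year profile: costs front-loaded over 5 years,
# benefits starting in year 2 once digitised data are in use
costs = [10.0] * 5 + [0.0] * 25
benefits = [0.0] * 2 + [20.0] * 28
print(f"Benefit-cost ratio: {benefit_cost_ratio(benefits, costs, 0.035):.1f}")
```

Discounting matters here because digitisation costs are paid up front while the research benefits arrive over decades; the later the benefits, the more the ratio shrinks.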

This new analysis shows that the data locked up in our collections has significant societal and economic value, but we need investment to help us release it,

adds Professor Ken Norris, Head of the Life Sciences Department at the Natural History Museum.

Other benefits could include improvements to the resilience of agricultural crops by better understanding their wild relatives, research into invasive species which can cause significant damage to ecosystems and crops, and improving the accuracy of mining.  

Finally, such work could also affect how science itself is conducted. Digitising specimens means that researchers anywhere on the planet can access these collections, saving the time and money that would otherwise be spent travelling to see specific objects.

The value of research enabled by digitisation of natural history collections can be estimated by looking at specific areas where the Museum’s collections contribute towards scientific research and subsequently impact the wider economy. 
© Frontier Economics Ltd.

Original source: 

Popov D, Roychoudhury P, Hardy H, Livermore L, Norris K (2021) The Value of Digitising Natural History Collections. Research Ideas and Outcomes 7: e78844. https://doi.org/10.3897/rio.7.e78844

New BiCIKL project to build a freeway between pieces of biodiversity knowledge

Within Biodiversity Community Integrated Knowledge Library (BiCIKL), 14 key research and natural history institutions commit to link infrastructures and technologies to provide flawless access to biodiversity data.

In a recently started Horizon 2020-funded project, 14 European institutions from 10 countries, representing key players in biodiversity research and natural history both in Europe and globally, deploy and improve their own and partnering infrastructures to bridge the gaps between each other’s biodiversity data types and classes. By linking their technologies, they are set to provide flawless access to data across all stages of the research cycle.

Three years in, BiCIKL (abbreviation for Biodiversity Community Integrated Knowledge Library) will have created the first-of-its-kind Biodiversity Knowledge Hub, where a researcher will be able to retrieve a full set of linked and open biodiversity data, thereby accessing the complete story behind an organism of interest: its name, genetics, occurrences, natural history, as well as authors and publications mentioning any of those.
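The idea of a hub that returns "the complete story" behind an organism can be illustrated by grouping records from separate sources under one taxon. The sketch below is purely illustrative and not BiCIKL's design: matching on the scientific-name string is a simplification (real linking relies on persistent identifiers and name reconciliation), and the records are hypothetical placeholders apart from the DOI of the Carolina parakeet dataset paper.

```python
from collections import defaultdict


def build_knowledge_cards(sources):
    """Merge records from separate infrastructures into one view per taxon name.

    `sources` maps a data type (e.g. 'occurrences') to a list of records, each of
    which carries a 'scientificName' key; records are grouped under that name.
    """
    cards = defaultdict(lambda: defaultdict(list))
    for data_type, records in sources.items():
        for record in records:
            cards[record["scientificName"]][data_type].append(record)
    return cards


sources = {
    "occurrences": [{"scientificName": "Conuropsis carolinensis", "id": "occurrence-1"}],
    "publications": [{"scientificName": "Conuropsis carolinensis", "doi": "10.3897/BDJ.6.e25280"}],
    "sequences": [{"scientificName": "Tettigonia viridissima", "id": "sequence-1"}],
}
card = build_knowledge_cards(sources)["Conuropsis carolinensis"]
print(sorted(card))  # data types linked to this name
```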

Ultimately, the project’s products will solidify Open Science and FAIR (Findable, Accessible, Interoperable and Reusable) data practices by empowering and streamlining biodiversity research.

Together, the project partners will redesign the way biodiversity data is found, linked, integrated and re-used across the research cycle. By the end of the project, BiCIKL will provide the community with a more transparent, trustworthy, efficient and highly automated research ecosystem, allowing scientists to access, explore and further use a wide range of data with only a few clicks.

“In recent years, we’ve made huge progress on how biodiversity data is located, accessed, shared, extracted and preserved, thanks to a vast array of digital platforms, tools and projects looking after the different types of data, such as natural history specimens, species descriptions, images, occurrence records and genomics data, to name a few. However, we’re still missing an interconnected and user-friendly environment to pull all those pieces of knowledge together. Within BiCIKL, we all agree that it’s only after we puzzle out how to best bridge our existing infrastructures and the information they are continuously sourcing that future researchers will be able to realise their full potential,” 

explains BiCIKL’s project coordinator Prof. Lyubomir Penev, CEO and founder of Pensoft, a scholarly publisher and technology provider company.

Continuously fed with data sourced by the partnering institutions and their infrastructures, BiCIKL’s key final output, the Biodiversity Knowledge Hub, is set to persist long after the project has concluded. In fact, by accelerating biodiversity research that builds on, rather than duplicates, existing knowledge, it will be providing access to exponentially growing contextualised biodiversity data.

***

Learn more about BiCIKL on the project’s website at: bicikl-project.eu

Follow BiCIKL Project on Twitter and Facebook. Join the conversation on Twitter at #BiCIKL_H2020.

***

The project partners:

48 years of Australian collecting trips in one data package

From 1973 to 2020, Australian zoologist Dr Robert Mesibov kept careful records of the “where” and “when” of his plant and invertebrate collecting trips. Now, he has made those valuable biodiversity data freely and easily accessible via the Zenodo open-data repository, so that future researchers can rely on this “authority file” when using museum specimens collected from those events in their own studies. The new dataset is described in the open-access, peer-reviewed Biodiversity Data Journal.

While checking museum records, Dr Robert Mesibov found there were occasional errors in the dates and places for specimens he had collected many years before. He was not surprised.

“It’s easy to make mistakes when entering data on a computer from paper specimen labels”, said Mesibov. “I also found specimen records that said I was the collector, but I know I wasn’t!”

One solution to this problem was what librarians and others have long called an “authority file”.

“It’s an authoritative reference, in this case with the correct details of where I collected and when”, he explained.

“I kept records of almost all my collecting trips from 1973 until I retired from field work in 2020. The earliest records were on paper, but I began storing the key details in digital form in the 1990s.”

The 48-year record has now been made publicly available via the Zenodo open-data repository after conversion to the Darwin Core data format, which is widely used for sharing biodiversity information. With this “authority file”, described in detail in the open-access, peer-reviewed Biodiversity Data Journal, future researchers will be able to rely on sound, interoperable and easy-to-access data when using those museum specimens in their own studies, instead of repeating and further spreading unintentional errors.

“There are 3829 collecting events in the authority file”, said Mesibov, “from six Australian states and territories. For each collecting event there are geospatial and date details, plus notes on the collection.”

Mesibov hopes the authority file will be used by museums to correct errors in their catalogues.

“It should also save museums a fair bit of work in future”, he explained. “No need to transcribe details on specimen labels into digital form in a database, because the details are already in digital form in the authority file.”

Mesibov points out that in the 19th and 20th centuries, lists of collecting events were often included in the reports of major scientific expeditions.

“Those lists were authority files, but in the pre-digital days it was probably just as easy to copy collection data from specimen labels.”

“In the 21st century there’s a big push to digitise museum specimen collections”, he said. “Museum databases often have lookup tables with scientific names and the names of collectors. These lookup tables save data entry time and help to avoid errors in digitising.”

“Authority files for collecting events are the next logical step,” said Mesibov. “They can be used as lookup tables for all the important details of individual collections: where, when, by whom and how.”
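Using an authority file as a lookup table can be sketched in a few lines. The column names below are standard Darwin Core terms (eventID, eventDate, stateProvince, locality, decimalLatitude, decimalLongitude, recordedBy, samplingProtocol, catalogNumber), but the two rows are hypothetical, not taken from Mesibov's file, and the sketch assumes each museum record already carries the matching eventID, which in practice may first have to be assigned.

```python
import csv
import io

# Hypothetical two-row excerpt in the shape of a Darwin Core event table
AUTHORITY_FILE = """eventID,eventDate,stateProvince,locality,decimalLatitude,decimalLongitude,recordedBy,samplingProtocol
EV-0001,1985-03-14,Tasmania,Example Creek,-41.5,146.2,Collector A,hand collecting
EV-0002,1992-11-02,Victoria,Example Range,-37.8,145.9,Collector A,litter sifting
"""

EVENT_FIELDS = ("eventDate", "stateProvince", "locality",
                "decimalLatitude", "decimalLongitude", "recordedBy")


def load_authority_file(text):
    """Index collecting events by their eventID."""
    return {row["eventID"]: row for row in csv.DictReader(io.StringIO(text))}


def correct_specimen(specimen, events):
    """Return a copy of a specimen record with event details taken from the authority file."""
    event = events[specimen["eventID"]]
    corrected = dict(specimen)
    for name in EVENT_FIELDS:
        corrected[name] = event[name]
    return corrected


events = load_authority_file(AUTHORITY_FILE)
# A museum record whose date was mistyped when the label was transcribed
specimen = {"catalogNumber": "SPECIMEN-0042", "eventID": "EV-0001", "eventDate": "1958-03-14"}
print(correct_specimen(specimen, events)["eventDate"])
```

Only the event fields are overwritten; specimen-level fields such as the catalogue number are left untouched.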

###

Research paper:

Mesibov RE (2021) An Australian collector’s authority file, 1973–2020. Biodiversity Data Journal 9: e70463. https://doi.org/10.3897/BDJ.9.e70463

###

Robert Mesibov’s webpage: https://www.datafix.com.au/mesibov.html

Robert Mesibov’s ORCID page: https://orcid.org/0000-0003-3466-5038

Journal of Hymenoptera Research links Crocodile Dundee, Toblerone, Game of Thrones & Alien

A myriad of species and genera new to science, including economically important wasps that draw immediate attention with their amusing names and remarkable physical characters, together with work set to lay the foundations for future taxonomic and conservation research, make up the latest, 64th issue of the Journal of Hymenoptera Research (JHR).

The species Qrocodiledundee outbackense

Two genera (Qrocodiledundee and Tobleronius) named after the action comedy Crocodile Dundee and the chocolate brand Toblerone are just two of the 14 new genera in the monograph of the microgastrine wasps of the world’s tropical regions, authored by Dr Jose Fernandez-Triana and Caroline Boudreault of the Canadian National Collection of Insects in Ottawa. In their article, the team also describes a total of 29 new species, five of which carry the names of institutions holding some of the most outstanding wasp collections.

Another curiously named species of microgastrine wasp described in the new JHR issue is Eadya daenerys, named after Daenerys Targaryen, a fictional character from George R. R. Martin’s best-selling book series A Song of Ice and Fire and the blockbuster TV show Game of Thrones. Discovered by the University of Central Florida’s Ryan Ridenbaugh, Erin Barbeau and Dr Barbara Sharanowski as a result of a collaboration between biocontrol researchers and taxonomists, the new species might not command three dragons or rule and protect whole nations. However, as a potential biocontrol agent against a particular group of leaf beetle pests, it could save many eucalyptus plantations around the world.

The species Tobleronius orientalis

Furthermore, a wasp named Dolichogenidea xenomorph, which parasitises other eucalyptus pests, is also named after a character from a popular franchise. The scriptwriters of the sci-fi horror film series Alien are thought to have had parasitic wasps in mind when they came up with the Xenomorph, remind the authors Erinn Fagan-Jeffries, Dr Steven Cooper and Dr Andrew Austin. Additionally, the team from the University of Adelaide and the South Australian Museum points out that the species name translates to ‘strange form’ in Greek, which suits the remarkably long ovipositor characteristic of the new wasp.

The species Eadya daenerys

In another paper of the same journal issue, Dr. Jean-Luc Boevé, Royal Belgian Institute of Natural Sciences, Diego Domínguez, Universidad Técnica Particular de Loja, Ecuador, and Dr David Smith, Smithsonian’s National Museum of Natural History, USA, publish an illustrated list of the wasp-related sawflies, which they collected from northern Ecuador a few years ago. They also provide a checklist of the country’s species.

The species Dolichogenidea xenomorph

Finally, the fifth paper, authored by Serbian scientists Dr Milana Mitrovic, Institute for Plant Protection and Environment, and Prof Zeljko Tomanovic, University of Belgrade, studies ways to extract DNA from dry parasitoid wasps kept in natural history archives decades after their preservation. In their work, they make it clear that such projects are of great importance for future taxonomic and conservation research, as well as for agriculture.

***

The open access Journal of Hymenoptera Research is published bimonthly by the scholarly publisher Pensoft on behalf of the International Society of Hymenopterists.

***

Original sources:

Boevé J, Domínguez D, Smith D (2018) Sawflies from northern Ecuador and a checklist for the country (Hymenoptera: Argidae, Orussidae, Pergidae, Tenthredinidae, Xiphydriidae). Journal of Hymenoptera Research 64: 1-24. https://doi.org/10.3897/jhr.64.24408

Ridenbaugh RD, Barbeau E, Sharanowski BJ (2018) Description of four new species of Eadya (Hymenoptera, Braconidae), parasitoids of the Eucalyptus Tortoise Beetle (Paropsis charybdis) and other Eucalyptus defoliating leaf beetles. Journal of Hymenoptera Research 64: 141-175. https://doi.org/10.3897/jhr.64.24282

Fagan-Jeffries EP, Cooper SJB, Austin AD (2018) Three new species of Dolichogenidea Viereck (Hymenoptera, Braconidae, Microgastrinae) from Australia with exceptionally long ovipositors. Journal of Hymenoptera Research 64: 177-190. https://doi.org/10.3897/jhr.64.25219

Mitrovic M, Tomanovic Z (2018) New internal primers targeting short fragments of the mitochondrial COI region for archival specimens from the subfamily Aphidiinae (Hymenoptera, Braconidae). Journal of Hymenoptera Research 64: 191-210. https://doi.org/10.3897/jhr.64.25399

Museum collection reveals distribution of Carolina parakeet 100 years after its extinction

While 2018 marks the centenary of the death of the last captive Carolina parakeet, North America’s only native parrot, a team of researchers has shed new light on the historical geographical range of the species, which was officially declared extinct in 1920.

Combining observations and specimen data, the new Carolina parakeet occurrence dataset, recently published in the open-access Biodiversity Data Journal by Dr Kevin Burgio, Dr Colin Carlson (University of Maryland and Georgetown University) and Dr Alexander Bond (Natural History Museum of London), is the most comprehensive ever produced.

The new study provides unprecedented information on the bird’s range, offering a window into the past ecology of a lost species.

“Making these data freely available to other researchers will hopefully help unlock the mysteries surrounding the extinction and ecology of this iconic species. Parrots are the most at-risk group of birds and anything we can learn about past extinctions may be useful going forward,” says the study’s lead author, Kevin Burgio.

The observational recordings included in the study have been gleaned from a wide variety of sources, including the correspondence of well-known historical figures such as Thomas Jefferson and the explorers Lewis and Clark.

The study team referenced recorded sightings spanning nearly 400 years. The oldest dates back to 1564 and was found in a description of what is now the state of Florida, written by Rene Laudonniere in 1602.

Alongside the written accounts, the researchers included location data from museum specimens. These include 25 bird skins from the Natural History Museum’s Tring site, whose skin collection is the second largest of its kind in the world, with almost 750,000 specimens representing about 95 per cent of the world’s bird species. The study thus demonstrates how invaluable museum collections can be as a resource.

“The unique combination of historical research and museum specimens is the only way we can learn about the range of this now-extinct species. Museums are archives of the natural world and research collections like that of the Natural History Museum are incredibly important in helping to increase our understanding of biodiversity conservation and extinction,” says Alex Bond.

“By digitising museum collections, we can unlock the potential of millions of specimens, helping us to answer some of today’s big questions in biodiversity science and conservation.”

It is hoped that this research will be the beginning of wider-reaching work exploring the ecology of this long-lost species further.

###

Original source:

Burgio KR, Carlson CJ, Bond AL (2018) Georeferenced sighting and specimen occurrence data of the extinct Carolina Parakeet (Conuropsis carolinensis) from 1564 – 1944. Biodiversity Data Journal 6: e25280. https://doi.org/10.3897/BDJ.6.e25280

Artificial neural networks could power up curation of natural history collections

Deep learning techniques manage to differentiate between similar plant families with up to 99 percent accuracy, Smithsonian researchers reveal

Millions, if not billions, of specimens reside in the world’s natural history collections, but most of these have not been carefully studied, or even looked at, in decades. While containing critical data for many scientific endeavors, most objects are quietly sitting in their own little cabinets of curiosity.

Thus, mass digitization of natural history collections has become a major goal at museums around the world. Having brought together numerous biologists, curators, volunteers and citizen scientists, such initiatives have already generated large datasets from these collections and provided unprecedented insight.

Now, a study recently published in the open access Biodiversity Data Journal suggests that the latest advances in digitization and machine learning might together help museum curators care for and learn from this incredible global resource.

A team of researchers from the Smithsonian Department of Botany, Data Science Lab, and Digitization Program Office recently collaborated with NVIDIA to carry out a pilot project using deep learning approaches to dig into digitized herbarium specimens.

Smithsonian researchers classifying digitized herbarium sheets.

Their study is among the first to describe the use of deep learning methods to enhance our understanding of digitized collection samples. It is also the first to demonstrate that a deep convolutional neural network (a computing system loosely modelled on the activity of neurons in animal brains, which learns patterns from training examples rather than following explicitly programmed rules) can effectively differentiate between similar plants, with an accuracy of up to 99%.

In the paper, the scientists describe two different neural networks that they trained to perform tasks on the digitized portion (currently 1.2 million specimens) of the United States National Herbarium.

The team first trained a net to automatically recognize herbarium sheets that had been stained with mercury crystals, since mercury was commonly used by some early collectors to protect the plant collections from insect damage. The second net was trained to discriminate between two families of plants that share a strikingly similar superficial appearance.

Sample herbarium specimen image of stained clubmoss.

The two trained neural nets achieved 90% and 96% accuracy, respectively (or 94% and 99% when the most challenging specimens were discarded), showing that deep learning is a useful and important technology for the future analysis of digitized museum collections.

“The results can be leveraged both to improve curation and unlock new avenues of research,” conclude the scientists.

“This research paper is a wonderful proof of concept. We now know that we can apply machine learning to digitized natural history specimens to solve curatorial and identification problems. The future will be using these tools combined with large shared data sets to test fundamental hypotheses about the evolution and distribution of plants and animals,” says Dr. Laurence J. Dorr, Chair of the Smithsonian Department of Botany.


###

Original source:

Schuettpelz E, Frandsen P, Dikow R, Brown A, Orli S, Peters M, Metallo A, Funk V, Dorr L (2017) Applications of deep convolutional neural networks to digitized natural history collections. Biodiversity Data Journal 5: e21139. https://doi.org/10.3897/BDJ.5.e21139

The Western Ghats of India revealed two new primitive species of earthworm

The Western Ghats mountains lie at the southwestern continental margin of Peninsular India and extend all the way from Gujarat to Kerala. The massif has earned its place amongst the eight ‘hottest’ biodiversity hotspots in the world.

The Western Ghats host a great variety of vegetation types which, coupled with high rainfall and moderate yearly temperatures, provide many different habitats. As a result, the mountains are rich in earthworm diversity, as well as in amphibian and reptile diversity.

The two new species, named Drawida polydiverticulata and Drawida thomasi, were discovered in the Western Ghats mountain ranges in Kerala by Dr. S. Prasanth Narayanan, Mr. S. Sathrumithra and Dr. G. Christopher, all affiliated with Mahatma Gandhi University, and Dr. J.M. Julka of Shoolini University, India. They belong to the primitive family Moniligastridae. The species are described in the open access journal ZooKeys.

The new earthworms are distinguished by a set of characters. In the case of Drawida polydiverticulata, a peculiar feature determined its species name: its multiple lobes, called diverticula, on an organ located in the front part of its body, turned out to be unique among the members of the genus. This species was found to be widespread in the protected shola grasslands of the Munnar region, including Eravikulam National Park, Pampadum Shola National Park and Chinnar Wildlife Sanctuary.

The second new earthworm, Drawida thomasi, was collected at the Kozhippara Waterfalls near Kakkadampoyil, at the border between Malappuram and Kozhikode. The species name (thomasi) is a tribute to Prof. (Dr.) A.P. Thomas, the Director of the Advanced Centre of Environmental Studies and Sustainable Development (ACESSD), Mahatma Gandhi University, “who initiated the taxonomical studies on the earthworms in Kerala after being at a standstill for almost a century.”

In addition to the new species, the scientists also report the occurrence of five species of the same genus that have not previously been recorded from the state.

To date, 73 species of the genus Drawida have been confirmed from the Indian subcontinent. Of these, the greatest concentration (43 species) is found in the Western Ghats. The genus has an important centre of speciation in the southernmost state of Kerala.

Prior to this study, sixteen Drawida species were known from the state, ten of them found nowhere else. The discovery of two new species and five new local records adds further to the already remarkable species richness of the genus in the state.

At present, about 200 species are known in the genus Drawida. Their range extends from India through the Indochina region to Southeast Asia, and north as far as Japan.

###

Original source:

Narayanan SP, Sathrumithra S, Christopher G, Julka JM (2017) New species and new records of earthworms of the genus Drawida from Kerala part of the Western Ghats biodiversity hotspot, India (Oligochaeta, Moniligastridae). ZooKeys 691: 1-18. https://doi.org/10.3897/zookeys.691.13174

35 years of work: More than 1000 leaf-mining pygmy moths classified & catalogued

The leaf-mining pygmy moths (family Nepticulidae) and the white eyecap moths (family Opostegidae) are among the smallest moths in the world with a wingspan of just a few millimetres. Their caterpillars make characteristic patterns in leaves: leaf mines. For the first time, the evolutionary relationships of the more than 1000 species have been analysed on the basis of DNA, resulting in a new classification.

Today, a team of scientists led by Dr. Erik J. van Nieukerken and Dr. Camiel Doorenweerd, Naturalis Biodiversity Center, Leiden, The Netherlands, published three inter-linked scientific publications in the journal Systematic Entomology and the open access journal ZooKeys, together with two online databases, providing a catalogue with the names of all species involved.

The evolutionary study, which forms part of Doorenweerd's PhD thesis, used DNA methods to show that the group is ancient and was already diverse in the early Cretaceous, ca. 100 million years ago, a conclusion partly based on leaf mines preserved in fossil leaves. The moths are all specialised on particular species of flowering plants, also called angiosperms, and could therefore diversify as the angiosperms diversified and ecologically replaced other groups of plants in the Cretaceous. The study led to the discovery of three new genera occurring in South and Central America, which are described in one of the two ZooKeys papers, highlighting the distinctive character and largely undescribed diversity of the Neotropical fauna.

Changing a classification requires changing many species names, which prompted the authors to publish, at the same time, a full catalogue of all 1072 valid species names known worldwide, together with the many synonyms published in the literature over the past 150 years.

Such a large and comprehensive overview was made possible by the moth and leaf-mine collections of the world’s natural history museums, and it culminates the 35 years of research that van Nieukerken has devoted to this group. However, a small but not trivial note in one of the publications indicates that at least another 1000 species of pygmy leafmining moths still await discovery.

###

Original sources:

Doorenweerd C, Nieukerken EJ van, Hoare RJB (2016) Phylogeny, classification and divergence times of pygmy leafmining moths (Lepidoptera: Nepticulidae): the earliest lepidopteran radiation on Angiosperms? Systematic Entomology, Early View. doi: 10.1111/syen.1221.

Nieukerken EJ van, Doorenweerd C, Nishida K, Snyers C (2016) New taxa, including three new genera show uniqueness of Neotropical Nepticulidae (Lepidoptera). ZooKeys 628: 1-63. doi: 10.3897/zookeys.628.9805.

Nieukerken EJ van, Doorenweerd C, Hoare RJB, Davis DR (2016) Revised classification and catalogue of global Nepticulidae and Opostegidae (Lepidoptera: Nepticuloidea). ZooKeys 628: 65-246. doi: 10.3897/zookeys.628.9799.

Nieukerken EJ van (ed) (2016) Nepticulidae and Opostegidae of the world, version 2.0. Scratchpads, biodiversity online.

Nieukerken EJ van (ed) (2016). Nepticuloidea: Nepticulidae and Opostegidae of the World (Oct 2016 version). In: Species 2000 & ITIS Catalogue of Life, 31st October 2016 (Roskov Y., Abucay L., Orrell T., Nicolson D., Flann C., Bailly N., Kirk P., Bourgoin T., DeWalt R.E., Decock W., De Wever A., eds). Digital resource at http://www.catalogueoflife.org/col. Species 2000: Naturalis, Leiden, the Netherlands. ISSN 2405-8858. http://www.catalogueoflife.org/col/details/database/id/172