Artificial neural networks could power up curation of natural history collections

Deep learning techniques manage to differentiate between similar plant families with up to 99 percent accuracy, Smithsonian researchers reveal

Millions, if not billions, of specimens reside in the world’s natural history collections, but most of these have not been carefully studied, or even looked at, in decades. While containing critical data for many scientific endeavors, most objects are quietly sitting in their own little cabinets of curiosity.

Thus, mass digitization of natural history collections has become a major goal at museums around the world. Having brought together numerous biologists, curators, volunteers and citizen scientists, such initiatives have already generated large datasets from these collections and provided unprecedented insight.

Now, a study, recently published in the open access Biodiversity Data Journal, suggests that the latest advances in both digitization and machine learning might together be able to assist museum curators in their efforts to care for and learn from this incredible global resource.

A team of researchers from the Smithsonian Department of Botany, Data Science Lab, and Digitization Program Office recently collaborated with NVIDIA to carry out a pilot project using deep learning approaches to dig into digitized herbarium specimens.

Smithsonian researchers classifying digitized herbarium sheets.

Their study is among the first to describe the use of deep learning methods to enhance our understanding of digitized collection samples. It is also the first to demonstrate that a deep convolutional neural network (a computing system modelled after the neuron activity in animal brains that can essentially learn on its own) can effectively differentiate between similar plants with up to 99% accuracy.

In the paper, the scientists describe two different neural networks that they trained to perform tasks on the digitized portion (currently 1.2 million specimens) of the United States National Herbarium.

The team first trained a net to automatically recognize herbarium sheets that had been stained with mercury crystals, since mercury was commonly used by some early collectors to protect the plant collections from insect damage. The second net was trained to discriminate between two families of plants that share a strikingly similar superficial appearance.

Sample herbarium specimen image of stained clubmoss.

The trained neural nets performed with 90% and 96% accuracy respectively (or 94% and 99% if the most challenging specimens were discarded), confirming that deep learning is a useful and important technology for the future analysis of digitized museum collections.
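The effect of setting aside the hardest specimens can be illustrated with a small sketch. The article does not say how the "most challenging" specimens were identified, so the code below assumes, purely for illustration, that each prediction comes with a confidence score and that specimens below a chosen threshold are discarded. The function names, the threshold and the toy data are all hypothetical and are not taken from the study.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    if not labels:
        raise ValueError("no specimens to score")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def accuracy_after_discarding(labels, predictions, confidences, threshold):
    """Score only the specimens whose confidence is at least `threshold`.

    Returns (accuracy on retained specimens, fraction of specimens retained).
    Using a confidence threshold is an assumption made for this sketch.
    """
    kept = [(y, p) for y, p, c in zip(labels, predictions, confidences)
            if c >= threshold]
    if not kept:
        raise ValueError("threshold discards every specimen")
    kept_labels = [y for y, _ in kept]
    kept_predictions = [p for _, p in kept]
    return accuracy(kept_labels, kept_predictions), len(kept) / len(labels)


# Hypothetical toy data: 1 = stained sheet, 0 = unstained sheet.
labels = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
predictions = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
confidences = [0.99, 0.97, 0.95, 0.92, 0.90, 0.88, 0.85, 0.55, 0.52, 0.60]

overall = accuracy(labels, predictions)
filtered, retained = accuracy_after_discarding(
    labels, predictions, confidences, threshold=0.7
)
print(f"all specimens: {overall:.2f}")
print(f"confident specimens only: {filtered:.2f} ({retained:.0%} retained)")
```

The trade-off is that higher accuracy is bought by leaving some specimens unclassified, so the retained fraction is worth reporting alongside the accuracy.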

“The results can be leveraged both to improve curation and unlock new avenues of research,” conclude the scientists.

“This research paper is a wonderful proof of concept. We now know that we can apply machine learning to digitized natural history specimens to solve curatorial and identification problems. The future will be using these tools combined with large shared data sets to test fundamental hypotheses about the evolution and distribution of plants and animals,” says Dr. Laurence J. Dorr, Chair of the Smithsonian Department of Botany.



Original source:

Schuettpelz E, Frandsen P, Dikow R, Brown A, Orli S, Peters M, Metallo A, Funk V, Dorr L (2017) Applications of deep convolutional neural networks to digitized natural history collections. Biodiversity Data Journal 5: e21139.

A decade of monitoring shows the dynamics of a conserved Atlantic tropical forest

Characterised by immense biodiversity and high levels of endemism, the Atlantic Tropical Forest has faced serious anthropogenic threats over the last several decades. These activities and their effects therefore need to be closely studied and monitored as part of the forest dynamics.

Cattle farming, expanding agricultural land areas and mining have reduced the Atlantic Forest to many small patches of vegetation. As a result, important ecosystem services, such as carbon stock, are steadily diminishing as the biomass decreases.

Brazilian researchers, led by Dr. Écio Souza Diniz, Federal University of Viçosa, spent a decade monitoring a semi-deciduous forest located in an ecological park in Southeast Brazil. Their observations are published in the open access Biodiversity Data Journal.

The team surveyed two stands within the forest to present variations in the structure and diversity of the plants over time, along with their dynamics, including mortality and establishment rates. They based their findings on the most abundant tree species occurring within each stand.
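To make the notions of mortality and establishment rates concrete, here is a minimal sketch of one commonly used way to annualise them from two censuses. The article does not state which formulas the authors applied, so the exponential forms below are an assumption, and the function names and stem counts are hypothetical.

```python
def annual_mortality_rate(n_initial, n_survivors, years):
    """Annualised mortality: 1 - (survivors / initial) ** (1 / years).

    Assumed formulation for illustration; not taken from the article.
    """
    if n_initial <= 0 or years <= 0:
        raise ValueError("n_initial and years must be positive")
    if not 0 <= n_survivors <= n_initial:
        raise ValueError("survivors must be between 0 and n_initial")
    return 1.0 - (n_survivors / n_initial) ** (1.0 / years)


def annual_recruitment_rate(n_final, n_recruits, years):
    """Annualised establishment: 1 - (1 - recruits / final) ** (1 / years).

    Assumed formulation for illustration; not taken from the article.
    """
    if n_final <= 0 or years <= 0:
        raise ValueError("n_final and years must be positive")
    if not 0 <= n_recruits <= n_final:
        raise ValueError("recruits must be between 0 and n_final")
    return 1.0 - (1.0 - n_recruits / n_final) ** (1.0 / years)


# Hypothetical stand: 1000 stems at the first census, 800 of them still
# alive ten years later, plus 150 newly established stems.
n_initial, n_survivors, n_recruits, years = 1000, 800, 150, 10
n_final = n_survivors + n_recruits

mortality = annual_mortality_rate(n_initial, n_survivors, years)
recruitment = annual_recruitment_rate(n_final, n_recruits, years)
print(f"annual mortality: {mortality:.2%}")
print(f"annual recruitment: {recruitment:.2%}")
```

Annualising both rates makes stands surveyed over different intervals comparable, which matters when monitoring spans a decade.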

The authors conclude that, in both forest stands, the most abundant species and those most important for biomass accumulation are trees larger than 20 cm in diameter, which characterise an advanced successional stage within the forest.

“It is fundamental that opportunities to monitor conserved sites of the Atlantic Forest are taken, so that studies about their dynamics are conducted in order to better understand how they work,” note the scientists.

“The information from such surveys could improve the knowledge about the dynamics at anthropised and fragmented sites compared with protected areas.”

In order to encourage further research into the composition, diversity and structure of the Atlantic Forest over time and the subsequent contributions to the preservation of this threatened ecosystem, the authors made their data publicly available. The datasets, including species occurrences, are now openly accessible via the Global Biodiversity Information Facility (GBIF), formatted according to the biodiversity informatics data standard Darwin Core.


Original source:

Diniz ES, Carvalho W, Santos R, Gastauer M, Garcia P, Fontes M, Coelho P, Moreira A, Menino G, Oliveira-Filho A (2017) Long-term monitoring of diversity and structure of two stands of an Atlantic Tropical Forest. Biodiversity Data Journal 5: e13564.

Effects of soil and drainage on the savanna vegetation in the northern Brazilian Amazonia

It is well established that environmental factors such as soil texture and drainage largely determine the appearance, richness and composition of the vegetation at any site. However, there has been little research on how these variables influence the flora of the marvellous savannas, large open areas characterised by a complex and unique network of natural resources and life forms.

Consequently, a Brazilian research team, led by Dr. Maria Aparecida de Moura Araújo, Universidade Federal de Roraima, investigated the hydro-edaphic conditions in the savanna areas in the northern Brazilian Amazonia. Their study, complete with an openly available, ready-to-reuse dataset, is published in the open access Biodiversity Data Journal.

Xylopia aromatica (Annonaceae), tree.

In the course of the Program for Biodiversity Research, managed by the Brazilian government, the scientists sampled 20 permanent plots in two savanna areas in the state of Roraima, located in the northern Brazilian Amazon. As a result, the team reports a total of 128 plant species classified into 34 families from three savanna habitats with different levels of hydro-edaphic restrictions.

Among the various factors shaping the soil characteristics of the area are the tectonic events and past climatic fluctuations that occurred in the most recent period of the Cenozoic era. Palaeofires, as well as modern fires, are likely to be other culprits for the specific conditions.

In conclusion, the authors suggest that the most restrictive savanna habitats, the wet grasslands, are home to less structurally complex plants than the well-drained shrubby localities.

“The present study highlights the environmental heterogeneity and the biological importance of Roraima’s savanna regarding the conservation of natural resources from the Amazon,” say the scientists.

Merremia aturensis (Convolvulaceae), herb.

“In addition, it points out the need for greater investment in floristic inventories associated with greater diversification of sites, since this entire ecosystem has been rapidly modified by agribusiness.”

Licensed under a Creative Commons License (CC-BY 4.0) and available in the Darwin Core Archive (DwC-A) format, the complete dataset is openly available via the Global Biodiversity Information Facility (GBIF).
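As an illustration of how such an openly available dataset could be reused, the sketch below tallies species and family richness per habitat from tab-delimited occurrence records. It assumes the records carry the Darwin Core terms `scientificName`, `family` and `habitat` as column names. The sample rows are placeholders invented for this example and are not taken from the published dataset. The function name is hypothetical as well.

```python
import csv
import io
from collections import defaultdict


def richness_by_habitat(tsv_text):
    """Count distinct species and families per habitat.

    Expects tab-delimited text with the columns scientificName, family
    and habitat (an assumption about how the records are laid out).
    Returns {habitat: (species_count, family_count)}.
    """
    species = defaultdict(set)
    families = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        habitat = row["habitat"]
        species[habitat].add(row["scientificName"])
        families[habitat].add(row["family"])
    return {h: (len(species[h]), len(families[h])) for h in species}


# Placeholder records, invented for this sketch.
rows = [
    ("scientificName", "family", "habitat"),
    ("Species A", "Family X", "wet grassland"),
    ("Species B", "Family X", "wet grassland"),
    ("Species A", "Family X", "wet grassland"),  # repeat occurrence
    ("Species C", "Family Y", "shrubby savanna"),
    ("Species D", "Family Y", "shrubby savanna"),
    ("Species E", "Family Z", "shrubby savanna"),
]
sample_tsv = "\n".join("\t".join(r) for r in rows)

summary = richness_by_habitat(sample_tsv)
for habitat, (n_species, n_families) in summary.items():
    print(f"{habitat}: {n_species} species in {n_families} families")
```

Using sets means repeated occurrences of the same species in one habitat are counted once, which is what a richness figure requires.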


Original source:

Araújo M, Rocha A, Miranda I, Barbosa R (2017) Hydro-edaphic conditions defining richness and species composition in savanna areas of the northern Brazilian Amazonia. Biodiversity Data Journal 5: e13829.