Mass digitisation of a herbarium collection: ten lessons learned from Meise Botanic Garden

The lessons were published in the open-access journal PhytoKeys.

Herbaria – collections of preserved plant specimens – are crucial in botanical research and biodiversity conservation. Digitising these collections is an important step towards making data available to all, preserving specimens by reducing the need for handling, and creating new research opportunities.

Mass digitisation of herbarium specimens on a conveyor belt at Meise Botanic Garden, allowing the imaging of 3,000–5,000 specimens per day.

Meise Botanic Garden recently completed a six-year project to digitise approximately three million specimens of their herbarium collection. While it was a big change for their organisation, it was one they deemed necessary to bring their collection into the digital age. 

The digitisation project contributes to the Distributed System of Scientific Collections (DiSSCo), a research infrastructure that aims to unify access to biodiversity and geodiversity specimens under common standards, giving users access to specimens and their data from European institutions. DiSSCo has also created a website with digitisation guides and the DiSSCo Knowledge Base.

Several people sitting at tables working on herbarium specimens.
Joint restoration session of the herbarium team at Meise Botanic Garden.

Based on their experience, the team published ten valuable lessons they learned during the process to assist other institutions embarking on similar digitisation projects. These lessons are available in the open-access journal PhytoKeys.

1. Knowing yourself is the beginning of all wisdom ― Aristotle

Before starting digitisation, it is important to understand the full scope of your collection. This involves detailed inventory checks and assessments of the state of the specimens. Knowing the exact number and condition of the specimens will help in accurate budgeting and planning. A detailed inventory of a representative tenth of your collection can be extrapolated to the entire collection.
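As a rough illustration of the extrapolation step, the sketch below scales up specimen counts from a sample of storage units to the whole collection and attaches a standard error. It assumes simple random sampling of equally accessible units (cabinets, boxes or folders); the function name and the example numbers are hypothetical and not taken from the Meise project.

```python
import math


def extrapolate_inventory(sample_counts, total_units):
    """Estimate the total number of specimens from a sample of storage units.

    sample_counts: specimens counted in each sampled unit (hypothetical data)
    total_units:   number of storage units in the whole collection
    Returns (estimated_total, standard_error_of_total).
    """
    n = len(sample_counts)
    if n < 2:
        raise ValueError("need at least two sampled units")
    if n > total_units:
        raise ValueError("sample cannot be larger than the collection")

    mean = sum(sample_counts) / n
    # Sample variance (n - 1 in the denominator).
    variance = sum((c - mean) ** 2 for c in sample_counts) / (n - 1)
    # Finite population correction: the sample is a sizeable share of the whole.
    fpc = 1 - n / total_units
    standard_error = total_units * math.sqrt(variance / n * fpc)
    return total_units * mean, standard_error


if __name__ == "__main__":
    # Four sampled cabinets out of forty, i.e. a tenth of the collection.
    total, se = extrapolate_inventory([100, 120, 80, 100], total_units=40)
    print(f"estimated specimens: {total:.0f} (standard error {se:.0f})")
```

The standard error gives a sense of how much budget margin the estimate needs; a sample that is not representative (for example, only the best-curated cabinets) would bias the result regardless of its size.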

2. Prioritise (if lack of money forces you to do so)

If resources are limited, prioritising which parts of the collection to digitise first is key. Consider factors such as the scientific importance of the specimens, their physical state, and stakeholder needs. It is important to note that digitising the entire collection can be more efficient than selecting subcollections, as partial digitisation can complicate management.

3. Learn from other people’s successes – and mistakes

Do not reinvent the wheel. Engage with other institutions that have undertaken similar projects to learn from their successes and mistakes. Follow existing guidelines and adapt them to fit your specific needs. If you think you have a better way of doing things, talk it over with someone with experience. 

4. Decide whether to do it yourself or have it done for you

Deciding whether to conduct the digitisation in-house or to outsource it depends on available resources. Consider the skills and availability of your staff and the costs associated with outsourcing. Some tasks, such as imaging or data transcription, might be more efficiently handled by external specialists.

5. Make a plan

A well-thought-out plan is crucial. Define workflows, procedures, and quality control mechanisms. And be specific about your requirements when outsourcing parts of the project to avoid any misunderstandings.

6. Go shopping

Ensure that all necessary supplies, such as barcodes, storage containers, and IT infrastructure, are in place before starting the digitisation process. Bulk purchasing is often cost-effective, and having everything ready will prevent delays.

7. Make your collection look its best for the photographer

Prepare the specimens for imaging by incorporating pre-digitisation curation steps like repairing damaged specimens and adding barcodes. 

8. Expect problems, particularly ones that you don’t expect

Problems will arise, from equipment malfunctions to human errors. Establish quality control processes to catch issues early. Automate checks where possible and ensure prompt human review for aspects like image focus and lighting.
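The automated part of such quality control can be quite simple. The sketch below checks one batch of image files for malformed names, duplicate barcodes and specimens that were expected but never imaged. The barcode pattern, the file naming scheme and the function name are assumptions made for illustration; they are not the conventions used at Meise. Focus and lighting still need the human review mentioned above.

```python
import re
from collections import Counter

# Hypothetical barcode format: two capital letters followed by 13 digits.
BARCODE_PATTERN = re.compile(r"^[A-Z]{2}\d{13}$")


def check_batch(image_names, expected_barcodes):
    """Return a report of problems found in one imaging batch.

    image_names:       file names produced by the imaging line, e.g. "<barcode>.jpg"
    expected_barcodes: barcodes that were attached to specimens before imaging
    """
    # Strip the file extension to recover the barcode part of each name.
    stems = [name.rsplit(".", 1)[0] for name in image_names]
    counts = Counter(stems)
    return {
        "malformed": sorted(s for s in counts if not BARCODE_PATTERN.match(s)),
        "duplicates": sorted(s for s, c in counts.items() if c > 1),
        "missing": sorted(set(expected_barcodes) - set(counts)),
    }


if __name__ == "__main__":
    first = "BR" + "0" * 12 + "1"
    second = "BR" + "0" * 12 + "2"
    report = check_batch([first + ".jpg", first + ".jpg", "scan_17.jpg"], [first, second])
    for problem, items in report.items():
        print(problem, items)
```

Running a check like this after every batch, rather than at the end of the project, keeps the number of specimens that have to be pulled again small.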

9. Make your data visible – make a big deal of it

Making digitised data publicly accessible is vital. Use online portals and ensure the data adheres to FAIR principles (Findable, Accessible, Interoperable, Reusable). Publicity will increase the use and impact of your collection.

10. Save your data for the future

Make sure the digitised data is backed up in a secure, offsite archive. Long-term storage solutions should be considered to preserve the data for future use. And factor this ongoing cost into the budget.
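One common way to confirm that an offsite copy is complete and uncorrupted is to compare checksums. The sketch below builds a SHA-256 manifest of an image directory and verifies a copy against it. It is a minimal example using only the Python standard library; the function names and directory layout are hypothetical, and it is not a description of the archive workflow used by Meise Botanic Garden.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so that large image files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_copy(manifest, copy_root):
    """List (relative_path, problem) pairs for files missing or altered in the copy."""
    copy_root = Path(copy_root)
    problems = []
    for rel, expected in manifest.items():
        target = copy_root / rel
        if not target.is_file():
            problems.append((rel, "missing"))
        elif sha256_of(target) != expected:
            problems.append((rel, "checksum mismatch"))
    return problems
```

Storing the manifest alongside the archive makes it possible to repeat the verification periodically, which is part of the ongoing cost the lesson refers to.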

To read extended advice from Meise Botanic Garden, as well as four case studies, check out the full research paper below:

Original source

De Smedt S, Bogaerts A, De Meeter N, Dillen M, Engledow H, Van Wambeke P, Leliaert F, Groom Q (2024) Ten lessons learned from the mass digitisation of a herbarium collection. PhytoKeys 244: 23-37. https://doi.org/10.3897/phytokeys.244.120112 

***


Forgotten tropical plants rediscovered after 100+ years with the help of community science

Through the collaborative efforts of botanists and citizen scientists, these plants have been rediscovered after decades, some even after more than a century.

Deep in the tropical Andes are hiding plants that were discovered and then forgotten; plants that we knew almost nothing about. Now, thanks to the combined efforts of botanists from Germany, Ecuador, Peru and Costa Rica and amateur plant enthusiasts, these plants have been rediscovered, some of them after more than 100 years. The findings were described in the open-access journal PhytoKeys.

Nasa hastata. Photo by P. Gonzáles

The plants belong to Nasa, a genus from the Blazing Star family (Loasaceae) that has long caused headaches to scientists as its delicate but painfully urticant leaves make it difficult to collect. Most of them are rare, highly endemic, and only around for short periods, which makes them even more unlikely to end up in a herbarium collection.

Luckily, today’s scientists don’t have to rely on herbaria as their sole source of material and clues. Thanks to the advent of global networking and the increasing use of free data repositories, far more biodiversity data is now available and easily accessible, for example as geo-referenced occurrence records and photos. The citizen science platform iNaturalist, where users can, among other things, post photographic occurrence records, has turned into a valuable tool for biodiversity scientists and played a significant role in the rediscovery of these Andean plants.

One notable species, Nasa colanii, had only been recorded once, in 1978, until the research team came upon a photograph from 2019. This scarcity in records might have to do with the fact that the plant grows in a highly inaccessible region: in a cloud forest in the buffer zone of Peru’s Cordillera de Colán National Sanctuary, at an elevation of 2605 m.

A flowering branch of Nasa colanii. Photo by A. A. Wong Sato

Another species hadn’t been reported for approximately 130 years when iNaturalist users confirmed its existence in 2022 by uploading photographs. Nasa ferox had been known for centuries, but it didn’t get its scientific description until 2000. “Given the location of the park close to the [Ecuadorian] city of Cuenca, and the fact that the important road 582 goes through the park makes it particularly surprising that the species has not been reported in such a long time, even more so if we consider the numerous botanical expeditions that have been carried out in the general region,” the researchers write in their paper. In fact, only a small population of about ten fertile plants of N. ferox has been found, with the plants always growing in sheltered places such as in rock crevices or at the base of shrubs.

Remarkably, the typical form of Nasa humboldtiana, Nasa humboldtiana subspecies humboldtiana, was rediscovered after 162 years when the research team found a specimen in a conserved remnant of montane Andean forest in the province of Chimborazo, Ecuador.

Flower of Nasa humboldtiana subspecies humboldtiana. Photo by X. Cornejo

But probably the most exciting discoveries happened when the team found species that had been considered extinct in the wild. Two species of Nasa, namely N. hastata and N. solaria, were believed to share this fate, both from the Peruvian Department of Lima, a comparatively well-sampled area given its proximity to the national capital. Until very recently, both species “remained unknown (or almost so) in the wild.” Earlier attempts to recollect these species near their type localities, where they had been found some 100 years ago, failed, and it took the help of iNaturalist to reveal that they are still present in the area.

Nasa solaria. Photo by P. Gonzáles

Nasa hastata was rediscovered recently, when photos of living plants, taken by the sister of one of the authors, showed up for the first time. Only a handful of plants have since been reported from two sites, some 7 km apart. Similarly, only a few dozen plants of N. solaria have been found so far, occurring in four small relict populations in remnants of forest that once covered larger areas in this region.

Flower of Nasa hastata. Photo by P. Gonzáles

Observations uploaded to iNaturalist also revealed important information on another species, Nasa ramirezii, providing the first photographs of living plants from Ecuador and the first data on its exact location.

“All these discoveries serve as a reminder that even well-studied regions harbor diversity that can so easily remain overlooked and unexplored, and point to the role of botanists in documenting biodiversity which is an essential prerequisite for any conservation effort,” lead author Tilo Henning from the Leibniz Center for Agricultural Landscape Research (ZALF) says.

“Hopefully, as more scientists and members of the public contribute to the database, and more professionals get involved in the curation, more undescribed or ‘long lost’ taxa will be found. Our examples of the rediscovery of Nasa ferox after 130 years and Nasa hastata after 100 years, both ‘found’ on iNaturalist underscore this point,” the researchers say in their study.

Original source:

Henning T, Acuña-Castillo R, Cornejo X, Gonzáles P, Segovia E, Wong Sato AA, Weigend M (2023) When the absence of evidence is not the evidence of absence: Nasa (Loasaceae) rediscoveries from Peru and Ecuador, and the contribution of community science networks. PhytoKeys 229: 1-19. https://doi.org/10.3897/phytokeys.229.100082

Artificial neural networks could power up curation of natural history collections

Deep learning techniques manage to differentiate between similar plant families with up to 99 percent accuracy, Smithsonian researchers reveal

Millions, if not billions, of specimens reside in the world’s natural history collections, but most of these have not been carefully studied, or even looked at, in decades. While containing critical data for many scientific endeavors, most objects are quietly sitting in their own little cabinets of curiosity.

Thus, mass digitization of natural history collections has become a major goal at museums around the world. Having brought together numerous biologists, curators, volunteers and citizen scientists, such initiatives have already generated large datasets from these collections and provided unprecedented insight.

Now, a study, recently published in the open access Biodiversity Data Journal, suggests that the latest advances in both digitization and machine learning might together be able to assist museum curators in their efforts to care for and learn from this incredible global resource.

A team of researchers from the Smithsonian Department of Botany, Data Science Lab, and Digitization Program Office recently collaborated with NVIDIA to carry out a pilot project using deep learning approaches to dig into digitized herbarium specimens.

Smithsonian researchers classifying digitized herbarium sheets.

Their study is among the first to describe the use of deep learning methods to enhance our understanding of digitized collection samples. It is also the first to demonstrate that a deep convolutional neural network – a computing system modelled after the neuron activity in animal brains that can basically learn on its own – can effectively differentiate between similar plants with an amazing accuracy of nearly 100%.

In the paper, the scientists describe two different neural networks that they trained to perform tasks on the digitized portion (currently 1.2 million specimens) of the United States National Herbarium.

The team first trained a net to automatically recognize herbarium sheets that had been stained with mercury crystals, since mercury was commonly used by some early collectors to protect the plant collections from insect damage. The second net was trained to discriminate between two families of plants that share a strikingly similar superficial appearance.

Sample herbarium specimen image of stained clubmoss.

The trained neural nets performed with 90% and 96% accuracy respectively (or 94% and 99% if the most challenging specimens were discarded), confirming that deep learning is a useful and important technology for the future analysis of digitized museum collections.
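To make the two sets of figures concrete, the sketch below shows how accuracy can be computed over all predictions and again after setting aside the hardest cases. It assumes that "most challenging" means the predictions the model was least confident about; that definition, the function names and the toy data are illustrative assumptions, not details taken from the study.

```python
def accuracy(predictions):
    """Fraction of correct predictions.

    predictions: list of (predicted_label, true_label, confidence) tuples
    """
    if not predictions:
        raise ValueError("no predictions to score")
    correct = sum(1 for predicted, true, _ in predictions if predicted == true)
    return correct / len(predictions)


def accuracy_above(predictions, min_confidence):
    """Accuracy over predictions at or above a confidence threshold.

    Returns (accuracy, number_of_predictions_kept).
    """
    kept = [p for p in predictions if p[2] >= min_confidence]
    return accuracy(kept), len(kept)


if __name__ == "__main__":
    # Toy data for illustration only: (predicted family, true family, confidence).
    toy = [
        ("A", "A", 0.90),
        ("B", "B", 0.80),
        ("A", "B", 0.55),
        ("B", "B", 0.95),
    ]
    print("all specimens:", accuracy(toy))
    print("confident only:", accuracy_above(toy, min_confidence=0.6))
```

In a curation workflow, the specimens falling below the threshold would typically be routed to a human expert rather than ignored.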

“The results can be leveraged both to improve curation and unlock new avenues of research,” conclude the scientists.

“This research paper is a wonderful proof of concept. We now know that we can apply machine learning to digitized natural history specimens to solve curatorial and identification problems. The future will be using these tools combined with large shared data sets to test fundamental hypotheses about the evolution and distribution of plants and animals,” says Dr. Laurence J. Dorr, Chair of the Smithsonian Department of Botany.

 


Original source:

Schuettpelz E, Frandsen P, Dikow R, Brown A, Orli S, Peters M, Metallo A, Funk V, Dorr L (2017) Applications of deep convolutional neural networks to digitized natural history collections. Biodiversity Data Journal 5: e21139. https://doi.org/10.3897/BDJ.5.e21139