A new dawn for biological collections: The AI revolution in museums and herbaria

There are numerous uses for machine learning in digital collections, including an enormous potential to extract traits of organisms.

Guest blog post by Quentin Groom

Imagine having access to all the two billion biological collections of the world from your desktop! Not only to browse, but to search with artificial intelligence. We recently published a paper where we envisage what might be possible, such as searching all specimen labels for a person’s signature, studying the patterns of butterflies’ wings, or reconstructing a historic expedition.

Numbers of digital images from biodiversity collections are increasing exponentially. Herbariums have led the way with tens of millions of images available, but images of pinned insects will soon overtake plants.

Numbers of accessible images of specimens are increasing exponentially. Plants lead the way, but insects are increasing at the fastest rate. This graph was created from snapshots of the Global Biodiversity Information Facility and is undoubtedly an underestimate of the actual number of specimens for which images exist. See how this was created in Groom et al. (2023).

At one time, if you wanted access to biological collections, you had to travel. Now we are used to visiting collections online, where we can view images of specimens and their details on our desktops. Nevertheless, biological collection images are still dispersed and this limits their effective use, not just for people, but also for computers. One of the promises of making specimens digital is being able to apply machine learning to these images.  Yet the real benefits of machine access to specimens can only be realised through massive access to collection images and the ability to apply these techniques to hundreds of collections and millions of specimens.

Imagine examining collections globally for the variation and evolution of wing coloration in butterflies, or studying the size and shape of leaves in research that transverses habitats and gradients of latitude and altitude.

In our paper in Biodiversity Data Journal, we examined some of the numerous uses for machine learning in digital collections. These include an enormous potential to extract traits of organisms, from the size and shape of different organs, to their colours, patterns, and phenology. Imagine examining collections globally for the variation and evolution of wing coloration in butterflies, or studying the size and shape of leaves in research that transverses habitats and gradients of latitude and altitude. We would not only be able to study the intricacies of evolution, but also practical subjects, such as the mechanics of pollination in insects, adaptations to drought in plants, and adaptations to weediness in invasive species.

Machine access to these images will also provide an unparalleled view of the history of the biological sciences, the specimens used to describe species, the evidence for evolution, the people involved and institutions that contributed. Such transparency may reveal some amazing stories of scientific exploration, but will undoubtedly also shed light on some of the less exemplary actions of colonialism. Yet if we are to redress the injustices of the past we need to have a balanced view of collections, and we should do this openly.

Specimen labels provide numerous clues to their history often in the form of stamps and emblems. A BR0000013433048 Meise Botanic Garden (CC-BY-SA 4.0). B USCH0030719, A.C. Moore Herbarium at the University of South Carolina (public domain). C E00809288, Royal Botanic Garden Edinburgh (public domain). D USCH0030719, University of South Carolina (public domain). E E00919066, Royal Botanic Garden Edinburgh (public domain). F BR0000017682725, Meise Botanic Garden (CC-BY-SA 4.0). G P00605317, Museum National d’Histoire Naturelle, Paris (CC-BY 4.0). H LISC036829, Instituto de Investigação Científica Tropical (CC-BY-NC 4.0). l PC0702930, Muséum National d’Histoire Naturelle, Paris (CC-By 4.0). J same specimen as (B). K PC0702930 Muséum National d’Histoire Naturelle, Paris (CC-BY 4.0). L 101178648, Missouri Botanical Garden (CC-BY-SA 4.0).

With such unparalleled access to collections, we could travel vicariously to times and places that are hard to reach in any other way. Fieldwork is expensive and time-consuming, and can’t provide the historic perspective of collections, let alone the geographic extent. Furthermore, digital resources have the potential to democratise collections, allowing anyone the opportunity to study these collections irrespective of location.

Is such a vision of integrated digital collections possible? It certainly is! The technologies already exist, not just for machine learning, but also to create the infrastructure to provide access to millions of digital images and their metadata. Initiatives, such as DiSSCo in Europe and iDigBio in the USA are moving in this direction. Yet, we conclude that the main challenge to realising this vision of the future is a sociopolitical one. Can so many institutions and funders work together to pool their resources? Can collections in rich countries share the sovereignty of their collections with the countries where many of the specimens originated?

If you too share the dream, we encourage you to support or contribute to initiatives working in this direction, whether through funding, collaboration, or sharing knowledge. If the full potential of digital collections is to be realised, we need to think big and work together.

Research article:

Groom Q, Dillen M, Addink W, Ariño AHH, Bölling C, Bonnet P, Cecchi L, Ellwood ER, Figueira R, Gagnier P-Y, Grace OM, Güntsch A, Hardy H, Huybrechts P, Hyam R, Joly AAJ, Kommineni VK, Larridon I, Livermore L, Lopes RJ, Meeus S, Miller JA, Milleville K, Panda R, Pignal M, Poelen J, Ristevski B, Robertson T, Rufino AC, Santos J, Schermer M, Scott B, Seltmann KC, Teixeira H, Trekels M, Gaikwad J (2023) Envisaging a global infrastructure to exploit the potential of digitised collections. Biodiversity Data Journal 11: e109439. https://doi.org/10.3897/BDJ.11.e109439

New way to browse interlinked biodiversity data: Biodiversity Knowledge Hub NOW ONLINE!

The Biodiversity Knowledge Hub is a one-stop portal that allows users to access FAIR and interlinked biodiversity data and services in a few clicks.

The Horizon 2020 BiCIKL Project is proud to announce that the Biodiversity Knowledge Hub (BKH) is now online.

BKH is a one-stop portal that allows users to access FAIR and interlinked biodiversity data and services in a few clicks. BKH was designed to support a new emerging community of users over time and across the entire biodiversity research cycle providing its services to anybody, anywhere and anytime.

The Knowledge Hub is the main product from our BiCIKL consortium, and we are delighted with the result!

BKH can easily be seen as the beginning of the major shift in the way we search interlinked biodiversity information.”

Biodiversity researchers, research infrastructures and publishers interested in fields ranging from taxonomy to ecology and bioinformatics can now freely use BKH as a compass to navigate the oceans of biodiversity data. BKH will do the linkages.

says Prof. Lyubomir Penev, BiCIKL’s Project coordinator and Founder of Pensoft Publishers
The BKH is designed to serve a new emerging community of users over time and across the entire biodiversity research cycle. 

We have invested our best energies and resources in the development of BKH and the Fair Data Place (FDP), which is the beating heart of the portal,”

BKH has been designed to support a new emerging community of users across the entire biodiversity research cycle.

Its purpose goes beyond the BiCIKL project itself: we are thrilled to say that BKH is meant to stay, aiming to reshape the way biodiversity knowledge is accessed and used.

says Dr Christos Arvanitidis, CEO of LifeWatch ERIC.

The BKH outlines how users can navigate and access the linked data, tools and services of the infrastructures cooperating in BiCIKL.

By revealing how they harvest, liberate and reuse data, these increasingly integrated sources enable researchers in the natural sciences to move more seamlessly between specimens and material samples, genomic and metagenomic data, scientific literature, and taxonomic names and units.

said Dr Joe Miller, Executive Secretary of GBIF—the Global Biodiversity Information Facility.

A training programme on how to best utilise the platform is currently being developed by the Consortium of European Taxonomic Facilities (CETAF), Pensoft PublishersPlaziMeise Botanic GardenEMBL’s European Bioinformatics Institute (EMBL-EBI), ELIXIR HubGBIF – the Global Biodiversity Information Facility, and LifeWatch ERIC and will be finalised in the coming months.

***

A detailed description of the BKH tools and services provided by its contributing organisations is available at: https://biodiversityknowledgehub.eu.

***

Find more information about the BiCIKL consortium partners on the project’s website.

***

Follow BiCIKL Project on Twitter and Facebook. Join the conversation on Twitter at #BiCIKL_H2020.