A new dawn for biological collections: The AI revolution in museums and herbaria

There are numerous uses for machine learning in digital collections, including an enormous potential to extract traits of organisms.

Guest blog post by Quentin Groom

Imagine having access to all two billion specimens in the world’s biological collections from your desktop! Not only to browse, but to search with artificial intelligence. We recently published a paper in which we envisage what might be possible, such as searching all specimen labels for a person’s signature, studying the patterns of butterflies’ wings, or reconstructing a historic expedition.

Numbers of digital images from biodiversity collections are increasing exponentially. Herbaria have led the way with tens of millions of images available, but images of pinned insects will soon overtake plants.

Numbers of accessible images of specimens are increasing exponentially. Plants lead the way, but insects are increasing at the fastest rate. This graph was created from snapshots of the Global Biodiversity Information Facility and is undoubtedly an underestimate of the actual number of specimens for which images exist. See how this was created in Groom et al. (2023).

At one time, if you wanted access to biological collections, you had to travel. Now we are used to visiting collections online, where we can view images of specimens and their details on our desktops. Nevertheless, biological collection images are still dispersed, and this limits their effective use, not just for people, but also for computers. One of the promises of making specimens digital is being able to apply machine learning to these images. Yet the real benefits of machine access to specimens can only be realised through massive access to collection images and the ability to apply these techniques to hundreds of collections and millions of specimens.


In our paper in Biodiversity Data Journal, we examined some of the numerous uses for machine learning in digital collections. These include an enormous potential to extract traits of organisms, from the size and shape of different organs, to their colours, patterns, and phenology. Imagine examining collections globally for the variation and evolution of wing coloration in butterflies, or studying the size and shape of leaves in research that traverses habitats and gradients of latitude and altitude. We would be able to study not only the intricacies of evolution, but also practical subjects, such as the mechanics of pollination in insects, adaptations to drought in plants, and adaptations to weediness in invasive species.

Machine access to these images will also provide an unparalleled view of the history of the biological sciences, the specimens used to describe species, the evidence for evolution, the people involved and institutions that contributed. Such transparency may reveal some amazing stories of scientific exploration, but will undoubtedly also shed light on some of the less exemplary actions of colonialism. Yet if we are to redress the injustices of the past we need to have a balanced view of collections, and we should do this openly.

Specimen labels provide numerous clues to their history, often in the form of stamps and emblems. A BR0000013433048, Meise Botanic Garden (CC-BY-SA 4.0). B USCH0030719, A.C. Moore Herbarium at the University of South Carolina (public domain). C E00809288, Royal Botanic Garden Edinburgh (public domain). D USCH0030719, University of South Carolina (public domain). E E00919066, Royal Botanic Garden Edinburgh (public domain). F BR0000017682725, Meise Botanic Garden (CC-BY-SA 4.0). G P00605317, Muséum National d’Histoire Naturelle, Paris (CC-BY 4.0). H LISC036829, Instituto de Investigação Científica Tropical (CC-BY-NC 4.0). I PC0702930, Muséum National d’Histoire Naturelle, Paris (CC-BY 4.0). J same specimen as (B). K PC0702930, Muséum National d’Histoire Naturelle, Paris (CC-BY 4.0). L 101178648, Missouri Botanical Garden (CC-BY-SA 4.0).

With such unparalleled access to collections, we could travel vicariously to times and places that are hard to reach in any other way. Fieldwork is expensive and time-consuming, and can’t provide the historic perspective of collections, let alone the geographic extent. Furthermore, digital resources have the potential to democratise collections, allowing anyone the opportunity to study these collections irrespective of location.

Is such a vision of integrated digital collections possible? It certainly is! The technologies already exist, not just for machine learning, but also to create the infrastructure to provide access to millions of digital images and their metadata. Initiatives such as DiSSCo in Europe and iDigBio in the USA are moving in this direction. Yet we conclude that the main challenge to realising this vision of the future is a sociopolitical one. Can so many institutions and funders work together to pool their resources? Can collections in rich countries share the sovereignty of their collections with the countries where many of the specimens originated?

If you too share the dream, we encourage you to support or contribute to initiatives working in this direction, whether through funding, collaboration, or sharing knowledge. If the full potential of digital collections is to be realised, we need to think big and work together.

Research article:

Groom Q, Dillen M, Addink W, Ariño AHH, Bölling C, Bonnet P, Cecchi L, Ellwood ER, Figueira R, Gagnier P-Y, Grace OM, Güntsch A, Hardy H, Huybrechts P, Hyam R, Joly AAJ, Kommineni VK, Larridon I, Livermore L, Lopes RJ, Meeus S, Miller JA, Milleville K, Panda R, Pignal M, Poelen J, Ristevski B, Robertson T, Rufino AC, Santos J, Schermer M, Scott B, Seltmann KC, Teixeira H, Trekels M, Gaikwad J (2023) Envisaging a global infrastructure to exploit the potential of digitised collections. Biodiversity Data Journal 11: e109439. https://doi.org/10.3897/BDJ.11.e109439
