digitization

New smartphone workflows revolutionize the digitization of natural history collections

By digitizing these data, we can preserve valuable knowledge about our biodiversity, especially in times of climate change and biodiversity crises.

A team from the Leibniz Institute for the Analysis of Biodiversity Change (LIB) has discovered groundbreaking ways to rapidly digitize collection data. Data from insect specimen labels can now be easily read with just a smartphone – wirelessly, and using only free, already available apps!

A screenshot of Google Lens interface, displaying translation options and a set of specimen labels on a light surface. — Screenshots from a mobile phone showing the steps of scanning of real-time data collection, and examples of labels: A step 1: marking of the text to be captured via touch screen of the mobile phone (example – printed labels scanned on pin) B step 2: select from menu bar (at the right side under three dots) “Copy to computer” (example – printed labels scanned separately). C Capture of multidirectional printed labels scanned separately from the specimen in “*Google Lens*” D Capture of multiple distorted, printed labels scanned on the pinned specimen in “*Google Lens*” E Initial capture of a printed label scanned separately from the specimen in “*Google Keep*” F Extracted data resulting from E.

A screenshot showing Google Lens and a label in blue text describing a specimen collected in North Iraq, Kurdistan, detailing location and scientific classification.

Why is this important?

Around 1.1 billion objects in the largest natural history museums worldwide remain undigitized, and manually extracting specimen label information for taxonomic revisions, another source for biodiversity data mobilization, is very time-consuming. By digitizing these data, we can preserve valuable knowledge about our biodiversity, especially in times of climate change and a human-driven biodiversity crisis, when many species are going extinct before they are even discovered.

This innovation will accelerate and advance global research and the preservation of our biological knowledge. And the best part? It’s inexpensive and accessible to everyone – from professionals to amateur scientists!

Research article:

Ahrens D, Haas A, Pacheco TL, Grobe P (2025) Extracting specimen label data rapidly with a smartphone—a great help for simple digitization in taxonomy and collection management. ZooKeys 1233: 15-30. https://doi.org/10.3897/zookeys.1233.140726

Deciphering cyrillics: revealing the myxomycetes of Ukraine from invisible sources

A new study compiles over 150 years of research on Ukraine’s myxomycetes – amoebae that form fascinating fungi-like fruiting bodies.

Guest blog post by Iryna Yatsiuk

A graphic showing the occurrences of myxomycetes on a map of Ukraine. — *Occurrences of myxomycetes in Ukraine from the present study*.

Myxomycetes, or slime molds, despite their unassuming name, are fascinating organisms that play a crucial role in forest ecosystems. They live as single-cell amoebae in soil or all sorts of plant debris, where they feed on microscopic bacteria, algae, and fungi. However, when it is time to reproduce and disseminate, these tiny amoebae fuse with each other and form slimy, mobile structures – plasmodia. Plasmodia slowly but actively crawl on the substrate, and eventually transform into fungi-like fruiting bodies filled with spores. Both plasmodia and fruiting bodies are visible with the naked eye and can be easily found e.g. on decaying wood or on the forest floor.

Yellow slime mold on a tree trunk. — *Plasmodium (left) and fruiting bodies (right) of the same species of a slime mold, with a difference of one day.*

Fruiting bodies of slime mold on a tree trunk. — *Plasmodium (left) and fruiting bodies (right) of the same species of a slime mold, with a difference of one day.*

Myxomycetes are unusual in their life cycle and very eye-catching – if only one knows where to look for them. No wonder that they have attracted the attention of naturalists for centuries. On the territory of Ukraine, observations of myxomycetes first appeared in the first half of the 19th century and have appeared sporadically in the mycological literature ever since.

Slime mold. — *‘Wasp nest’ slime mold – a common and widespread species of myxomycetes in Ukraine.*

However, much valuable information about the myxomycetes of Ukraine before our study was in a “grey zone”. This includes undigitized historical books and articles published in languages such as Polish, French, or German. Furthermore, there is a significant body of proceedings of local conferences, articles in local journals, and reports produced by the employees of protected areas. Yet, many of these publications existed only in print and were written in the Cyrillic alphabet, so they remained difficult to discover, to access, or to work with.

A page of Maria Zelle's work “Materials for the myxomycete flora of Ukraine”. — *An example of an “invisible” literature source, a page from Maria Zelle’s “Materials for the myxomycete flora of Ukraine”, 1925*.

Within this study, published in Biodiversity Data Journal, we aimed to summarize all published research on myxomycetes of Ukraine, which spans over 150 years, and make the data, as well as the literature behind the data, open and easy to use. For this, we collected and mined 91 publications on this topic, spanning the years 1842 to 2023. As a result, we extracted over 5000 occurrences of myxomycetes that belong to 331 species. We published the resulting datasets on GBIF, and most of the literature sources on the platform Zenodo.org in open access.

*Datasets produced by this study available on GBIF.*

A group of researchers posing for a picture. — *Leaders of the BioData project with future Ukrainian mentors*.

With this initiative, we aimed to open up to a wider audience, and to digitally preserve, a part of Ukraine’s biodiversity data heritage that is currently under threat of destruction.

This study was substantially driven by the BioDATA project, which helped a lot in developing biodiversity data management skills in our team.

Research article:

Yatsiuk I, Leshchenko Y, Viunnyk V, Leontyev DV (2024) The comprehensive checklist of myxomycetes of Ukraine, based on extended occurrence and reference datasets. Biodiversity Data Journal 12: e120891. https://doi.org/10.3897/BDJ.12.e120891

A new dawn for biological collections: The AI revolution in museums and herbaria

There are numerous uses for machine learning in digital collections, including an enormous potential to extract traits of organisms.

Guest blog post by Quentin Groom

Imagine having access to all the two billion biological collections of the world from your desktop! Not only to browse, but to search with artificial intelligence. We recently published a paper where we envisage what might be possible, such as searching all specimen labels for a person’s signature, studying the patterns of butterflies’ wings, or reconstructing a historic expedition.

Numbers of digital images from biodiversity collections are increasing exponentially. Herbariums have led the way with tens of millions of images available, but images of pinned insects will soon overtake plants.

Numbers of accessible images of specimens are increasing exponentially. Plants lead the way, but insects are increasing at the fastest rate. This graph was created from snapshots of the Global Biodiversity Information Facility and is undoubtedly an underestimate of the actual number of specimens for which images exist. See how this was created in Groom et al. (2023).

At one time, if you wanted access to biological collections, you had to travel. Now we are used to visiting collections online, where we can view images of specimens and their details on our desktops. Nevertheless, biological collection images are still dispersed and this limits their effective use, not just for people, but also for computers. One of the promises of making specimens digital is being able to apply machine learning to these images. Yet the real benefits of machine access to specimens can only be realised through massive access to collection images and the ability to apply these techniques to hundreds of collections and millions of specimens.

In our paper in Biodiversity Data Journal, we examined some of the numerous uses for machine learning in digital collections. These include an enormous potential to extract traits of organisms, from the size and shape of different organs, to their colours, patterns, and phenology. Imagine examining collections globally for the variation and evolution of wing coloration in butterflies, or studying the size and shape of leaves in research that traverses habitats and gradients of latitude and altitude. We would not only be able to study the intricacies of evolution, but also practical subjects, such as the mechanics of pollination in insects, adaptations to drought in plants, and adaptations to weediness in invasive species.

Machine access to these images will also provide an unparalleled view of the history of the biological sciences, the specimens used to describe species, the evidence for evolution, the people involved and institutions that contributed. Such transparency may reveal some amazing stories of scientific exploration, but will undoubtedly also shed light on some of the less exemplary actions of colonialism. Yet if we are to redress the injustices of the past we need to have a balanced view of collections, and we should do this openly.

Specimen labels provide numerous clues to their history, often in the form of stamps and emblems. A BR0000013433048, Meise Botanic Garden (CC-BY-SA 4.0). B USCH0030719, A.C. Moore Herbarium at the University of South Carolina (public domain). C E00809288, Royal Botanic Garden Edinburgh (public domain). D USCH0030719, University of South Carolina (public domain). E E00919066, Royal Botanic Garden Edinburgh (public domain). F BR0000017682725, Meise Botanic Garden (CC-BY-SA 4.0). G P00605317, Muséum National d’Histoire Naturelle, Paris (CC-BY 4.0). H LISC036829, Instituto de Investigação Científica Tropical (CC-BY-NC 4.0). I PC0702930, Muséum National d’Histoire Naturelle, Paris (CC-BY 4.0). J same specimen as (B). K PC0702930, Muséum National d’Histoire Naturelle, Paris (CC-BY 4.0). L 101178648, Missouri Botanical Garden (CC-BY-SA 4.0).

With such unparalleled access to collections, we could travel vicariously to times and places that are hard to reach in any other way. Fieldwork is expensive and time-consuming, and can’t provide the historic perspective of collections, let alone the geographic extent. Furthermore, digital resources have the potential to democratise collections, allowing anyone the opportunity to study these collections irrespective of location.

Is such a vision of integrated digital collections possible? It certainly is! The technologies already exist, not just for machine learning, but also to create the infrastructure to provide access to millions of digital images and their metadata. Initiatives such as DiSSCo in Europe and iDigBio in the USA are moving in this direction. Yet, we conclude that the main challenge to realising this vision of the future is a sociopolitical one. Can so many institutions and funders work together to pool their resources? Can collections in rich countries share the sovereignty of their collections with the countries where many of the specimens originated?

If you too share the dream, we encourage you to support or contribute to initiatives working in this direction, whether through funding, collaboration, or sharing knowledge. If the full potential of digital collections is to be realised, we need to think big and work together.

Research article:

Groom Q, Dillen M, Addink W, Ariño AHH, Bölling C, Bonnet P, Cecchi L, Ellwood ER, Figueira R, Gagnier P-Y, Grace OM, Güntsch A, Hardy H, Huybrechts P, Hyam R, Joly AAJ, Kommineni VK, Larridon I, Livermore L, Lopes RJ, Meeus S, Miller JA, Milleville K, Panda R, Pignal M, Poelen J, Ristevski B, Robertson T, Rufino AC, Santos J, Schermer M, Scott B, Seltmann KC, Teixeira H, Trekels M, Gaikwad J (2023) Envisaging a global infrastructure to exploit the potential of digitised collections. Biodiversity Data Journal 11: e109439. https://doi.org/10.3897/BDJ.11.e109439

Digitising UK Natural History Collections is vital to understand life on Earth, reports the Natural History Museum

In a paper published in the journal Research Ideas and Outcomes, authors estimate £18 million has been saved in efficiencies by researchers accessing digital specimens rather than physical collections.

· Scientists from the Natural History Museum (NHM) deep-dive into the uses and users of natural history collections held in the UK

· Modest estimates report a saving of £18 million in efficiencies by researchers accessing digital data rather than physical collections

· Today, software can complete in a week what it would take a human two years to achieve

· Call for investment to secure the UK’s stance as a world superpower in science and tech, and for a future in which both people and planet thrive

A new report has evaluated the use and impact of digitised natural science collections held in the UK and how they contribute to scientific, commercial and societal benefits.

UK natural science collections hold more than 137 million items spanning an incredible 4.56-billion-year history of life on Earth. These collections have emerged as a pivotal data resource to understanding the Earth in its past and current state – and will continue to inform the investors and policy-makers of the future.

UK natural science data in demand

GBIF—the Global Biodiversity Information Facility—is an international database providing open access data on all types of life on Earth. In this paper led by the NHM, scientists report that there are 7.6 million specimens, less than 6% of total UK natural science collections sampled, freely accessible on GBIF.

They found that 12% of the total peer-reviewed journal articles citing GBIF data specifically cite UK natural science collections. These data currently make up just 0.3% of total occurrences on GBIF, meaning they punch an incredible 40 times above their weight.
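The "40 times" figure follows directly from the two percentages quoted above. A minimal sketch of that arithmetic, using only numbers from this post; the variable and function names are illustrative, and taking the 137 million items as the denominator for the digitised share is an assumption:

```python
# Figures quoted in the text; all names here are illustrative only.
UK_ITEMS_TOTAL = 137_000_000      # items held in UK natural science collections
UK_SPECIMENS_ON_GBIF = 7_600_000  # UK specimens freely accessible on GBIF
CITATION_SHARE = 12.0             # % of GBIF-citing articles citing UK collections
OCCURRENCE_SHARE = 0.3            # % of GBIF occurrences from UK collections


def citation_weight(citation_share: float, occurrence_share: float) -> float:
    """Return how many times larger the citation share is than the record share."""
    return citation_share / occurrence_share


# Assumption: the 137 million total is the relevant denominator.
digitised_percent = 100 * UK_SPECIMENS_ON_GBIF / UK_ITEMS_TOTAL
weight = citation_weight(CITATION_SHARE, OCCURRENCE_SHARE)

print(f"Share of UK collections on GBIF: {digitised_percent:.1f}%")
print(f"Citation weight: {weight:.0f}x")
```

Under that assumption the digitised share comes out a little above 5.5%, consistent with the "less than 6%" stated in the report.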

When asked previously, over 90% of GBIF users linked their use of these data to advancing the UN Sustainable Development Goals which look to reduce hunger, poverty and inequality, and spur economic growth while tackling climate change and protecting the oceans and forests.

The case for digitising UK natural science collections

The introduction of these collections onto a digital platform has revolutionised scientific research. In this paper published in the journal Research Ideas and Outcomes, the authors estimate £18 million has been saved in efficiencies by researchers accessing digital specimens rather than physical collections, assuming a minimal single physical visit replaced per citation. Of this, £1.4 million has been attributed to UK researchers, money which can be reinvested back into UK science institutions – those at the forefront of finding solutions to real world problems.

Lead author and Deputy Head of Digital, Data and Informatics, Helen Hardy says, ‘The advancement of digitisation has been truly transformational to the scientific community. Today it’s possible to use software that takes a week to achieve the type of information gathering it would take a human over 3,000 hours, or two years, to complete – individuals realising an entire life’s work in just a few months! Anticipation is high for further innovations such as the further integration of artificial intelligence into taxonomic work.’

The UK government wants the UK to be a science and technology superpower, and natural science collections provide a unique opportunity to achieve this. To unlock the true potential of collections data, UK natural science collections are joining forces through the Distributed System of Scientific Collections UK (DiSSCo UK) to make the case for an investment of £155 million in a research infrastructure, which is expected to unlock at least a seven- to ten-fold economic return on investment. The aim is to work alongside the Arts & Humanities Research Council (AHRC) and UK Research and Innovation (UKRI) to digitise the critical mass of collections, with the data made available through a robust technological infrastructure and continually developed in line with recent innovations.

Ken Norris, Deputy Director of Science at the NHM says, ‘In the midst of a planetary emergency, and what some experts believe to be the Earth’s sixth mass extinction event, estimates say that over 50% of the world’s GDP, which equates to approx. 44 trillion dollars, is dependent on the natural world. By understanding what is in collections now, both on a national and international scale, we can identify trends, necessary actions, and what we need to collect to underpin policy and investment decisions for a future where people and planet thrive.’

Hardy H, Livermore L, Kersey P, Norris K, Smith V, Pullar J (2023) Users and uses of UK Natural History Collections – a Summary, https://doi.org/10.5281/zenodo.8403318

A longer paper on this study including further detail on the methodology and findings is also available:

Hardy H, Livermore L, Kersey P, Norris K, Smith V (2023) Understanding the users and uses of UK Natural History Collections. Research Ideas and Outcomes 9: e113378. https://doi.org/10.3897/rio.9.e113378

Photo credit: Trustees of the Natural History Museum

Follow Research Ideas and Outcomes on Facebook, Twitter, and LinkedIn.

‘Nature’s Envelope’ – a simple device that reveals the scope and scale of all biological processes

All processes fit into a broad S-shaped envelope extending from the briefest to the most enduring biological events. For the first time, we have a simple model that depicts the scope and scale of biology.

*Arctic tern by Mark Stock, Schleswig-Holstein Wadden Sea National Park. License: CC BY-SA.*

As biology is progressing into a digital age, it is creating new opportunities for discovery.

Increasingly, information from investigations into aspects of biology from ecology to molecular biology is available in a digital form. Older ‘legacy’ information is being digitized. Together, the digital information is accumulated in databases from which it can be harvested and examined with an increasing array of algorithmic and visualization tools.

From this trend has emerged a vision that, one day, we should be able to analyze any and all aspects of biology in this digital world.
However, before this can happen, there will need to be an infrastructure that gathers information from ALL sources, reshapes it as standardized data using universal metadata and ontologies, and makes it freely available for analysis.

That information must also make its way to trustworthy repositories that guarantee permanent access to the data in a polished state, fully suited for re-use.

The first layer in the infrastructure is the one that gathers all old and new information, whether it be about the migrations of ocean mammals, the sequence of bases in ribosomal RNA, or the known locations of particular species of ciliated protozoa.

How many of these subdomains will there be?
To answer this, we need to have a sense of the scope and scale of biology.

With the Nature’s Envelope we have, for the first time, a simple model that depicts the scope and scale of biology. Presented as a rhetorical device by its author Dr David J. Patterson (University of Sydney, Australia), the Nature’s Envelope is described in a Forum Paper, published in the open-science journal Research Ideas and Outcomes (RIO).

This is achieved by compiling information about the processes conducted by all living organisms. The processes occur at all levels of organization, from sub-molecular transactions, such as those that underpin nervous impulses, to those within and among plants, animals, fungi, protists and prokaryotes. They also include the actions and reactions of individuals and communities, the sum of the interactions that make up an ecosystem, and, finally, the consequences of the biosphere acting as a whole system.

Nature’s Envelope, in green, includes all processes carried out by, involving, or the result of the activities of any and all organisms. The axes depict the duration of events and the sizes of participants using a log₁₀ scale. *Image by David J. Patterson*. *License: CC BY*.

In the Nature’s Envelope, information on sizes of participants and durations of processes from all levels of organization is plotted on a grid. The grid uses a logarithmic (base 10) scale, which has about 21 orders of magnitude of size and 35 orders of magnitude of time. Information on processes ranging from the subatomic, through molecular, cellular, tissue, organismic, species, communities to ecosystems is assigned to the appropriate decadal blocks.

Examples include movements from the stepping motion of molecules like kinesin that move forward 8 nanometres in about 10 milliseconds; or the migrations of Arctic terns which follow routes of 30,000 km or more from Europe to Antarctica over 3 to 4 months.
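To make the idea of "decadal blocks" concrete, here is a minimal sketch of how a process could be placed on such a log₁₀ grid. It assumes metres and seconds as units and uses the distances and durations quoted above as the two coordinates; the function name and these choices are illustrative and are not taken from the paper.

```python
import math


def decadal_block(size_m: float, duration_s: float) -> tuple[int, int]:
    """Return the (size, duration) block as a pair of powers of ten.

    Each block spans one order of magnitude, so a value v falls in the
    block whose lower edge is 10 ** floor(log10(v)).
    """
    if size_m <= 0 or duration_s <= 0:
        raise ValueError("size and duration must be positive")
    return math.floor(math.log10(size_m)), math.floor(math.log10(duration_s))


# Kinesin step: 8 nanometres in about 10 milliseconds.
kinesin = decadal_block(size_m=8e-9, duration_s=10e-3)

# Arctic tern migration: 30,000 km over roughly 3.5 months (about 105 days).
tern = decadal_block(size_m=30_000e3, duration_s=105 * 24 * 3600)

print("kinesin step block (size, duration):", kinesin)
print("tern migration block (size, duration):", tern)
```

With these units, the kinesin step lands in the 10⁻⁹ m size block, while the tern migration lands in the 10⁷ m and 10⁶ s blocks, some 16 orders of magnitude apart on the size axis.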

The extremes of life processes are determined by the smallest and largest entities to participate, and the briefest and most enduring processes.

The briefest event to be included is the transfer of energy from a photon to a photosynthetic pigment as the photon passes through a chlorophyll molecule several nanometres in width at a speed of 300,000 km per second. That transaction is conducted in about 10^-17 seconds. As it involves the smallest subatomic particles, it defines the lower left corner of the grid.
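The 10^-17 second figure can be checked with a one-line calculation: time equals distance divided by speed. The sketch below takes "several nanometres" to be 3 nm, which is an assumption made only to obtain a round number; the variable names are illustrative.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 3.0e8  # 300,000 km per second, as quoted in the text
CHLOROPHYLL_WIDTH_M = 3.0e-9    # assumption: "several nanometres" taken as 3 nm

# Time for a photon to cross the molecule: distance / speed.
transit_time_s = CHLOROPHYLL_WIDTH_M / SPEED_OF_LIGHT_M_PER_S

print(f"Transit time: {transit_time_s:.1e} s")
print(f"Order of magnitude: 10^{round(math.log10(transit_time_s))} s")
```

Any width of a few nanometres gives a transit time within a factor of a few of 10^-17 seconds, which is why the event sits at the very edge of the time axis.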

The most enduring is the process of evolution that has been progressing for almost 4 billion years. The influence of the latter has created the biosphere (the largest living object) and affects the gas content of the atmosphere. This process established the upper right extreme of the grid.

All biological processes fit into a broad S-shaped envelope that includes about half of the decadal blocks in the grid. The envelope drawn round the initial examples is Nature’s Envelope.

“Nature’s envelope will be a useful addition to many discussions, whether they deal with the infrastructure that will manage the digital age of biology, or provide the context for education on the diversity and range of processes that living systems engage in.
The version of Nature’s Envelope published in the RIO journal is seen as a first version, to be refined and enhanced through community participation,”
comments Patterson.

***

Original source:

Patterson DJ (2022) The scope and scale of the life sciences (‘Nature’s envelope’). Research Ideas and Outcomes 8: e96132. https://doi.org/10.3897/rio.8.e96132


Digitising beans to feed the world

In 2018, NHM London’s digitisation team started a project to digitise non-type herbarium material from the legume family. A recent data paper in the Biodiversity Data Journal reports on the outcomes.

You can find the original blog post by the Natural History Museum of London, reposted here with minor edits.

Legumes are a group of plants that include soybeans, peas, chickpeas, peanuts and lentils. They are a significant source of protein, fibre, carbohydrates, and minerals in our diet and some, like the cowpea, are resistant to droughts.

In 2018, the Natural History Museum of London’s (NHM London) digitisation team started a project in collaboration with project leader Royal Botanic Gardens Kew and the Royal Botanic Garden Edinburgh.

The project’s outcomes were published in a data paper in the Biodiversity Data Journal. Within the project, the digitisation team aimed to collectively digitise non-type herbarium material from the legume family. This includes rosewood trees (Dalbergia), padauk trees (Pterocarpus) and the Phaseolinae subtribe that contains many of the beans cultivated for human and animal food.

This project was made possible through the Department for Environment Food & Rural Affairs (DEFRA)-allocated Official Development Assistance (ODA) funding, distributed by the UK government in its “global efforts to defeat poverty, tackle instability and create prosperity in developing countries”.

African	Guinea, Ethiopia, Sudan, Kenya, Uganda, Tanzania, Mozambique, Malawi and Madagascar
Asian	Bangladesh, Myanmar, Nepal, New Guinea and India
Southern and Central American	Guatemala, Honduras, El Salvador, Nicaragua, Bolivia, Argentina and Brazil

ODA-listed Countries

The legume groups Dalbergia, Pterocarpus and Phaseolinae were chosen for digitisation to support the development of dry beans as a sustainable and resilient crop, and to aid conservation and sustainable use of rosewood and padauk trees. Some of these beans, especially cowpea and pigeon pea, are sustainable and resilient crops, as they can be grown in poor-quality soils and are resistant to drought stress. This makes them particularly suitable for agricultural production where the growing of other crops would be difficult.

Digitally discoverable herbarium specimens can provide important information about the distribution of individual species, as well as highlighting which species occur naturally together.

While there have been collaborative efforts between herbaria in the past, these have tended to prioritise digitisation of type specimens: the example specimens for which a species is named.

Types are important to identification, but being individual specimens, they don’t offer insights into species distribution over time. By focusing on the non-types across the world and over the last 200 years, we have released a brand-new resource to the global scientific community.

Searching for beans

This collection was digitised by creating an inventory record for each specimen, attaching images of each herbarium sheet, and then transcribing more data and georeferencing the specimens, providing an accurate locality in space and time for their collection.

“We originally had four months and three members of staff to digitise over 11,000 specimens. The Covid-19 lockdown was ironically rather lucky for this project as it enabled us to have more time to transcribe and georeference all of the records,”
say the researchers behind the digitisation project.

*Map showing breakdown of records by country*.

“We were able to assign country-level data to 10,857 out of the total number of 11,222 records. We were also able to transcribe the collectors’ names from the majority of our specimen labels (10,879 out of 11,222). Only 770 out of the 2,226 individuals identified during this project collected their specimens in ODA-listed countries. The highest contributors were: Richard Beddome (130 specimens), Charles Clarke (110), Hans Schlieben (98) and Nathaniel Wallich (79). The breakdown of records by ODA country can be seen in the chart below.”

Map showing breakdown of records by country and pie chart showing distribution by ODA listed countries.
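The counts quoted by the team can be turned into percentages to see how complete the transcription is. This is a minimal sketch using only the numbers given above; the labels and the helper function are illustrative.

```python
TOTAL_RECORDS = 11_222

# (part, whole) pairs taken from the figures quoted in the text.
counts = {
    "records with country-level data": (10_857, TOTAL_RECORDS),
    "records with a transcribed collector name": (10_879, TOTAL_RECORDS),
    "collectors active in ODA-listed countries": (770, 2_226),
}


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


shares = {label: percent(part, whole) for label, (part, whole) in counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1f}%")
```

Roughly 97% of records carry country-level data and a collector name, while only about a third of the identified collectors worked in ODA-listed countries.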

“From our data, we can see the peak decade of collection was the 1930s, with almost half (4,583 specimens or 49.43%) collected between 1900 and 1950 (Fig. 10).
This peak can be attributed to three of our most prolific collectors: Arthur Kerr, John Gossweiler and Georges Le Testu, all of whom were most active in the 1930s. The oldest specimen (BM013713473) was collected by Mark Catesby (1683-1749) in the Bahamas in 1726,”
they explain.

“An interesting, but perhaps unsurprising, finding is that our collection is strongly male-dominated.
There are only two women (Caroline Whitefoord and Ynes Mexia) in the list of our top 50 plant collectors and they are not close to the most prolific collectors.
We identified more women in the rest of our records, but their contribution is on average less than 25 specimens per person in the dataset consisting of more than 10,000 specimens. In contrast, the top five male collectors contributed 10% of our collection,”
they continue.

Releasing Rosewoods

Both the Pterocarpus and Dalbergia genera include species that yield expensive, good-quality timber and are prone to illegal logging. Many species, such as Pterocarpus tinctorius, are also listed on the International Union for Conservation of Nature (IUCN) Red List of Threatened Species. By releasing this new resource of information on all these plants from three of the biggest herbaria in the world, we can share these data with the people who are taking care of biodiversity in these countries. The data can be used to identify hotspots where the trees grow naturally and to protect these areas. These data would also allow much closer attention to be paid to areas that could be targets for illegal logging activity.

*Pterocarpus tinctorius* is a species of padauk tree that is listed as endangered on the IUCN Red List.

Cowpea (*Vigna unguiculata*) is a food and animal feed crop grown in the semi-arid tropics.

“The ODA-listed countries are economically impoverished and disproportionately prone to be disadvantaged by the changing climate, whether from flood, drought or an increase in temperature.
Using data to identify good, nutritious plant species that can be grown in such conditions can therefore benefit local communities, potentially reducing dependence on imports, aid and on less resilient crops,”
the team adds in conclusion.

***

This dataset is now openly available on the Museum’s Data Portal and a data paper about this work has been released in the Biodiversity Data Journal.

***

Stay in touch with the Digitisation team by following us on Instagram and Twitter.

Don’t forget to also follow the Biodiversity Data Journal on Twitter and Facebook.

Digitising the Natural History Museum London’s entire collection could contribute over £2 billion to the global economy

In a world first, the Natural History Museum, London, has collaborated with economic consultants, Frontier Economics Ltd, to explore the economic and societal value of digitising natural history collections and concluded that digitisation has the potential to see a seven to tenfold return on investment. Whilst significant progress is already being made at the Museum, additional investment is needed in order to unlock the full potential of the Museum’s vast collections – more than 80 million objects. The project’s report is published in the open-science journal Research Ideas and Outcomes (RIO Journal).

One of the Museum’s digitisers imaging a butterfly to join the 4.93 million specimens already available online.
© The Trustees of the Natural History Museum, London

The societal benefits of digitising natural history collections extend to global advancements in food security, biodiversity conservation, medicine discovery, minerals exploration, and beyond. A brand new, rigorous economic report predicts that investing in digitising natural history museum collections could also result in a tenfold return. The Natural History Museum, London, has so far made over 4.9 million digitised specimens freely available online – over 28 billion records have been downloaded in over 429,000 download events over the past six years.

Digitisation at the Natural History Museum, London

Digitisation is the process of creating and sharing the data associated with Museum specimens. To digitise a specimen, all its related information is added to an online database. This typically includes where and when it was collected and who found it, and can include photographs, scans and other molecular data if available. Natural history collections are a unique record of biodiversity dating back hundreds of years, and geodiversity dating back millennia. Creating and sharing data this way enables science that would otherwise have been impossible, and accelerates the rate at which important discoveries are made from our collections.

The Natural History Museum’s collection of 80 million items is one of the largest and most historically and geographically diverse in the world. By unlocking the collection online, the Museum provides free and open access for global researchers, scientists, artists and more. Since 2015, the Museum has made 4.9 million specimens available on the Museum’s Data Portal, which have seen more than 28 billion downloads over 427,000 download events.

This means the Museum has digitised about 6% of its collections to date. Because digitisation is expensive, costing tens of millions of pounds, it is difficult to make a case for further investment without better understanding the value of this digitisation and its benefits.
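As a rough check of that figure, the share can be worked out in a few lines of Python. This is only a sketch using the rounded counts quoted in this article (4.9 million digitised specimens out of a collection of about 80 million); the variable names are illustrative and do not come from the Museum's report.

```python
# Rounded figures quoted in the article; the exact counts will differ.
digitised_specimens = 4_900_000
total_collection = 80_000_000

# Share of the collection that is already available online.
share_digitised = digitised_specimens / total_collection

# Items that would still need digitising under these rounded figures.
remaining_items = total_collection - digitised_specimens

print(f"Share digitised: {share_digitised:.1%}")
print(f"Still to digitise: {remaining_items:,} items")
```

With these inputs the share comes out at just over 6%, which matches the "about 6%" stated above and leaves roughly 75 million items still to be digitised.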

In 2021, the Museum decided to explore the economic impacts of collections data in more depth and commissioned Frontier Economics to undertake modelling. The resulting project report, now publicly available in the open-science journal Research Ideas and Outcomes (RIO Journal), confirms benefits in excess of £2 billion over 30 years. While the methods in this report are relevant to collections globally, this modelling focuses on benefits to the UK, and is intended to support the Museum’s own digitisation work, as well as a current scoping study funded by the Arts & Humanities Research Council about the case for digitising all UK natural science collections as a research infrastructure.

“Sharing data from our collections can transform scientific research and help find solutions for nature and from nature. Our digitised collections have helped establish the baseline plant biodiversity in the Amazon, find wheat crops that are more resilient to climate change and support research into potential zoonotic origins of Covid-19. The research that comes from sharing our specimens has immense potential to transform our world and help both people and the planet thrive,”
says Helen Hardy, Science Digital Programme Manager at the Natural History Museum.

How does digitisation impact scientific research?

The data from museum collections accelerates scientific research, which in turn creates benefits for society and the economy across a wide range of sectors. Frontier Economics Ltd have looked at the impact of collections data in five of these sectors: biodiversity conservation, invasive species, medicines discovery, agricultural research and development and mineral exploration.

“The Natural History Museum’s collection is a real treasure trove which, if made easily accessible to scientists all over the world through digitisation, has the potential to unlock ground-breaking research in any number of areas. Predicting exactly how the data will be used in future is clearly very uncertain. We have looked at the potential value that new research could create in just five areas focussing on a relatively narrow set of outcomes. We find that the value at stake is extremely large, running into billions,”
says Dan Popov, Economist at Frontier Economics Ltd.

The new analyses attempt to estimate the economic value of these benefits using a range of approaches, with the results in broad agreement that the benefits of digitisation are at least ten times greater than the costs. This represents a compelling case for investment in museum digital infrastructure, without which the many benefits will not be realised.

“This new analysis shows that the data locked up in our collections has significant societal and economic value, but we need investment to help us release it,”
adds Professor Ken Norris, Head of the Life Sciences Department at the Natural History Museum.

Other benefits could include improvements to the resilience of agricultural crops by better understanding their wild relatives, research into invasive species which can cause significant damage to ecosystems and crops, and improving the accuracy of mining.

Finally, there are other impacts that such work could have on how science is conducted itself. The very act of digitising specimens means that researchers anywhere on the planet can access these collections, saving time and money that may have been spent as scientists travelled to see specific objects.

The value of research enabled by digitisation of natural history collections can be estimated by looking at specific areas where the Museum’s collections contribute towards scientific research and subsequently impact the wider economy.
© Frontier Economics Ltd.

Original source:

Popov D, Roychoudhury P, Hardy H, Livermore L, Norris K (2021) The Value of Digitising Natural History Collections. Research Ideas and Outcomes 7: e78844. https://doi.org/10.3897/rio.7.e78844

Artificial neural networks could power up curation of natural history collections

Deep learning techniques manage to differentiate between similar plant families with up to 99 percent accuracy, Smithsonian researchers reveal

Millions, if not billions, of specimens reside in the world’s natural history collections, but most of these have not been carefully studied, or even looked at, in decades. While containing critical data for many scientific endeavors, most objects are quietly sitting in their own little cabinets of curiosity.

Thus, mass digitization of natural history collections has become a major goal at museums around the world. Having brought together numerous biologists, curators, volunteers and citizen scientists, such initiatives have already generated large datasets from these collections and provided unprecedented insight.

Now, a study, recently published in the open access Biodiversity Data Journal, suggests that the latest advances in both digitization and machine learning might together be able to assist museum curators in their efforts to care for and learn from this incredible global resource.

A team of researchers from the Smithsonian Department of Botany, Data Science Lab, and Digitization Program Office recently collaborated with NVIDIA to carry out a pilot project using deep learning approaches to dig into digitized herbarium specimens.

*Smithsonian researchers classifying digitized herbarium sheets.*

Their study is among the first to describe the use of deep learning methods to enhance our understanding of digitized collection samples. It is also the first to demonstrate that a deep convolutional neural network (a computing system modelled after the neuron activity in animal brains that can basically learn on its own) can effectively differentiate between similar plants with an amazing accuracy of nearly 100%.

In the paper, the scientists describe two different neural networks that they trained to perform tasks on the digitized portion (currently 1.2 million specimens) of the United States National Herbarium.

The team first trained a net to automatically recognize herbarium sheets that had been stained with mercury crystals, since mercury was commonly used by some early collectors to protect the plant collections from insect damage. The second net was trained to discriminate between two families of plants that share a strikingly similar superficial appearance.

*Sample herbarium specimen image of stained clubmoss.*

The trained neural nets performed with 90% and 96% accuracy respectively (or 94% and 99% if the most challenging specimens were discarded), confirming that deep learning is a useful and important technology for the future analysis of digitized museum collections.
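To make the two pairs of figures concrete, the minimal Python sketch below shows how accuracy can be computed over all specimens and then again after setting aside the ones a model is least sure about. It assumes that the "most challenging" specimens are those whose prediction confidence falls below a threshold, which this article does not state. The function names, the threshold and the toy data are all hypothetical and are not results from the study.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of specimens whose predicted label matches the true label."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def accuracy_above_confidence(
    predicted: Sequence[str],
    actual: Sequence[str],
    confidence: Sequence[float],
    threshold: float,
) -> float:
    """Accuracy over only those specimens with confidence >= threshold."""
    kept = [(p, a) for p, a, c in zip(predicted, actual, confidence) if c >= threshold]
    if not kept:
        raise ValueError("no specimens meet the confidence threshold")
    return accuracy([p for p, _ in kept], [a for _, a in kept])


# Hypothetical toy data: five herbarium sheets labelled stained or clean.
predicted = ["stained", "clean", "stained", "clean", "stained"]
actual = ["stained", "clean", "clean", "clean", "stained"]
confidence = [0.98, 0.91, 0.55, 0.87, 0.99]

overall = accuracy(predicted, actual)
filtered = accuracy_above_confidence(predicted, actual, confidence, threshold=0.6)

print(f"All specimens: {overall:.0%}")
print(f"Confident predictions only: {filtered:.0%}")
```

In this toy example the single wrong prediction is also the least confident one, so discarding it raises accuracy from 80% to 100%. The trade-off is that the discarded specimens still need a human curator to examine them.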

“The results can be leveraged both to improve curation and unlock new avenues of research,” conclude the scientists.

“This research paper is a wonderful proof of concept. We now know that we can apply machine learning to digitized natural history specimens to solve curatorial and identification problems. The future will be using these tools combined with large shared data sets to test fundamental hypotheses about the evolution and distribution of plants and animals,” says Dr. Laurence J. Dorr, Chair of the Smithsonian Department of Botany.

Original source:

Schuettpelz E, Frandsen P, Dikow R, Brown A, Orli S, Peters M, Metallo A, Funk V, Dorr L (2017) Applications of deep convolutional neural networks to digitized natural history collections. Biodiversity Data Journal 5: e21139. https://doi.org/10.3897/BDJ.5.e21139