The Biodiversity Digital Twin to help understand our planet’s life

By combining and improving digitally available data and models, BioDT offers approaches for sustainable biodiversity management and ecosystem conservation.

Biodiversity is essential for the processes that support all life on Earth. It provides critical resources such as food and energy, and supports ecosystem health. However, climate change, deforestation, and pollution are destroying habitats, altering ecosystems, and eliminating (or introducing) species that are fundamental to the planet's biosphere.

To tackle the challenges that environmental change and human activities pose to biodiversity, a consortium of 22 partners led by CSC – IT Center for Science, home of the EuroHPC LUMI supercomputer, is developing Biodiversity Digital Twins (BioDT) as part of a European Commission initiative.

Cover of the “Building Biodiversity Digital Twins” article collection in RIO journal.

The BioDT project aims to revolutionise our understanding of biodiversity dynamics by integrating advanced modelling, simulation, and prediction capabilities. By combining and improving digitally available data and models, BioDT offers approaches for sustainable biodiversity management and ecosystem conservation. BioDT combines expertise in biodiversity, ecological modelling, FAIR data, high-performance computing, and artificial intelligence.

BioDT aims to enhance the accuracy and predictive performance of biodiversity models through iterative development and validation against independent data. This approach can be critical for developing decision-support tools and informing policy. By continuously updating its data, BioDT will provide real-time predictions of biodiversity patterns and processes through interactive maps and summaries. To achieve this goal, the consortium leverages existing technologies and data from major research infrastructures (GBIF, eLTER, DiSSCo, and LifeWatch ERIC).

A screenshot of the BioDT homepage.

The project's impact extends to addressing critical issues, including the impact of environmental change on species and ecosystems, food security, and the implementation of EU and international policies. The project contributes to UN Sustainable Development Goals 2 (Zero Hunger), 3 (Good Health and Well-being), 13 (Climate Action), and 15 (Life on Land).

BioDT develops prototype Digital Twins for biodiversity conservation

In order to test its modelling system, BioDT is developing ten prototype digital twins (pDTs) focused on species and ecosystems of high conservation and policy concern, such as invasive species, pollinators and grasslands.
The pDTs are divided into four main groups:

  • Species Response to Environmental Change: focusing on the interactions between species and ecosystems. By incorporating temporal dynamics rather than relying purely on space-for-time substitution, BioDT improves the accuracy of temporal predictions. Different sources of uncertainty are quantified in a single modelling framework that combines extensive geographic data with high-resolution time-series data.
  • Genetically Detected Biodiversity: addressing food security and challenging environments by integrating genomic methods based on DNA data with traditional biodiversity data. These twins focus on crop wild relatives and other genetic resources for farming and food security, as well as DNA-detected biodiversity in poorly known habitats.
  • Dynamics of Species of Policy Concern: applying modelling and high-performance computing to invasive and alien species recognised at EU and national levels. These twins use current species occurrence data and address key environmental conditions and the effects of invasive species on native taxa and ecosystems.
  • Influence of Species Interactions: predicting disease outbreaks using vector species and exploring the patterns and processes of insect pollinators. Work on interaction twins involves further developing data exchange models and establishing historical reference points by digitising collection specimens.
A screenshot from the BioDT homepage showing the purposes of prototype digital twins.

The pDTs aim to make essential datasets, best practices, expertise, and lessons learned available and ready for researchers and research infrastructures to use in implementing the use cases.

The pDTs test the models' predictive performance under different data availability scenarios, and apply them to address biodiversity challenges through scenario simulations, predictions, and biomonitoring methods. This iterative approach aims to integrate and compare the predictive performance of various modelling approaches, stimulating the development of next-generation prototypes.

To learn more about the biodiversity pDTs, explore the dedicated pages on the BioDT website.

Building Biodiversity Digital Twins: a BioDT collection of scientific papers

To further advance the development and reliability of Biodiversity Digital Twins, the BioDT team has produced 10 scientific papers, compiled in the “Building Biodiversity Digital Twins” issue of the open-science scholarly journal Research Ideas and Outcomes (RIO).

“The collection offers an in-depth understanding of the conceptual and technical advancements achieved towards developing digital twins for a wide range of biodiversity topics. Through the BioDT project, we are enabling a broad audience to interactively understand and predict biodiversity changes across space and time,” says Gabriela Zuquim, Scientific Coordinator at CSC for the BioDT project.

The collection serves as a centralised access point to the outputs of the BioDT initiative. Publishing unconventional research outputs that are not traditionally published is, in fact, among the unique features of the open-science RIO journal. Another feature is the option to map individual publications to the SDGs they contribute to, further underlining their significance.

A conceptual diagram of a prototype digital twin from this paper. The core aim of this project is to test the feasibility of generating essentially real-time, continuously updated predictions of bird spatiotemporal distributions and singing activity by combining prior information based on long-term monitoring data with new information continuously accumulated by citizen scientists.

In the case of BioDT, RIO has made it possible for the project team to document the process of prototyping Biodiversity Digital Twins as peer-reviewed scientific articles, thereby ensuring their discoverability, credibility, citability, reusability and long-term public availability. By opting for this transparent approach to sharing scientific work that has withstood the rigour of formal review, the BioDT project ensures that future scientists can make better and more efficient use of the models, data and cutting-edge technology developed by the consortium's researchers.

For example, one publication describes the HONEYBEE Prototype Digital Twin. Once the ongoing calibration with land-use and hive-weight data is complete, the prototype will allow predictions of honeybee population dynamics, mite infestation and honey production. The model builds on an earlier model designed to simulate the foraging of a single bee colony. Using the prototype digital twin, users can interactively apply the model at various temporal and geographic scales, from local sites to whole regions or even the entire country. It could therefore become an essential tool for assessing the viability and productivity of honeybee colonies across Germany, regardless of differences in landscapes and management strategies.

Overview of the prototype HONEYBEE-pDT

Our vision is that the assessment can even be run under different climate-change scenarios. The publication also provides guidelines for potential users of the prototype. The authors of the paper, led by Dr Jürgen Groeneveld (Helmholtz Centre for Environmental Research – UFZ, Germany), point out that despite honey bees “being a managed species, they are severely affected by climate change, emerging parasites and diseases, modern agricultural land use and possibly inappropriate beekeeping practices”, and go on to cite worrying data on trends in both Europe and the USA.

Similarly, other publications already available in the collection address equally crucial and pressing issues with global impact, including disease outbreaks, crop management, invasive species, and bird and vegetation dynamics.

“The Building Biodiversity Digital Twins collection of project papers suited our needs perfectly,” said Dmitry Schigel, GBIF Scientific Officer and a coordinating editor of the collection. “The project team agreed to capture the project's iterations and reveal our two-thirds-stage prototypes two years into the project, with one more to go. The innovative platform that Pensoft's RIO journal provides lets us describe our progress in a less formal but still peer-reviewed setting. Thanks to the efficient work of the author teams, reviewers and co-editors, this special issue came together quickly and now enables our prototype digital twin teams to attract and process feedback from broader audiences.”

Explore the “Building Biodiversity Digital Twins” collection, freely accessible on Pensoft’s RIO Journal. Read them now and see their impact!

A new dawn for biological collections: The AI revolution in museums and herbaria

There are numerous uses for machine learning in digital collections, including an enormous potential to extract traits of organisms.

Guest blog post by Quentin Groom

Imagine having access to all two billion specimens in the world's biological collections from your desktop! Not only to browse, but to search with artificial intelligence. We recently published a paper in which we envisage what might be possible, such as searching all specimen labels for a person's signature, studying the patterns of butterflies' wings, or reconstructing a historic expedition.

The number of digital images from biodiversity collections is increasing exponentially. Herbaria have led the way, with tens of millions of images available, but images of pinned insects will soon overtake plants.

The number of accessible specimen images is increasing exponentially. Plants lead the way, but insects are increasing at the fastest rate. This graph was created from snapshots of the Global Biodiversity Information Facility and is undoubtedly an underestimate of the actual number of specimens for which images exist. See how this was created in Groom et al. (2023).

At one time, if you wanted access to biological collections, you had to travel. Now we are used to visiting collections online, where we can view images of specimens and their details on our desktops. Nevertheless, biological collection images are still dispersed, and this limits their effective use, not just by people, but also by computers. One of the promises of making specimens digital is being able to apply machine learning to these images. Yet the real benefits of machine access to specimens can only be realised through massive access to collection images and the ability to apply these techniques to hundreds of collections and millions of specimens.


In our paper in Biodiversity Data Journal, we examined some of the numerous uses for machine learning in digital collections. These include an enormous potential to extract traits of organisms, from the size and shape of different organs, to their colours, patterns, and phenology. Imagine examining collections globally for the variation and evolution of wing coloration in butterflies, or studying the size and shape of leaves in research that traverses habitats and gradients of latitude and altitude. We would be able to study not only the intricacies of evolution, but also practical subjects, such as the mechanics of pollination in insects, adaptations to drought in plants, and adaptations to weediness in invasive species.

Machine access to these images will also provide an unparalleled view of the history of the biological sciences: the specimens used to describe species, the evidence for evolution, and the people and institutions that contributed. Such transparency may reveal some amazing stories of scientific exploration, but will undoubtedly also shed light on some of the less exemplary actions of colonialism. Yet, if we are to redress the injustices of the past, we need to have a balanced view of collections, and we should do this openly.

Specimen labels provide numerous clues to their history, often in the form of stamps and emblems. A BR0000013433048, Meise Botanic Garden (CC-BY-SA 4.0). B USCH0030719, A.C. Moore Herbarium at the University of South Carolina (public domain). C E00809288, Royal Botanic Garden Edinburgh (public domain). D USCH0030719, University of South Carolina (public domain). E E00919066, Royal Botanic Garden Edinburgh (public domain). F BR0000017682725, Meise Botanic Garden (CC-BY-SA 4.0). G P00605317, Muséum National d'Histoire Naturelle, Paris (CC-BY 4.0). H LISC036829, Instituto de Investigação Científica Tropical (CC-BY-NC 4.0). I PC0702930, Muséum National d'Histoire Naturelle, Paris (CC-BY 4.0). J same specimen as (B). K PC0702930, Muséum National d'Histoire Naturelle, Paris (CC-BY 4.0). L 101178648, Missouri Botanical Garden (CC-BY-SA 4.0).

With such unparalleled access to collections, we could travel vicariously to times and places that are hard to reach in any other way. Fieldwork is expensive and time-consuming, and can’t provide the historic perspective of collections, let alone the geographic extent. Furthermore, digital resources have the potential to democratise collections, allowing anyone the opportunity to study these collections irrespective of location.

Is such a vision of integrated digital collections possible? It certainly is! The technologies already exist, not just for machine learning, but also to create the infrastructure to provide access to millions of digital images and their metadata. Initiatives such as DiSSCo in Europe and iDigBio in the USA are moving in this direction. Yet, we conclude that the main challenge to realising this vision of the future is a sociopolitical one. Can so many institutions and funders work together to pool their resources? Can collections in rich countries share the sovereignty of their collections with the countries where many of the specimens originated?

If you too share the dream, we encourage you to support or contribute to initiatives working in this direction, whether through funding, collaboration, or sharing knowledge. If the full potential of digital collections is to be realised, we need to think big and work together.

Research article:

Groom Q, Dillen M, Addink W, Ariño AHH, Bölling C, Bonnet P, Cecchi L, Ellwood ER, Figueira R, Gagnier P-Y, Grace OM, Güntsch A, Hardy H, Huybrechts P, Hyam R, Joly AAJ, Kommineni VK, Larridon I, Livermore L, Lopes RJ, Meeus S, Miller JA, Milleville K, Panda R, Pignal M, Poelen J, Ristevski B, Robertson T, Rufino AC, Santos J, Schermer M, Scott B, Seltmann KC, Teixeira H, Trekels M, Gaikwad J (2023) Envisaging a global infrastructure to exploit the potential of digitised collections. Biodiversity Data Journal 11: e109439. https://doi.org/10.3897/BDJ.11.e109439

Invasive alien species? Isn’t there an app for that?

Scientists review 41 invasive species reporting apps and provide recommendations for future development.

Invasive alien species (IAS) are a leading contributor to biodiversity loss, and they cause annual economic damage in the order of hundreds of billions of US dollars in each of many countries around the world. Smartphone apps are one relatively new tool that could help monitor, predict, and ideally prevent their spread. But are they living up to their full potential?

A team of researchers from the University of Montana, the Flathead Lake Biological Station and the University of Georgia River Basin Center tried to answer that question in a recent research paper in the open access, peer-reviewed journal NeoBiota. Going through nearly 500 peer-reviewed articles, they identified the key features of the perfect IAS reporting app and then rated all known English-language IAS reporting apps available to North American users against this ideal.

Smartphone apps have the potential to be powerful reporting tools. Citizen scientists around the world have made major contributions to biodiversity reporting using apps such as iNaturalist and eBird. But apps for reporting invasive species have never reached that level of popularity, and lead author Leif Howard and his team set out to investigate why.

Smartphone apps like the soon-to-be-released new EDDMapS platform are promising tools for monitoring, predicting, and reducing the spread of invasive species. However, they have not seen the explosion of reports experienced by biodiversity-wide platforms. Howard et al. investigate why these invasive species-specific apps have not enjoyed the same boom in use. Image by Leif Howard and Charles van Rees

User uptake and retention are just as important as collecting data. Howard and colleagues found that apps tend to do a good job with one of these, but rarely with both. In their paper, they emphasize that making apps user-friendly and fun to use, with games and useful functions such as species identification and social media plug-ins, is a major missing piece among current apps.

“The greatest advancement in IAS early detection would likely result from app gamification,” they write.

Another feature they would like to see more of is artificial intelligence or machine learning for photo identification, which they believe would greatly enhance species identification and might increase public participation.

The authors also make suggestions for future innovations that could make IAS reporting apps even more effective. Their biggest suggestion is coordination. 

“Currently, most invasive species apps are developed by many separate organizations, leading to duplicated effort and inconsistent implementation”, they say. “The valuable data collected by these apps is also sent to different databases, making it harder for scientists to combine them for useful research.”

A more efficient way to implement these technologies might be to provide open-source code and app templates, which local organizations could use to build regional apps that contribute data to centralized databases.
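Shared templates only help if every regional app emits records in the same structure. As a minimal sketch, assuming the common format were built on Darwin Core terms (a real TDWG biodiversity data standard, but our choice for illustration, not something the authors prescribe), a template app might validate and package each report like this. The class `InvasiveReport`, the function `to_central_record` and the example values are hypothetical.

```python
from dataclasses import dataclass, asdict
from datetime import date
from typing import Dict


@dataclass(frozen=True)
class InvasiveReport:
    # Field names follow Darwin Core terms (an assumption, see lead-in)
    occurrenceID: str
    scientificName: str
    eventDate: str           # ISO 8601 date, e.g. "2022-06-14"
    decimalLatitude: float
    decimalLongitude: float
    recordedBy: str
    basisOfRecord: str = "HumanObservation"

    def validate(self) -> None:
        """Reject reports that a central database could not use."""
        if not -90 <= self.decimalLatitude <= 90:
            raise ValueError("latitude out of range")
        if not -180 <= self.decimalLongitude <= 180:
            raise ValueError("longitude out of range")
        date.fromisoformat(self.eventDate)  # raises ValueError if not YYYY-MM-DD


def to_central_record(report: InvasiveReport, app_name: str) -> Dict[str, object]:
    """Validate a report and tag it with the regional app that collected it."""
    report.validate()
    record = asdict(report)
    record["datasetName"] = app_name  # lets the central database trace each source app
    return record


# Hypothetical report from a hypothetical regional app
report = InvasiveReport("app-0001", "Dreissena polymorpha", "2022-06-14",
                        47.87, -114.03, "volunteer-42")
record = to_central_record(report, "hypothetical-regional-app")
print(record["datasetName"], record["basisOfRecord"])
```

Because every app built from the same template produces identical keys, records from many organizations can be merged without per-app conversion, which is the kind of consistency the authors call for.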

Overall, this research shows how with broader participation, more complete and informative reporting forms, and more consistent and structured data management, IAS reporting apps could make much larger contributions to invasive species management worldwide. This, in turn, could save local, regional, and national economies hundreds of millions or billions of dollars annually, while protecting valuable ecological and agricultural systems for future generations.

Research article:

Howard L, van Rees C, Dahquist Z, Luikart G, Hand B (2022) A review of invasive species reporting apps for citizen science and opportunities for innovation. NeoBiota 71: 165-188. https://doi.org/10.3897/neobiota.71.79597


Data mining applied to scholarly publications to finally reveal Earth’s biodiversity

At a time when, according to a recent UN report, a million species are at risk of extinction, we ironically don't know how many species there are on Earth, nor have we recorded all those we already know on a single list. In fact, we don't even know how many species would belong on such a list.

Combined research by over 2,000 natural history institutions worldwide has produced an estimated ~500 million pages of scholarly publications and tens of millions of illustrations and species descriptions, comprising all we currently know about the diversity of life. However, most of it isn't digitally accessible. Even if it were digital, our current publishing systems wouldn't be able to keep up, given that about 50 species are described as new to science every day, all of them published as plain text or PDF, from which machines cannot mine the data, so a human has to extract them. Furthermore, those publications often appear in subscription (closed access) journals.

The Biodiversity Literature Repository (BLR), a joint project of Plazi, Pensoft and Zenodo at CERN, takes on the challenge of opening up access to the data trapped in scientific publications, and of finding out how many species we know so far, what their most important characteristics are (also referred to as descriptions or taxonomic treatments), and what they look like in images. To do so, BLR exploits the highly standardised formats and terminology typical of scientific publications to discover and extract data from text written primarily for human consumption.

By relying on state-of-the-art data mining algorithms, BLR allows for the detection, extraction and enrichment of data, including DNA sequences, specimen collecting data or related descriptions, as well as providing implicit links to their sources: collections, repositories etc. As a result, BLR is the world’s largest public domain database of taxonomic treatments, images and associated original publications.

Once the data are available, they are immediately distributed to global biodiversity platforms, such as GBIF – the Global Biodiversity Information Facility. As of now, there are about 42,000 species whose original scientific descriptions are accessible only because of BLR.

The basic scientific principle of citing previous work allows us to trace back the history of a particular species, to understand how knowledge about it grew over time, and even whether and how its name has changed through the years. As a result, this service is one avenue to uncover the catalogue of life by means of simple lookups.

So far, the lessons learned have led to the development of TaxPub, an extension of the United States National Library of Medicine Journal Article Tag Suite (JATS), and to its application in a new class of 26 scientific journals. As a result, the data associated with articles in these journals are machine-accessible from the beginning of the publishing process, so as soon as a paper comes out, its data are automatically added to GBIF.
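To see why tagging matters, compare a PDF, where a species name is just a run of characters, with XML in which the name and its nomenclatural status are explicitly marked up. The sketch below parses a small, invented fragment using only Python's standard library. It assumes the TaxPub namespace URI `http://www.plazi.org/taxpub` and element names such as `tp:taxon-treatment` and `tp:taxon-name-part`; these are modelled on TaxPub but simplified, the species name is fictitious, and real articles carry far richer markup, so the TaxPub schema should be checked before relying on this structure.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

TP = "http://www.plazi.org/taxpub"  # assumed TaxPub namespace URI
NS = {"tp": TP}

# Invented, heavily simplified TaxPub-style fragment for illustration only
SAMPLE = f"""<article xmlns:tp="{TP}">
  <body>
    <tp:taxon-treatment>
      <tp:nomenclature>
        <tp:taxon-name>
          <tp:taxon-name-part taxon-name-part-type="genus">Examplea</tp:taxon-name-part>
          <tp:taxon-name-part taxon-name-part-type="species">hypothetica</tp:taxon-name-part>
        </tp:taxon-name>
        <tp:taxon-status>sp. n.</tp:taxon-status>
      </tp:nomenclature>
    </tp:taxon-treatment>
  </body>
</article>"""


def extract_names(xml_text: str) -> List[Tuple[str, str]]:
    """Return (binomial, status) for every tagged taxonomic treatment."""
    root = ET.fromstring(xml_text)
    results = []
    for treatment in root.iter(f"{{{TP}}}taxon-treatment"):
        name = treatment.find("tp:nomenclature/tp:taxon-name", NS)
        if name is None:
            continue  # treatment without a tagged name; nothing to extract
        # Map each labelled part (genus, species, ...) to its text
        parts = {p.get("taxon-name-part-type"): (p.text or "").strip()
                 for p in name.findall("tp:taxon-name-part", NS)}
        binomial = " ".join(filter(None, [parts.get("genus"), parts.get("species")]))
        status_el = treatment.find("tp:nomenclature/tp:taxon-status", NS)
        status = status_el.text.strip() if status_el is not None and status_el.text else ""
        results.append((binomial, status))
    return results


print(extract_names(SAMPLE))
```

Because the name parts are labelled, no human needs to read the text to pull out a genus, species and status, which is what makes automatic forwarding of such data to aggregators like GBIF feasible.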

BLR is expected to open up millions of scientific illustrations and descriptions, and the system is unique in that it makes all the extracted data findable, accessible, interoperable and reusable (FAIR), as well as open to anybody, anywhere, at any time. Most of all, its purpose is to create a novel way to access scientific literature.

To date, BLR has extracted ~350,000 taxonomic treatments and ~200,000 figures from over 38,000 publications. This includes the descriptions of 55,800 new species, 3,744 new genera, and 28 new families. BLR has contributed to the discovery of over 30% of the ~17,000 species described annually.

Prof. Lyubomir Penev, founder and CEO of Pensoft, says:

“It is such a great satisfaction to see how the development process of the TaxPub standard, started by Plazi some 15 years ago and implemented as a routine publishing workflow at Pensoft’s journals in 2010, has now resulted in an entire infrastructure that allows automated extraction and distribution of biodiversity data from various journals across the globe. With the recent announcement from the Consortium of European Taxonomic Facilities (CETAF) that their European Journal of Taxonomy is joining the TaxPub club, we are even more confident that we are paving the right way to fully grasping the dimensions of the world’s biodiversity.”

Dr Donat Agosti, co-founder and president of Plazi, adds:

“Finally, information technology allows us to create a comprehensive, extended catalogue of life and bring to light this huge corpus of cultural and scientific heritage – the description of life on Earth – for everybody. The nature of taxonomic treatments as a network of citations and syntheses of what scientists have discovered about a species allows us to link distinct fields such as genomics and taxonomy to specimens in natural history museums.”

Dr Tim Smith, Head of Collaboration, Devices and Applications Group at CERN, comments:

“Moving the focus away from the papers, where concepts are communicated, to the concepts themselves is a hugely significant step. It enables BLR to offer a unique new interconnected view of the species of our world, where the taxonomic treatments, their provenance, histories and their illustrations are all linked, accessible and findable. This is inspirational for the digital liberation of other fields of study!”

###

Additional information:

BLR is a joint project led by Plazi in partnership with Pensoft and Zenodo at CERN.

Currently, BLR is supported by a grant from Arcadia, a charitable fund of Lisbet Rausing and Peter Baldwin.

Scientists use forensic technology to genetically document infanticide in brown bears

Modern open-source software helped the researchers identify the male that killed a female and her two cubs

Scientists used a technology designed for human forensics to provide the first genetically documented case of infanticide in brown bears, following the killing of a female and her two cubs in Trentino, in the Italian Alps, where a small reintroduced population has been genetically monitored for 20 years.

The study, conducted by Francesca Davoli of the Italian Institute for Environmental Protection and Research (ISPRA), Bologna, and her team, is published in the open access journal Nature Conservation.

To secure their own reproduction, males of some social mammalian species, such as lions and bears, exhibit infanticidal behaviour where they kill the offspring of their competitors, so that they can mate with the females which become fertile again soon after they lose their cubs. However, sometimes females are also killed while trying to protect their young, resulting in a survival threat to small populations and endangered species.

“In isolated populations with a small number of reproductive adults, sexually selected infanticide can negatively impact the long-term conservation of the species, especially in the case where the female is killed while protecting her cubs,” point out the researchers.

“Taking this into account, the genetic identification of the perpetrators could give concrete indications for the management of small populations, for example, placing radio-collars on infanticidal males to track them,” they add. “Nevertheless, genetic studies for identifying infanticidal males have received little attention.”

Thanks to a database containing the genotypes of all bears known to inhabit the study site and open-source software used to analyse human forensic genetic profiles, the scientists were able to solve the case much like in a television crime series.

Upon finding the three carcasses, the researchers were certain that the animals had not been killed by a human. Initially, the suspects were all the male brown bears recorded in the area in 2015.

Hoping to isolate the perpetrator's DNA, the researchers collected three hair samples and swabbed the female's wounds for saliva. As they were dealing with a relatively small population, the scientists expected the animals to share their genotypes to some extent, meaning they needed plenty of samples.

However, while the DNA retrieved from the saliva swabs did point to an adult male, at first glance it seemed to belong to the cubs' father. The scientists later worked out that the attacker must have injured the cubs and the mother alternately, spreading the cubs' blood, which carries genetic material inherited from their father. Previous knowledge also excluded the father, since there are no known cases of male bears killing their own offspring. In fact, males seem to distinguish their own young, most likely by recognising the mother.

To identify the attacker, the scientists had to work with the very small amount of genetic material recovered from the saliva swabs, using a highly sophisticated analysis to obtain four largely overlapping genetic profiles. They then compared these profiles against each of the males recorded in the area that year, and eventually narrowed the options down to an individual listed as M7.
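The comparison step can be pictured with a deliberately simple sketch. The study relied on open-source forensic software that evaluates mixed profiles statistically; the code below is not that method. It is a basic inclusion/exclusion count, assuming each evidence profile is a set of alleles per locus (possibly mixing the attacker's saliva with the cubs' blood) and each known bear has two alleles per locus. A bear is excluded at a locus if he carries an allele the evidence lacks. Loci, allele sizes and bear names are invented, and real analyses must also handle allele dropout, which this sketch ignores.

```python
from typing import Dict, List, Set, Tuple

# Evidence profile: locus -> set of alleles detected in a swab (may mix several individuals)
Profile = Dict[str, Set[int]]
# Reference genotype: locus -> the two alleles carried by one known bear
Genotype = Dict[str, Tuple[int, int]]


def excluded_loci(evidence: Profile, reference: Genotype) -> int:
    """Count shared loci where the reference bear carries an allele absent from the evidence."""
    count = 0
    for locus, observed in evidence.items():
        if locus in reference and not set(reference[locus]) <= observed:
            count += 1
    return count


def rank_suspects(evidence: Profile, database: Dict[str, Genotype]) -> List[Tuple[str, int]]:
    """Rank known bears from fewest to most excluding loci (ties broken by name)."""
    scores = [(name, excluded_loci(evidence, genotype)) for name, genotype in database.items()]
    return sorted(scores, key=lambda item: (item[1], item[0]))


# Invented example: three hypothetical microsatellite loci, allele sizes in base pairs
evidence = {"locus1": {120, 124}, "locus2": {200, 206, 210}, "locus3": {150}}
database = {
    "bear_A": {"locus1": (120, 124), "locus2": (200, 210), "locus3": (150, 150)},
    "bear_B": {"locus1": (118, 124), "locus2": (200, 206), "locus3": (150, 152)},
}
print(rank_suspects(evidence, database))  # [('bear_A', 0), ('bear_B', 2)]
```

A candidate with zero excluding loci remains a possible contributor, while every mismatch weakens the case against him; forensic software replaces this crude count with likelihood ratios that weigh how common each allele is in the population.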

“The monitoring of litters is a fundamental tool for the management of bear populations: it has allowed the authors to genetically confirm the existence of cases of infanticide and in the future may facilitate the retrieval of information necessary to assess the impact of SSI [sexually selected infanticide] on demographic trends,” conclude the researchers.

###

Original source:

Davoli F, Cozzo M, Angeli F, Groff C, Randi E (2018) Infanticide in brown bear: a case-study in the Italian Alps – Genetic identification of perpetrator and implications in small populations. Nature Conservation 25: 55-75. https://doi.org/10.3897/natureconservation.25.23776