Mining nature’s knowledge: turning text into data

By using natural language processing, researchers created a reliable system that can automatically read and pull useful data from thousands of articles.

Guest blog post by Joseph Cornelius, Harald Detering, Oscar Lithgow-Serrano, Donat Agosti, Fabio Rinaldi, and Robert M Waterhouse

In a groundbreaking new study, scientists are using powerful computer tools to gather key information about arthropods—creatures like insects, spiders, and crustaceans—from the large and growing collection of scientific papers. The research focuses on finding details in published texts about how these animals live and interact with their environment. By using natural language processing (a type of artificial intelligence that helps computers understand human language), the team created a reliable system that can automatically read and pull useful data from thousands of articles. This innovative method not only helps us learn more about the variety of life on Earth, but also supports efforts to solve environmental challenges by making it easier to access important biological information.

Illustration depicting species literature feeding data on arthropod traits into a database, linking researchers and the community.
Mining the literature to identify species, their traits, and associated values.

The challenge

Scientific literature contains vast amounts of essential data about species—like what arthropods eat, where they live, and how big they are. However, this information is often trapped in hard-to-access files and old publications, making large-scale analysis almost impossible. So how can we convert these pages into usable data?

The goal

The team set out to develop an automatic text‑mining system that uses Natural Language Processing (NLP) and machine learning to scan thousands of biology papers and extract structured information about insects and other arthropods. The aim was to build a database linking species names with traits such as “leg length”, “forest habitat”, or “predator”.

How it works in practice

  1. Collect curated vocabularies of terms to be searched for in the texts:
  • ~1 million species names from the Catalogue of Life
  • 390 traits, categorised into feeding ecology, habitat, and morphology
  2. Create “Gold‑standard” data needed to train language models:
  • Experts manually annotated 25 papers, labelling species, traits, values, and their links, to use as a training benchmark
  3. Train NLP models so they “learn” which are the terms of interest:
  • Named‑Entity Recognition using BioBERT for identifying species, trait, and value words or phrases in the texts
  • Relation Extraction using LUKE to link the words/phrases e.g. “this species has this trait” and “this trait has this value”
  4. Automated extraction of words/phrases and their links:
  • Processed 2,000 open‑access papers from PubMed Central
  • Identified ~656,000 entities (species, traits, values) and ~339,000 links between them
  5. Publish results in an open searchable online resource:
  • Developed ArTraDB, an interactive web database where users can search, view, and visualise species‑trait pairs and full species‑trait‑value triples

Text-mining is a conceptually and computationally challenging task.
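
To give a feel for what “entities” and “links” mean here, the sketch below finds species, trait, and value mentions in a sentence and joins them into species‑trait‑value triples. It is only an illustration: it swaps the trained BioBERT and LUKE models for plain dictionary lookups and a naive nearest-value rule, and the tiny vocabularies, the example sentences, and every function name are made up for this post.

```python
import re
from dataclasses import dataclass

# Hypothetical mini-vocabularies (the real ones hold ~1 million species names and 390 traits).
SPECIES = {"Apis mellifera", "Araneus diadematus"}
TRAITS = {"leg length": "morphology", "forest habitat": "habitat", "predator": "feeding ecology"}
# Assumed value format for this sketch: a number followed by a unit.
VALUE_PATTERN = re.compile(r"\b\d+(?:\.\d+)?\s?(?:mm|cm|mg|g)\b")


@dataclass(frozen=True)
class Entity:
    kind: str   # "species", "trait" or "value"
    text: str
    start: int  # character offset in the sentence


def find_entities(sentence):
    """Dictionary and regex lookup standing in for a trained NER model."""
    entities = []
    for name in SPECIES:
        for m in re.finditer(re.escape(name), sentence):
            entities.append(Entity("species", name, m.start()))
    for trait in TRAITS:
        for m in re.finditer(re.escape(trait), sentence, flags=re.IGNORECASE):
            entities.append(Entity("trait", trait, m.start()))
    for m in VALUE_PATTERN.finditer(sentence):
        entities.append(Entity("value", m.group(), m.start()))
    return sorted(entities, key=lambda e: e.start)


def link_entities(entities):
    """Naive rule standing in for relation extraction: pair every species with
    every trait in the sentence, and give each trait the first value after it."""
    species = [e for e in entities if e.kind == "species"]
    traits = [e for e in entities if e.kind == "trait"]
    values = [e for e in entities if e.kind == "value"]
    triples = []
    for sp in species:
        for tr in traits:
            following = [v for v in values if v.start > tr.start]
            value = min(following, key=lambda v: v.start).text if following else None
            triples.append((sp.text, tr.text, value))
    return triples


sentence = "Adults of Araneus diadematus have a leg length of 12 mm."
print(link_entities(find_entities(sentence)))
```

A lookup like this breaks down quickly on synonyms, outdated names, and long sentences that mention several species, which is where trained models and larger annotated corpora come in.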

What is needed for the next steps

  • Annotation complexity: Even experts struggled to agree on boundaries and precise relationships, underscoring the need for clearer guidelines and more training examples to improve the performance of the models
  • Gaps in the vocabularies of terms: Many mentions went unrecognised due to missing synonyms, outdated species names, and variations in phrasing. Expanding the vocabularies will improve the system’s ability to find species, traits, and values
  • Community curation: Planned features in ArTraDB will allow scientists and citizen curators to improve annotations, helping retrain and refine the models over time

How it impacts science

  • Speeds up research: Scientists can find species‑trait data quickly and accurately, boosting studies in ecology, evolution, and biodiversity
  • Scale and scope: This semi‑automated method can eventually be extended well beyond arthropods to other species
  • Supports global biodiversity efforts: Enables creation of large, quantitative trait datasets essential for monitoring ecosystem changes, climate impact, and conservation strategies
Illustration of a butterfly with icons and arrows outlining key biological data: barcode, genome, distribution, nutrition, habitat, and more.
A long-term vision to connect species with knowledge about their biology.

The outcomes

This innovative work demonstrates how combining text mining, expert curation, and interactive databases can unlock centuries of biological research. It lays a scalable foundation for building robust, open-access trait databases—empowering both scientists and the public to explore the living world in unprecedented ways.

Research article:

Cornelius J, Detering H, Lithgow-Serrano O, Agosti D, Rinaldi F, Waterhouse R (2025) From literature to biodiversity data: mining arthropod organismal traits with machine learning. Biodiversity Data Journal 13: e153070. https://doi.org/10.3897/BDJ.13.e153070

Bulgaria joins the Global Biodiversity Information Facility (GBIF) 

Led by Pensoft and its CEO Prof. Lyubomir Penev, the partnership marks a major step for Bulgarian science and regional biodiversity leadership.

Bulgaria officially joins the Global Biodiversity Information Facility (GBIF). This major event for Bulgarian science was initiated by a memorandum signed by the Minister of Environment and Water, Manol Genov.

Logo for the Global Biodiversity Information Facility (GBIF) featuring stylized green leaves and the acronym "GBIF" in bold text.

GBIF is an international network and data infrastructure funded by governments around the world that provides international open access to a modern and comprehensive database of all species of living organisms on the planet. 

Joining GBIF is an important step for initiatives such as the Bulgarian Barcode of Life (BgBOL), as it will facilitate the integration of genetic data on species diversity into the global scientific community and support the creation of a more accurate and accessible bioinformatic database. This will increase the scientific visibility and relevance of Bulgarian efforts in molecular taxonomy and conservation.

World map showing GBIF network participants: green for voting participants, blue for associate participants, gray for non-participants.
Prof. Lyubomir Penev

“First of all, I’d like to congratulate all fellow scientists working in the domain of biology and ecology in Bulgaria with this wonderful achievement,” says Prof. Dr. Lyubomir Penev, founder and CEO of the scientific publisher and technology provider Pensoft, as well as a key participant in the talks and preparations for Bulgaria’s joining GBIF. He is also Chair of BgBOL.

“Becoming a full member of GBIF has been a long-anticipated milestone we have discussed and worked on for several years. Coming not long after we initiated the Bulgarian Barcode of Life, the Bulgarian membership in GBIF gives us yet another uncontested evidence that the nation is on the right path to preserving our uniquely rich fauna and flora,” he adds.

“As close partners of GBIF for over 15 years now, Pensoft is looking forward to sharing our know-how with Bulgarian institutions and scientists, so that they can fully utilise the GBIF infrastructure and tools, in order to streamline the visibility and overall efficiency of biodiversity data collected from Bulgaria.”

GBIF is managed by a Secretariat based in Copenhagen and brings together countries and organisations that collaborate through national and institutional coordinators (also called participant nodes). The mechanism provides common standards, good practices and open access tools for institutions around the world to share information on the location and recording of species and specimens. According to GBIF, a total of 107 countries and organisations currently participate in the network, a significant number of which are European.

The GBIF network, as shown in a screenshot from https://www.gbif.org/the-gbif-network taken on 10/06/2025.

By joining GBIF, biodiversity data generated in Bulgaria can be streamlined through the network’s infrastructure so that the country does not need to build and maintain its own separate infrastructure, which also saves significant financial resources.

As a full voting member, Bulgaria will ensure that biodiversity data from the country is shared and accessible through the platform. This will contribute to global knowledge on biodiversity and, in turn, to solutions that promote its conservation and sustainable use.

Map of Bulgaria showing biodiversity data with orange heatmap indicating occurrences.
Bulgaria’s page on GBIF, as shown in a screenshot from https://www.gbif.org/country/BG/summary taken on 10/06/2025.

Improvements in data management by Bulgaria will also contribute to better reporting and fulfilment of obligations to the Convention on Biological Diversity (CBD) as well as to the Intergovernmental Platform on Biodiversity and Ecosystem Services (IPBES). As a member of GBIF, Bulgaria will be able to apply for funding for flagship activities in Bulgarian institutions and neighbouring Balkan countries. This will enable the country to expand its leadership role in the Balkans in biodiversity research and data accumulation.

The partnership between GBIF and Pensoft dates back to 2009 when the global network and the publisher signed their first Memorandum of Understanding intended to solidify their cooperation as leaders in the technological advancement relevant to biodiversity knowledge. Over the next few years, Pensoft integrated its whole biodiversity journal portfolio with the GBIF infrastructure to enable multiple automated workflows, including export of all species occurrence data published in scientific articles straight to the GBIF platform. Most recently, over 20 biodiversity journals powered by Pensoft’s scholarly publishing platform ARPHA launched their own hosted portals on GBIF to make it easier to access and use biodiversity data associated with published research, aligning with principles of Findable, Accessible, Interoperable, and Reusable (FAIR) data.

Journals published on ARPHA now archived in the Biodiversity Heritage Library

To date, the content available on BHL includes 16,000 legacy articles and also extends to future articles.

Content from more than 30 biodiversity journals published on the ARPHA Platform will now be archived in the Biodiversity Heritage Library (BHL), the world’s largest open-access digital library for biodiversity literature and archives.

A vibrant orange butterfly perched on yellow flowers, with text announcing journal archiving in the Biodiversity Heritage Library.

A global consortium of natural history, botanical, research, and national libraries, BHL digitises and freely shares essential biodiversity materials. A critical resource for researchers, it provides vital access to material that might otherwise be difficult to obtain.

Under the agreement, over 16,000 articles published on Pensoft’s self-developed ARPHA Platform are now available on BHL. Both legacy content and new articles are made available on the platform, complete with full-text PDFs and all relevant metadata.

More content published on ARPHA will gradually be added to the BHL archive.

The publications will be included in the Library’s full-text search, allowing researchers to easily locate relevant biodiversity literature. Crucially, the scientific names within the articles will be indexed using the Global Names Architecture, enabling seamless discovery of information about specific taxa across the BHL collection.

This automated workflow is facilitated by the ARPHA platform and uses the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to enable exposure and harvesting of repository metadata. 
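
For readers curious about what harvesting over OAI-PMH looks like, here is a minimal sketch that builds a standard `ListRecords` request and reads identifiers and titles out of a response. The endpoint `https://example.org/oai` and the sample XML are placeholders, not ARPHA’s or BHL’s actual configuration, and the function names are invented for this illustration; only the verb, the `metadataPrefix`, `from`, and `resumptionToken` arguments, and the XML namespaces come from the OAI-PMH 2.0 protocol itself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, resumption_token=None):
    """Build a ListRecords request; a resumption token replaces all other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for record in root.findall(".//oai:record", NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": record.findtext(".//dc:title", namespaces=NS),
        })
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, (token or None)


# Made-up response used instead of a live request.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:article/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example article</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

print(list_records_url("https://example.org/oai", from_date="2025-01-01"))
print(parse_records(SAMPLE_RESPONSE))
```

A harvester would repeat the request with each returned resumption token until none comes back, which is how large archives are transferred in batches.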

“Pensoft is pleased to collaborate with BHL in our joint mission to support global biodiversity research through free access to knowledge. Thanks to this integration, content in our journals will become even more accessible and readily discoverable, helping researchers find the biodiversity information they need,” said Prof. Lyubomir Penev, CEO and founder of Pensoft and ARPHA.

The news comes soon after BHL announced it is about to face a major shift in its operation. From 2026, the Smithsonian Institution – one of BHL’s 10 founding members – will cease to host the administrative and technical components of BHL. As the consortium explores a range of options, the BHL team is confident that “the transition opens the door to a reimagined and more sustainable future for BHL.”

Biodiversity Knowledge Hub makes an appearance at the European Geosciences Union General Assembly 2025

The Biodiversity Knowledge Hub fosters interoperability between diverse resources to make it easier to use and combine information.

Gabriel Peoluze (LifeWatch ERIC) presents the Biodiversity Knowledge Hub poster at EGU 2025 (Vienna, Austria).

On Monday, 28 April, the first day of the European Geosciences Union General Assembly 2025 (EGU 2025), participants had the chance to discover one of the most promising initiatives in biodiversity informatics: the Biodiversity Knowledge Hub (BKH). BKH was presented as part of a dedicated poster session, titled “Biodiversity Knowledge Hub: Addressing the impacts of environmental change by linking Research Infrastructures, Global Aggregators and Community Networks”.

Understanding and addressing the impacts of environmental change on biodiversity and ecosystems demands access to reliable FAIR data (as in Findable, Accessible, Interoperable, Reusable). However, the current landscape is often fragmented, making it difficult to combine and use these resources effectively.

Enter the Horizon-funded project Biodiversity Community Integrated Knowledge Library (BiCIKL): a pioneering initiative that demonstrates the transformative power of interdisciplinary collaboration. Coordinated by Pensoft, BiCIKL ran between 2021 and 2024.

The Vision of BiCIKL

Within BiCIKL, 14 European institutions from ten countries teamed up with the aim to integrate biodiversity data across research infrastructures, scientific repositories, and expert communities.

Through this integration, BiCIKL bridged the gap between isolated knowledge systems and delivered actionable insights to guide conservation and resilience efforts. The project embodies the principles of open science by demonstrating how interdisciplinary collaboration can turn fragmented data into cohesive, usable knowledge for researchers, policymakers, and practitioners.

The Biodiversity Knowledge Hub

At the heart of BiCIKL’s success is the Biodiversity Knowledge Hub (BKH): an innovative platform that provides seamless access to biodiversity data, tools, and workflows. The BKH fosters interoperability between diverse resources, thus making it easier to combine information from different sources. Whether for advanced research analytics or policymaking in support of sustainable development, the BKH empowers users with tools tailored to their needs.

A few of the standout features of the BKH include:
  • Modular design to allow continuous expansion and adaptability to new challenges in biodiversity and climate resilience.
  • Interoperable systems that connect a variety of databases, repositories, and services to deliver integrated knowledge.
  • Community building by welcoming a broad network of stakeholders to ensure the platform’s long-term sustainability and growth.

Setting a New Benchmark in Biodiversity Informatics

Through its collaborative approach, BiCIKL set a new standard for how biodiversity and climate resilience initiatives can be harmonised globally. By showcasing best practices in data integration, capacity building, and stakeholder engagement, BiCIKL became much more than a project: it turned into a blueprint for future biodiversity knowledge infrastructures.

The Biodiversity Knowledge Hub serves to demonstrate how harmonised standards and active collaboration are key to unlocking the full potential of biodiversity data. In doing so, its mission is to create scalable, long-term solutions that are crucial for addressing today’s pressing environmental challenges.

The poster presentation at EGU 2025 outlined the methodologies and technologies driving the BKH, emphasising its role as a pioneering model for integrated biodiversity knowledge and action. As environmental pressures continue to mount, the work of BiCIKL and the Biodiversity Knowledge Hub offers a hopeful path forward: one where knowledge flows freely, collaborations flourish, and data-driven solutions guide our way to a more resilient future.

Visit the Biodiversity Community Integrated Knowledge Library (BiCIKL) project’s website at: https://bicikl-project.eu/.

Don’t forget to also explore the Biodiversity Knowledge Hub (BKH) for yourself at: https://biodiversityknowledgehub.eu/ and watch the BKH’s introduction video.

Revisit highlights from the BiCIKL project on X/Twitter from the project’s hashtag: #BiCIKL_H2020 and handle: @BiCIKL_H2020.

Pensoft joins the Biodiversity Meets Data Horizon project to support biodiversity monitoring and conservation

As part of the new consortium, Pensoft is to use innovative communication tools in support of evidence-based biodiversity conservation across Europe.

The European Union (EU) has been working to protect nature for decades, with the Natura 2000 network now safeguarding over 18% of EU land and 9% of its marine territory. Yet, biodiversity is still in trouble, with only 50% of bird species and 15% of habitats in good conservation status. 

To turn the tide, the EU’s Biodiversity Strategy for 2030 will expand the existing Natura 2000 areas, implement the EU’s first-ever Nature Restoration Law, and introduce concrete measures to achieve global biodiversity targets. Success will depend on enhancing biodiversity monitoring, making better use of data and gaining a clearer picture of how nature is changing.

Addressing this urgent challenge, the EU Horizon project BMD (short for Biodiversity Meets Data) will offer a centralised platform, the Single Access Point (SAP), for improved biodiversity monitoring across Europe.

Pensoft’s role

Pensoft will play a role in Biodiversity Meets Data’s impact by planning and implementing the communication, dissemination and exploitation of project results, as well as helping with the training and capacity building for BMD’s end-users, which will be led by LifeWatch ERIC. Pensoft will adopt a multi-format approach to knowledge transfer with tailored outputs for the scientific community, decision-makers, industry representatives and the general public. 

Furthermore, the BMD SAP will also incorporate elements of the Biodiversity Knowledge Hub (BKH), developed under the BiCIKL project, coordinated by Pensoft.

“It’s incredibly rewarding to see the continuity in our projects, with the legacy of the BiCIKL project continuing with Biodiversity Meets Data. This seamless progression not only builds on our past successes but also ensures that our work continues to deliver long-lasting value to the biodiversity community,” said Prof. Dr. Lyubomir Penev, CEO and Founder of Pensoft, and project coordinator of BiCIKL (short for Biodiversity Community Integrated Knowledge Library).

The BMD project consortium at the project’s kick-off meeting in early March 2025 (Leiden, the Netherlands).

International consortium

Coordinated by Naturalis Biodiversity Center, the project brings together 14 partner organisations from 11 countries to develop innovative solutions for biodiversity management.

  1. Naturalis Biodiversity Center – the Netherlands
  2. Royal Botanic Garden Edinburgh – the United Kingdom
  3. Meise Botanic Garden – Belgium
  4. Helmholtz Centre for Environmental Research – Germany
  5. e-Science European Infrastructure for Biodiversity and Ecosystem Research – Spain
  6. Pensoft Publishers – Bulgaria
  7. The European Land Conservation Network – the Netherlands
  8. University of Tartu – Estonia
  9. Stichting Catalogue of Life – the Netherlands
  10. The International Hellenic University – Greece
  11. The Senckenberg Nature Research Society – Germany
  12. The Environment Agency Austria – Austria
  13. The National Research Council – Italy
  14. SIB Swiss Institute of Bioinformatics – Switzerland

For more information:

Visit the BMD project website at https://bmd-project.eu/, and make sure to follow the project’s progress via our social media channels on Bluesky and LinkedIn.

More than 20 journals published by Pensoft now have their own hosted data portals on GBIF to streamline and FAIR-ify biodiversity research

The portals currently host over 1,000 datasets and almost 325,000 occurrence records across the 25 journals.

In collaboration with the Global Biodiversity Information Facility (GBIF), Pensoft has established hosted data portals for 25 open-access peer-reviewed journals published on the ARPHA Platform.

A screenshot featuring a close-up of a turtle on a forest floor, overlayed with a web portal design for biodiversity data browsing.
A screenshot of the Check List data portal.

The initiative aims to make it easier to access and use biodiversity data associated with published research, aligning with principles of Findable, Accessible, Interoperable, and Reusable (FAIR) data.

The data portals offer seamless integration of published articles and associated data elements with GBIF-mediated records. Now, researchers, educators, and conservation practitioners can discover and use the extensive species occurrence and other data associated with the papers published in each journal.

A video displaying an interactive map with occurrence data on the BDJ portal.

The collaboration between Pensoft and GBIF was recently piloted with the Biodiversity Data Journal (BDJ). Today, the BDJ hosted portal provides seamless access and exploration for nearly 300,000 occurrences of biological organisms from all over the world that have been extracted from the journal’s all-time publications. In addition, the portal provides direct access to more than 800 datasets published alongside papers in BDJ, as well as to almost 1,000 citations of the journal articles associated with those publications.  

“The release of the BDJ portal and subsequent ones planned for other Pensoft journals should inspire other publishers to follow suit in advancing a more interconnected, open and accessible ecosystem for biodiversity research,” said Dr. Vince Smith, Editor-in-Chief of BDJ and head of digital, data and informatics at the Natural History Museum, London.

“The programme will provide a scalable solution for more than thirty of the journals we publish thanks to our partnership with Plazi, and will foster greater connectivity between scientific research and the evidence that supports it,” said Prof. Lyubomir Penev, founder and chief executive officer of Pensoft.

On the new portals, users can search the data and refine their queries by criteria such as taxonomic classification and conservation status. They also have access to statistical information about the hosted data.

Together, the hosted portals provide data on almost 325,000 occurrence records, as well as over 1,000 datasets published across the journals.

The Biodiversity Data Journal launches its own data portal on GBIF

With this simple website designed to lower technical demands, data managers and other stakeholders can easily focus on data exploration and reuse.

The Biodiversity Data Journal (BDJ) became the second open-access peer-reviewed scholarly title to make use of the hosted portals service provided by the Global Biodiversity Information Facility (GBIF): an international network and data infrastructure aimed at providing anyone, anywhere, open access to data about all types of life on Earth. 

The Biodiversity Data Journal portal, hosted on the GBIF platform, is to support biodiversity data use and engagement at national, institutional, regional and thematic scales by facilitating access and reuse of data by users with various expertise in data use and management. 

Having piloted the GBIF hosted portal solution with arguably the most revolutionary biodiversity journal in its exclusively open-access scholarly portfolio, Pensoft is to soon replicate the effort with at least 20 other journals in the field. This would mean that the publisher will more than double the number of the currently existing GBIF-hosted portals.

As of the time of writing, the BDJ portal provides seamless access and exploration for nearly 300,000 occurrences of biological organisms from all over the world that have been extracted from the journal’s all-time publications. In addition, the portal provides direct access to more than 800 datasets published alongside papers in BDJ, as well as to almost 1,000 citations of the journal articles associated with those publications.  

Using the search categories featured in the portal, users can narrow their query by geography, location, taxon, IUCN Global Red List Category, geological context and many others. The dashboard also lets users access multiple statistics about the data, and even explore potentially related records with the help of the clustering feature (e.g. a specimen sequenced by another institution or type material deposited at different institutions). Additionally, the BDJ portal provides basic information about the journal itself and links to the news section from its website. 
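
The same kind of filtering can be expressed as a query against GBIF’s public occurrence search API. The sketch below only builds the request URL and tallies a made-up response, so it runs without network access. The parameter names (`country`, `iucnRedListCategory`, `taxonKey`, `limit`) reflect our reading of the GBIF API and should be checked against its documentation; the helper names and the sample records are invented for illustration and are not part of the BDJ portal.

```python
from collections import Counter
from urllib.parse import urlencode

GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_search_url(filters, limit=20):
    """Turn a dict of filters into a search URL, skipping unset filters."""
    params = {key: value for key, value in filters.items() if value is not None}
    params["limit"] = limit
    # Sort the parameters so the same filters always give the same URL.
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(sorted(params.items()))}"


def count_by_field(results, field):
    """Count occurrence records by the value of one field."""
    return Counter(record.get(field, "unknown") for record in results)


url = build_search_url({"country": "BG", "iucnRedListCategory": "EN", "taxonKey": None}, limit=5)
print(url)

# Made-up records shaped like the "results" list of a search response.
sample_results = [
    {"scientificName": "Species one", "basisOfRecord": "PRESERVED_SPECIMEN"},
    {"scientificName": "Species two", "basisOfRecord": "HUMAN_OBSERVATION"},
    {"scientificName": "Species three", "basisOfRecord": "PRESERVED_SPECIMEN"},
]
print(count_by_field(sample_results, "basisOfRecord"))
```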

A video displaying an interactive map with occurrence data on the BDJ portal.

Launched in 2013 with the aim to bring together openly available data and narrative into a peer-reviewed scholarly paper, the Biodiversity Data Journal has remained at the forefront of scholarly publishing in the field of biodiversity research. Over the years, it has been amongst the first to adopt many novelties developed by Pensoft, including the entirely XML-based ARPHA Writing Tool (AWT) that has underpinned the journal’s submission and review process for several years now. Besides the convenience of an entirely online authoring environment, AWT provides multiple integrations with key databases, such as GBIF and BOLD, to allow direct export and import at the authoring stage, thereby further facilitating the publication and dissemination of biodiversity data. More recently, BDJ also piloted the “Nanopublications for Biodiversity” workflow and format as a novel solution to future-proof biodiversity knowledge by sharing “pixels” of machine-actionable scientific statements.   

“I am thrilled to see the Biodiversity Data Journal’s (BDJ) hosted portal active, ten years since it became the first journal to submit taxon treatments and Darwin Core occurrence records automatically to GBIF! Since its launch in 2013, BDJ has been unrivalled amongst taxonomy and biodiversity journals in its unique workflows that provide authors with import and export functions for structured biodiversity data to/from GBIF, BOLD, iDigBio and more. I am also glad to announce that more than 30 Pensoft biodiversity journals will soon be present as separate hosted portals on GBIF thanks to our long-time collaboration with Plazi, ensuring proper publication, dissemination and re-use of FAIR biodiversity data,” said Prof. Dr. Lyubomir Penev, founder and CEO of Pensoft, and founding editor of BDJ.

“The release of the BDJ portal and subsequent ones planned for other Pensoft journals should inspire other publishers to follow suit in advancing a more interconnected, open and accessible ecosystem for biodiversity research,” said Vince Smith, editor-in-chief of BDJ and head of digital, data and informatics at the Natural History Museum, London.

Better data practices advance biodiversity knowledge

A framework to retrieve, refine and align secondary biodiversity data with FAIR standards.

Guest blog post by Nubia Marques et al.

In a world increasingly defined by data-driven decisions, biodiversity research stands to benefit from standardized and accessible data. Despite their importance for research, biodiversity datasets often fail to meet FAIR (Findable, Accessible, Interoperable, Reusable) standards, leading to concerns about data quality, reliability, and accessibility.

To address this, we propose a framework to retrieve, refine and align secondary biodiversity data with FAIR standards, utilizing the Darwin Core model. We followed four steps:

  1. data localization (systematic review)
  2. quality validation
  3. standardization using the Darwin Core standard
  4. sharing and archiving in the appropriate repository.

Our approach integrates data validation and quality control steps to ensure that secondary data sets can be trusted.
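
As a rough illustration of steps 2 and 3, the sketch below checks a raw record and then renames its columns to Darwin Core terms. It is not the workflow from our paper: the raw column names, the example record, the specific checks, and the choice of `MaterialCitation` as `basisOfRecord` are all assumptions made for this example. Only the Darwin Core term names themselves (`occurrenceID`, `scientificName`, `eventDate`, `decimalLatitude`, `decimalLongitude`, `individualCount`, `basisOfRecord`) come from the standard.

```python
from datetime import date

# Hypothetical raw column names mapped to Darwin Core terms.
COLUMN_TO_DWC = {
    "species": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "count": "individualCount",
}


def validate(raw):
    """Return a list of problems found in a raw record (empty if it passes)."""
    problems = []
    if not str(raw.get("species") or "").strip():
        problems.append("missing species name")
    try:
        date.fromisoformat(str(raw.get("date") or ""))
    except ValueError:
        problems.append("date is not a valid ISO date (YYYY-MM-DD)")
    for column, limit in (("lat", 90.0), ("lon", 180.0)):
        try:
            if abs(float(raw[column])) > limit:
                problems.append(f"{column} out of range")
        except (KeyError, TypeError, ValueError):
            problems.append(f"{column} missing or not numeric")
    return problems


def to_darwin_core(raw, record_id):
    """Rename known columns to Darwin Core terms and add an identifier."""
    record = {"occurrenceID": record_id, "basisOfRecord": "MaterialCitation"}
    for column, term in COLUMN_TO_DWC.items():
        if column in raw:
            record[term] = raw[column]
    return record


# Invented example record, not taken from the published dataset.
raw = {"species": "Ucides cordatus", "date": "2019-08-21", "lat": -2.5, "lon": -44.3, "count": 3}
if not validate(raw):
    print(to_darwin_core(raw, "example:occurrence:1"))
```

Records that fail validation would go back for manual checking rather than being silently dropped, so the final dataset stays traceable to its sources.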

Our study in Biodiversity Data Journal focused on ecotonal estuarine ecosystems near the easternmost Amazon, where we recovered data from 46,000 individuals representing 3,871 taxa across eight biotic groups (birds, amphibians, reptiles, mammals, fish, phytoplankton, benthos, and plants) from 1985 to 2022. These data were used to illustrate how our strategy improves validation, making the data more reliable for macroecological modeling and conservation management. As data becomes more standardized, researchers around the world will be better equipped to collaborate, identify trends, protect ecosystems, and advance sustainability efforts.

Relationships between numbers of taxa and occurrences gathered through an extensive review of secondary biodiversity data from the Golfão Maranhense area, in the estuarine regions of eastern Amazonia.

Accessible biodiversity data empowers stakeholders and provides critical insights into ecosystem health and species conservation. However, without standardized formats, this data is often fragmented, incomplete, or difficult to compare. By creating a consistent framework for collecting, storing, and sharing data, we are opening the door to more informed decision-making and innovation in biodiversity conservation.

The key to conserving biodiversity is collaboration and transparency. By prioritizing accessible and standardized data, we ensure that vital information reaches those who need it most – whether it’s for scientific study, habitat management or policymaking.

Let’s continue to make biodiversity data a tool for global change!

Research article:

Marques N, Soares CDdeM, Casali DdeM, Guimarães E, Fava F, Abreu JMdaS, Moras L, Silva LGda, Matias R, Assis RLde, Fraga R, Almeida S, Lopes V, Oliveira V, Missagia R, Carvalho E, Carneiro N, Alves R, Souza-Filho P, Oliveira G, Miranda M, Tavares VdaC (2024) Retrieving biodiversity data from multiple sources: making secondary data standardised and accessible. Biodiversity Data Journal 12: e133775. https://doi.org/10.3897/BDJ.12.e133775

MAkiNg Technology work for moNitoring polliNAtors: Pensoft joins ANTENNA

Pensoft is to maximise the project’s impact by informing stakeholders about results and raising public awareness about pollinators.

Pensoft joins the newly funded Biodiversa+ project ANTENNA focused on making technology work for monitoring pollinators and is tasked with the communication, dissemination and exploitation activities. 

The overarching goal of ANTENNA is to fill key monitoring gaps through advancing innovative technologies that will underpin and complement EU-wide pollinator monitoring schemes, and to provide tested transnational pipelines from monitoring activities to curated datasets and enhanced indicators that support pollinator-relevant policy and end-users.

The ANTENNA project answers the BiodivMon call, which was launched in September 2022 by Biodiversa+ in collaboration with the European Commission. The BiodivMon call sought proposals for three-year research projects to improve transnational monitoring of biodiversity and ecosystem change, emphasising innovation and harmonisation of biodiversity data collection and management methodologies, addressing knowledge gaps on biodiversity status and trends to combat biodiversity loss, and the effective use of existing biodiversity monitoring data. 

Supporting the work of Work Package #5: “Project coordination, and communication”, Pensoft is dedicated to maximising the project’s impact by employing a mix of channels to inform stakeholders about the results from ANTENNA and raise public awareness about pollinators.

Pensoft is also tasked with creating and maintaining a clear and recognisable project brand, promotional materials, website, social network profiles, internal communication platform, and online libraries. Another key responsibility is developing, implementing and regularly updating the project’s communication, dissemination and exploitation plans, which ANTENNA is set to follow for the next four years.

On 14-15 March 2024, ANTENNA held its official kick-off meeting. Project partners came together in Halle, Germany, for two days to outline objectives, discuss strategies, and lay the groundwork for this venture.

Specifically, the combined expertise of the consortium will address the following objectives:

  1. Advance automated sample sorting and image recognition tools from individual prototypes to systems that can be adopted by practitioners
  2. Expand pollinator monitoring to under-researched pollinator taxa, ecosystems, and pressures
  3. Quantify the added value of novel monitoring systems in comparison and combination with ‘traditional’ methods in terms of cost effectiveness
  4. Provide a framework for integrative monitoring by combining multiple data streams. The framework will also support the development of near real-time forecasting models as a basis for early warning systems
  5. Upscale local demonstrations into the implementation of large-scale transnational pipelines and provide context-specific guidance for policy-makers and other users who need to select monitoring methods and indicators.

Consortium*:

  1. Helmholtz-Centre for Environmental Research (UFZ), Germany
  2. Naturalis Biodiversity Center, Netherlands
  3. Aarhus University, Denmark
  4. Consejo Superior de Investigaciones Científicas (CSIC), Spain
  5. University of the Aegean, Greece
  6. Universidad Politécnica de Madrid, Spain
  7. Trinity College Dublin, Ireland

*Pensoft Publishers is a subcontractor tasked by the UFZ with multiple communication, dissemination and exploitation activities as part of Work Package 5.


Stay up to date with the ANTENNA project’s progress on X/Twitter (@ANTENNA_project) and LinkedIn (/antenna-project).

Brand new computer language describes organismal traits to create computable species descriptions

Describing traits with Phenoscript is like programming a computer code for how an organism looks.

The beetle species Grebennikovius basilewskyi. Numbers next to arrows indicate patterns of phenotype statements explained in the section “Phenoscript: main patterns of phenotype statements”. Arrow numbers from T1 to T5 illustrate individual body parts. See more in the research study.

One of the most beautiful aspects of Nature is the endless variety of shapes, colours and behaviours exhibited by organisms. These traits help organisms survive and find mates, like how a male peacock’s colourful tail attracts females or his wings allow him to fly away from danger. Understanding traits is crucial for biologists, who study them to learn how organisms evolve and adapt to different environments.

To do this, scientists first need to describe these traits in words, like saying a peacock’s tail is “vibrant, iridescent, and ornate”. This approach works for small studies, but when looking at hundreds or even millions of different animals or plants, it’s impossible for the human brain to keep track of everything.

Computers could help, but not even the latest AI technology is able to grasp human language to the extent needed by biologists. This hampers research significantly because, although scientists can handle large volumes of DNA data, linking this information to physical traits is still very difficult.

To solve this problem, researchers from the Finnish Museum of Natural History, Giulio Montanaro and Sergei Tarasov, along with collaborators, have created a special language called Phenoscript. This language is designed to describe traits in a way that both humans and computers can understand. Describing traits with Phenoscript is like programming a computer code for how an organism looks.

Phenoscript uses something called semantic technology, which helps computers understand the meaning behind words, much like how modern search engines know the difference between the fruit “apple” and the tech company “Apple” based on the context of your search.

“This language is still being tested, but it shows a lot of promise. As more scientists start using Phenoscript, it will revolutionise biology by making vast amounts of trait data available for large-scale studies, boosting the emerging field of phenomics,”

explains Montanaro.

In their research article, newly published in the open-access, peer-reviewed Biodiversity Data Journal, the researchers make use of the new language for the first time, as they create semantic phenotypes for four species of dung beetles from the genus Grebennikovius. Then, to demonstrate the power of the semantic approach, they apply simple semantic queries to the generated phenotypic descriptions. 
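To give a feel for what a simple semantic query looks like, here is a toy sketch in Python. It is not Phenoscript and not the authors' method: it only assumes that a phenotype can be stored as subject-predicate-object statements, and every identifier in it (species_A, has_part, punctate and so on) is a hypothetical placeholder rather than data from the study.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical statements; none of these come from the published descriptions.
TRIPLES: List[Triple] = [
    ("species_A", "has_part", "species_A_pronotum"),
    ("species_A_pronotum", "is_a", "pronotum"),
    ("species_A_pronotum", "has_quality", "punctate"),
    ("species_B", "has_part", "species_B_pronotum"),
    ("species_B_pronotum", "is_a", "pronotum"),
    ("species_B_pronotum", "has_quality", "smooth"),
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return all triples that fit the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def species_with(triples: List[Triple], part_type: str, quality: str) -> List[str]:
    """Find species that have a part of the given type bearing the given quality."""
    found = []
    for species, _, part in match(triples, p="has_part"):
        is_right_part = match(triples, s=part, p="is_a", o=part_type)
        has_quality = match(triples, s=part, p="has_quality", o=quality)
        if is_right_part and has_quality:
            found.append(species)
    return sorted(found)


# "Which species have a punctate pronotum?"
print(species_with(TRIPLES, "pronotum", "punctate"))  # prints ['species_A']
```

Because each statement is explicit and machine-readable, the same question can be asked across any number of species without anyone re-reading the descriptions. Real semantic descriptions rely on shared ontologies and dedicated query languages rather than a Python list, but the principle is similar.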

Finally, the team looks further ahead to modernising the way scientists work with species information. Their next aim is to integrate semantic species descriptions with the concept of nanopublications, “which encapsulates discrete pieces of information into a comprehensive knowledge graph”. As a result, data that has become part of this graph can be queried directly, thereby ensuring that it remains Findable, Accessible, Interoperable and Reusable (FAIR) through a variety of semantic resources.

***

Research paper:

Montanaro G, Balhoff JP, Girón JC, Söderholm M, Tarasov S (2024) Computable species descriptions and nanopublications: applying ontology-based technologies to dung beetles (Coleoptera, Scarabaeinae). Biodiversity Data Journal 12: e121562. https://doi.org/10.3897/BDJ.12.e121562

***

This study is the latest addition to the special topical collection “Linking FAIR biodiversity data through publications: The BiCIKL approach”, launched and supported by the recently concluded Horizon 2020 project Biodiversity Community Integrated Knowledge Library (BiCIKL). The collection aims to bring together scientific publications that demonstrate the advantages of, and novel approaches to, accessing and (re-)using linked biodiversity data.

***

What expert recommendations did the BiCIKL consortium give to policy makers and research funders to ensure that biodiversity data is FAIR, linked, open and, indeed, future-proof? Find out in the blog post summarising key lessons learnt from the Horizon 2020 project.

***

Follow Biodiversity Data Journal on Facebook and X.