Better data practices advance biodiversity knowledge

A framework to retrieve, refine and align secondary biodiversity data with FAIR standards.

Guest blog post by Nubia Marques et al.

In a world increasingly defined by data-driven decisions, biodiversity research stands to benefit from standardized and accessible data. Despite their importance for research, biodiversity datasets often fail to meet FAIR (Findable, Accessible, Interoperable, Reusable) standards, leading to concerns about data quality, reliability, and accessibility.

To address this, we propose a framework to retrieve, refine and align secondary biodiversity data with FAIR standards, utilizing the Darwin Core model. We followed four steps:

  1. data localization (systematic review)
  2. quality validation
  3. standardization using the Darwin Core standard
  4. sharing and archiving in an appropriate repository.

Our approach integrates data validation and quality control steps to ensure that secondary datasets can be trusted.
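To make steps 2 and 3 more concrete, here is a minimal sketch in Python of how a record recovered from the literature might be renamed to Darwin Core terms and then checked. It is an illustration under stated assumptions, not the pipeline used in the study: the source column names, the sample rows and the three checks are all hypothetical, and only the right-hand names in the mapping (scientificName, decimalLatitude, decimalLongitude, eventDate, individualCount, associatedReferences) are actual Darwin Core terms.

```python
from datetime import date

# Hypothetical mapping from a source table's column names (left, invented for
# this example) to Darwin Core terms (right).
COLUMN_TO_DWC = {
    "species": "scientificName",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "date": "eventDate",
    "count": "individualCount",
    "source": "associatedReferences",
}


def to_darwin_core(raw: dict) -> dict:
    """Rename known columns to Darwin Core terms; unknown columns are dropped."""
    return {COLUMN_TO_DWC[key]: value for key, value in raw.items() if key in COLUMN_TO_DWC}


def validate(record: dict) -> list:
    """Return a list of problems found; an empty list means the record passed."""
    problems = []

    # Check 1: a taxon name must be present.
    if not str(record.get("scientificName", "")).strip():
        problems.append("scientificName is missing")

    # Check 2: coordinates must be numeric and within valid ranges.
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append("coordinates out of range")
    except (KeyError, TypeError, ValueError):
        problems.append("coordinates missing or not numeric")

    # Check 3: the date must parse as an ISO 8601 calendar date.
    try:
        date.fromisoformat(str(record["eventDate"]))
    except (KeyError, ValueError):
        problems.append("eventDate missing or not ISO 8601 (YYYY-MM-DD)")

    return problems


# Invented rows for illustration only; "Aus bus" is a placeholder name.
raw_rows = [
    {"species": "Aus bus", "lat": "-2.5", "lon": "-44.3",
     "date": "1998-05-13", "count": "4", "source": "hypothetical report"},
    {"species": "", "lat": "-2.5", "lon": "-444.3",
     "date": "13/05/1998", "count": "1", "source": "hypothetical report"},
]

records = [to_darwin_core(row) for row in raw_rows]
clean = [r for r in records if not validate(r)]
rejected = [(r, validate(r)) for r in records if validate(r)]
```

Keeping the rejected records together with the reasons they failed, rather than silently discarding them, leaves a trail that can be reviewed before the cleaned data are shared in a repository.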

Our study in Biodiversity Data Journal focused on ecotonal estuarine ecosystems near the easternmost Amazon, where we recovered data from 46,000 individuals representing 3,871 taxa across eight biotic groups (birds, amphibians, reptiles, mammals, fish, phytoplankton, benthos, and plants) from 1985 to 2022. These data were used to illustrate how our strategy improves validation, making the data more reliable for macroecological modeling and conservation management. As data becomes more standardized, researchers around the world will be better equipped to collaborate, identify trends, protect ecosystems, and advance sustainability efforts.

Relationships between numbers of taxa and occurrences gathered through an extensive review of secondary biodiversity data from the Golfão Maranhense area, in the estuarine regions of eastern Amazonia.

Accessible biodiversity data empowers stakeholders and provides critical insights into ecosystem health and species conservation. However, without standardized formats, this data is often fragmented, incomplete, or difficult to compare. By creating a consistent framework for collecting, storing, and sharing data, we are opening the door to more informed decision-making and innovation in biodiversity conservation.

The key to conserving biodiversity is collaboration and transparency. By prioritizing accessible and standardized data, we ensure that vital information reaches those who need it most – whether it’s for scientific study, habitat management or policymaking.

Let’s continue to make biodiversity data a tool for global change!

Research article:

Marques N, Soares CDdeM, Casali DdeM, Guimarães E, Fava F, Abreu JMdaS, Moras L, Silva LGda, Matias R, Assis RLde, Fraga R, Almeida S, Lopes V, Oliveira V, Missagia R, Carvalho E, Carneiro N, Alves R, Souza-Filho P, Oliveira G, Miranda M, Tavares VdaC (2024) Retrieving biodiversity data from multiple sources: making secondary data standardised and accessible. Biodiversity Data Journal 12: e133775. https://doi.org/10.3897/BDJ.12.e133775

How to ensure biodiversity data are FAIR, linked, open and future-proof?

Now-concluded Horizon 2020-funded project BiCIKL shares lessons learned with policy-makers and research funders

Within the Biodiversity Community Integrated Knowledge Library (BiCIKL) project, 14 European institutions from ten countries spent the last three years developing and refining services and high-tech digital tools to improve the findability, accessibility, interoperability and reusability (FAIR-ness) of various types of data about the world’s biodiversity. These types of data include peer-reviewed scientific literature, occurrence records, natural history collections, DNA data and more.

By ensuring all those data are readily available and efficiently interlinked, the project consortium intends to provide better tools to the scientific community, so that it can more rapidly and effectively study, assess, monitor and preserve Earth’s biological diversity in line with the objectives of the likes of the EU Biodiversity Strategy for 2030 and the European Green Deal. Their targets require openly available, precise and harmonised data to underpin the design of effective measures for restoration and conservation, the BiCIKL consortium points out.

Since 2021, the project partners at BiCIKL have been working together to refine existing workflows and links, as well as create brand new ones, so that their data resources, platforms and tools can seamlessly communicate with each other. This takes the burden off the shoulders of scientists and lets them focus on their actual mission: paving the way to healthy and sustainable ecosystems across Europe and beyond.

Now that the three-year project is officially over, the wider scientific community is yet to reap the fruits of the consortium’s efforts. In fact, the end of the BiCIKL project marks the actual beginning of a Europe-wide and global revolution in the way biodiversity scientists access, use and produce data. It is time for the research community, as well as all actors involved in the study of biodiversity and the implementation of regulations necessary to protect and preserve it, to embrace the lessons learned, adopt the good practices identified and build on the existing knowledge.

This is why, amongst BiCIKL’s major final research outputs, there are two Policy Briefs meant to summarise and highlight important recommendations addressed to key policy makers, research institutions and funders of research. After all, it is the regulatory bodies that are best equipped to share and implement best practices and guidelines.

Most recently, the BiCIKL consortium published two particularly important policy briefs, both addressed to the likes of the European Commission’s Directorate-General for Environment; the European Environment Agency; the Joint Research Centre; as well as science and policy interface platforms, such as the EU Biodiversity Platform; and also organisations and programmes, e.g. Biodiversa+ and EuropaBON, which are engaged in biodiversity monitoring, protection and restoration. The policy briefs are also intended to be of particular use to national research funds in the European Union.

One of the newly published policy briefs, titled “Uniting FAIR data through interlinked, machine-actionable infrastructures”, highlights the potential benefits derived from enhanced connectivity and interoperability among various types of biodiversity data. The publication includes a list of recommendations addressed to policy-makers, as well as nine key action points. Understandably, amongst the main themes are wider international cooperation; inclusivity and collaboration at scale; standardisation; and bringing science and policy closer to industry. Another major outcome of the BiCIKL project, the Biodiversity Knowledge Hub portal, is noted as central to many of these objectives and tasks in its role as a knowledge broker that will continue to be maintained and updated with additional FAIR data-compliant services as a living legacy of the collaborative efforts at BiCIKL.

The second policy brief, titled “Liberate the power of biodiversity literature as FAIR digital objects”, shares key actions that can liberate data published in non-machine-actionable formats and on non-interoperable platforms, so that those data can also be efficiently accessed and used. It also describes ways to publish future data according to the best FAIR and linked data practices. The recommendations highlighted in the policy brief are intended to support decision-making in Europe; expedite research by making biodiversity data immediately and globally accessible; provide curated data ready to use by AI applications; and bridge gaps in the life cycle of research data through digital-born data. Several new and innovative workflows, linkages and integrative mechanisms and services developed within BiCIKL are mentioned as key advancements created to access and disseminate data available from scientific literature.

While all policy briefs and factsheets – both primarily targeted at non-expert decision-makers who play a central role in biodiversity research and conservation efforts – are openly and freely available on the project’s website, the most important contributions were published as permanent scientific records in a BiCIKL-branded dedicated collection in the peer-reviewed open-science journal Research Ideas and Outcomes (RIO). There, the policy briefs are provided as both a ready-to-print document (available as supplementary material) and an extensive academic publication.

Currently, the collection “Towards interlinked FAIR biodiversity knowledge: The BiCIKL perspective” in the RIO journal contains 60 publications, including policy briefs, project reports, methods papers and conference abstracts, which demonstrate and highlight key milestones and project outcomes along BiCIKL’s journey over the last three years. The collection also features over 15 scientific publications authored by people not necessarily involved in BiCIKL, but whose research uses linked open data and tools created in BiCIKL. Their publications were published in a dedicated article collection in the Biodiversity Data Journal.

***

Visit the Biodiversity Community Integrated Knowledge Library (BiCIKL) project’s website at: https://bicikl-project.eu/.

Don’t forget to also explore the Biodiversity Knowledge Hub (BKH) for yourself at: https://biodiversityknowledgehub.eu/ and watch the BKH’s introduction video.

Highlights from the BiCIKL project are also accessible on Twitter/X from the project’s hashtag: #BiCIKL_H2020 and handle: @BiCIKL_H2020.

BiCIKL keeps on adding project outcomes in its own collection in RIO Journal

The publications so far include the grant proposal, conference abstracts, a workshop report, guidelines papers and deliverables submitted to the Commission.

The dynamic open-science project collection of BiCIKL, titled “Towards interlinked FAIR biodiversity knowledge: The BiCIKL perspective” (doi: 10.3897/rio.coll.105), continues to grow as the project progresses into its third year and its results accumulate at an ever faster pace.

Following the publication of three important BiCIKL deliverables (the project’s Data Management Plan, its Visual identity package and a report describing the newly built workflow and tools for data extraction, conversion and indexing, and the user applications from OpenBiodiv), there are currently 30 research outcomes in the BiCIKL collection that have been shared publicly with the world, rather than merely submitted to the European Commission.

Shortly after the BiCIKL project started in 2021, a project-branded collection was launched in the open-science scholarly journal Research Ideas and Outcomes (RIO). There, the partners have been publishing – and thus preserving – conclusive research papers, as well as early and interim scientific outputs.

The publications so far also include the BiCIKL grant proposal, which earned the support of the European Commission in 2021; conference abstracts, submitted by the partners to two consecutive TDWG conferences; a project report that summarises recommendations on interoperability among infrastructures, as concluded from a hackathon organised by BiCIKL; and two Guidelines papers, aiming to trigger a culture change in the way data is shared, used and reused in the biodiversity field. 

In fact, one of the Guidelines papers, where representatives of the Consortium of European Taxonomic Facilities (CETAF), the Society for the Preservation of Natural History Collections (SPNHC) and the Biodiversity Heritage Library (BHL) came together to publish their joint statement on best practices for the citation of authorities of scientific names, has so far generated about 4,000 views by nearly 3,000 unique readers.

At the time of writing, the top three of the most read papers in the BiCIKL collection is completed by the grant proposal and the second Guidelines paper, where the partners – based on their extensive and versatile experience – present recommendations about the use of annotations and persistent identifiers in taxonomy and biodiversity publishing. 

Access to data and services along the entire data and research life cycle in biodiversity science.
The figure was featured in the BiCIKL grant proposal, now made available from the BiCIKL project collection in RIO Journal.

What one might find quite odd when browsing the BiCIKL collection is that each publication is marked with its own publication source, even though all contributions are clearly already accessible from RIO Journal.

So, we can see many project outputs marked as RIO publications, but also others that have been published in the likes of F1000Research; the official journal of TDWG, Biodiversity Information Science and Standards; and even preprint servers, such as BiohackrXiv.

This is because one of the unique features of RIO allows for consortia to use their project collection as a one-stop access point for all scientific results, regardless of their publication venue, by means of linking to the original source via metadata. Additionally, projects may also upload their documents in their original format and layout, thanks to the integration between RIO and ARPHA Preprints. This is in fact how BiCIKL chose to share their latest deliverables using the very same files they submitted to the Commission.

“In line with the mission of BiCIKL and our consortium’s dedication to FAIRness in science, we wanted to keep our project’s progress and results fully transparent and easily accessible and reusable to anyone, anywhere,” 

explains Prof Lyubomir Penev, BiCIKL’s Project Coordinator and founder and CEO of Pensoft. 

“This is why we opted to collate the outcomes of BiCIKL in one place – starting from the grant proposal itself, and then progressively adding workshop reports, recommendations, research papers and what not. By the time BiCIKL concludes, not only will we be ready to refer back to any step along the way that we have just walked together, but also rest assured that what we have achieved and learnt remains at the fingertips of those we have done it for and those who come after them,” he adds.

***

You can keep tabs on the BiCIKL project collection in RIO Journal by subscribing to the journal newsletter or following @RIOJournal on Twitter and Facebook.

BiCIKL partners sign the Leiden Declaration on FAIR Digital Objects

Key figures from Naturalis Biodiversity Center, Plazi and Pensoft were amongst the first to sign the Declaration at the closing session of the First International Conference on FAIR Digital Objects (FDO2022)

Several of the BiCIKL partners signed the Leiden Declaration on FAIR Digital Objects, thereby committing to “a new environment that works as a truly meaningful data space,” as framed by the organisers of the conference, whose first instalment turned out to be the perfect occasion for the formal publication of the pact. 

Key figures from Naturalis Biodiversity Center, Plazi and Pensoft were amongst the first to sign the Declaration at the closing session of the First International Conference on FAIR Digital Objects (FDO2022), which took place in October 2022 in Leiden, the Netherlands, where it was hosted by the Naturalis Biodiversity Center.

***

The conference brought together key international technical, scientific, industry and science-policy stakeholders with the aim of boosting the development and implementation of FAIR Digital Objects (FDOs) worldwide. It was organised by the FDO Forum, an initiative supported by major global initiatives, as well as a variety of regional and national ones, with the shared goal of achieving better coherence amongst the increasing number of initiatives working on FDO-based designs and implementations.

By joining the Declaration’s signatories, the BiCIKL partners formally committed to:

  • Support the FAIR guiding principles to be applied (ultimately) to each digital object in a web of FAIR data and services;
  • Support open standards and protocols;
  • Support data and services to be as open as possible, and only as restricted as necessary;
  • Support distributed solutions where useful to achieve robustness and scalability, but recognise the need for centralised approaches where necessary;
  • Support the restriction of standards and protocols to the absolute minimum;
  • Support freedom to operate wherever possible;
  • Help to avoid monopolies and provider lock-in wherever possible.

***

During the event, Plazi and Pensoft held a presentation demonstrating how their Biodiversity Literature Repository turns taxonomic treatments ‘locked’ in legacy scientific literature into FAIR Digital Objects. As a result of the collaboration between Plazi and Pensoft – a partnership long-preceding their involvement in BiCIKL – this workflow has also been adapted to modern-day publishing, in order to FAIRify data as soon as it is published.

A slide from the Plazi presentation at the FDO2022, Leiden, the Netherlands. Credit: Plazi.

***

Ahead of FDO2022, all submitted conference abstracts – including the one associated with Plazi’s presentation – were made publicly available in a collection of their own in Pensoft’s open-science journal Research Ideas and Outcomes (RIO). Thus, not only did the organisers make the conference outputs available to the participants early on, so that they could familiarise themselves with the upcoming talks and topics in advance, but they also ensured that the contributions are permanently preserved and FAIR in their own right.

The conference collection, guest edited by Tina Loo (Naturalis Biodiversity Center), contains a total of 51 conference abstracts, each published in HTML, XML and PDF formats and assigned its own persistent identifier (DOI), just like the collection in its entirety (10.3897/rio.coll.190).

***

Read more about the declaration and sign it yourself from this link. You can also follow the FDO Forum on Twitter (@FAIRDOForum).