In a world increasingly defined by data-driven decisions, biodiversity research stands to benefit from standardized and accessible data. Despite their importance for research, biodiversity datasets often fail to meet FAIR (Findable, Accessible, Interoperable, Reusable) standards, leading to concerns about data quality, reliability, and accessibility.
To address this, we propose a framework to retrieve, refine and align secondary biodiversity data with FAIR standards, utilizing the Darwin Core model. We followed four steps:
data localization (systematic review)
quality validation
standardization using the Darwin Core standard
sharing and archiving in an appropriate repository.
Our approach integrates data validation and quality control steps to ensure that secondary data sets can be trusted.
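As a rough illustration of the validation and standardization steps, the sketch below checks a few raw records and renames their fields to Darwin Core terms. It is a minimal example under stated assumptions, not the workflow used in the study: the raw column names (species, lat, lon, date, count), the sample records, the identifier scheme and the chosen basisOfRecord value are all made up for illustration. Only the Darwin Core term names themselves come from the standard.

```python
from datetime import date

# Hypothetical raw column names mapped to Darwin Core terms.
# The Darwin Core terms are real; the raw names on the left are assumptions.
RAW_TO_DWC = {
    "species": "scientificName",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "date": "eventDate",
    "count": "individualCount",
}


def validate(raw):
    """Return a list of problems found in one raw record (empty if none)."""
    problems = []
    if not str(raw.get("species", "")).strip():
        problems.append("missing species name")
    try:
        lat, lon = float(raw["lat"]), float(raw["lon"])
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append("coordinates out of range")
    except (KeyError, TypeError, ValueError):
        problems.append("coordinates missing or not numeric")
    try:
        date.fromisoformat(str(raw.get("date", "")))
    except ValueError:
        problems.append("date is not ISO 8601 (YYYY-MM-DD)")
    return problems


def to_darwin_core(raw, record_id):
    """Rename raw fields to Darwin Core terms and attach an identifier."""
    record = {dwc: raw[src] for src, dwc in RAW_TO_DWC.items() if src in raw}
    record["occurrenceID"] = record_id
    # Assumed value for illustration; the right choice depends on the source.
    record["basisOfRecord"] = "HumanObservation"
    return record


# Made-up sample records, not data from the study.
raw_records = [
    {"species": "Eudocimus ruber", "lat": "-0.95", "lon": "-46.6",
     "date": "2019-07-14", "count": 12},
    {"species": "", "lat": "95.0", "lon": "-46.6", "date": "14/07/2019"},
]

clean, rejected = [], []
for i, raw in enumerate(raw_records, start=1):
    problems = validate(raw)
    if problems:
        rejected.append((i, problems))  # keep the reasons for later review
    else:
        clean.append(to_darwin_core(raw, f"example:occ:{i}"))
```

Keeping the rejected records together with the reasons for rejection, rather than silently dropping them, is one way to make the quality-control step auditable before the data are shared.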
Our study in Biodiversity Data Journal focused on ecotonal estuarine ecosystems near the easternmost Amazon, where we recovered data from 46,000 individuals representing 3,871 taxa across eight biotic groups (birds, amphibians, reptiles, mammals, fish, phytoplankton, benthos, and plants) from 1985 to 2022. These data were used to illustrate how our strategy improves validation, making the data more reliable for macroecological modeling and conservation management. As data becomes more standardized, researchers around the world will be better equipped to collaborate, identify trends, protect ecosystems, and advance sustainability efforts.
Accessible biodiversity data empowers stakeholders and provides critical insights into ecosystem health and species conservation. However, without standardized formats, this data is often fragmented, incomplete, or difficult to compare. By creating a consistent framework for collecting, storing, and sharing data, we are opening the door to more informed decision-making and innovation in biodiversity conservation.
The key to conserving biodiversity is collaboration and transparency. By prioritizing accessible and standardized data, we ensure that vital information reaches those who need it most – whether it’s for scientific study, habitat management or policymaking.
Let’s continue to make biodiversity data a tool for global change!
Research article:
Marques N, Soares CDdeM, Casali DdeM, Guimarães E, Fava F, Abreu JMdaS, Moras L, Silva LGda, Matias R, Assis RLde, Fraga R, Almeida S, Lopes V, Oliveira V, Missagia R, Carvalho E, Carneiro N, Alves R, Souza-Filho P, Oliveira G, Miranda M, Tavares VdaC (2024) Retrieving biodiversity data from multiple sources: making secondary data standardised and accessible. Biodiversity Data Journal 12: e133775. https://doi.org/10.3897/BDJ.12.e133775
Within the Biodiversity Community Integrated Knowledge Library (BiCIKL) project, 14 European institutions from ten countries spent the last three years elaborating on services and high-tech digital tools, in order to improve the findability, accessibility, interoperability and reusability (FAIR-ness) of various types of data about the world’s biodiversity. These types of data include peer-reviewed scientific literature, occurrence records, natural history collections, DNA data and more.
By ensuring all those data are readily available and efficiently interlinked to each other, the project consortium’s intention is to provide better tools to the scientific community, so that it can more rapidly and effectively study, assess, monitor and preserve Earth’s biological diversity in line with the objectives of the likes of the EU Biodiversity Strategy for 2030 and the European Green Deal. Their targets require openly available, precise and harmonised data to underpin the design of effective measures for restoration and conservation, the BiCIKL consortium points out.
Since 2021, the project partners at BiCIKL have been working together to elaborate existing workflows and links, as well as create brand new ones, so that their data resources, platforms and tools can seamlessly communicate with each other, thereby taking the burden off the shoulders of scientists and letting them focus on their actual mission: paving the way to healthy and sustainable ecosystems across Europe and beyond.
Now that the three-year project is officially over, the wider scientific community is yet to reap the fruits of the consortium’s efforts. In fact, the end of the BiCIKL project marks the actual beginning of a Europe-wide and global revolution in the way biodiversity scientists access, use and produce data. It is time for the research community, as well as all actors involved in the study of biodiversity and the implementation of regulations necessary to protect and preserve it, to embrace the lessons learned, adopt the good practices identified and build on the knowledge in existence.
This is why amongst BiCIKL’s major final research outputs are two policy briefs meant to summarise and highlight important recommendations addressed to key policy makers, research institutions and funders of research. After all, it is the regulatory bodies that are best equipped to share and implement best practices and guidelines.
Most recently, the BiCIKL consortium published two particularly important policy briefs, both addressed to the likes of the European Commission’s Directorate-General for Environment; the European Environment Agency; the Joint Research Centre; as well as science and policy interface platforms, such as the EU Biodiversity Platform; and also organisations and programmes, e.g. Biodiversa+ and EuropaBON, which are engaged in biodiversity monitoring, protection and restoration. The policy briefs are also to be of particular use to national research funds in the European Union.
One of the newly published policy briefs, titled “Uniting FAIR data through interlinked, machine-actionable infrastructures”, highlights the potential benefits derived from enhanced connectivity and interoperability among various types of biodiversity data. The publication includes a list of recommendations addressed to policy-makers, as well as nine key action points. Understandably, amongst the main themes are those of wider international cooperation; inclusivity and collaboration at scale; standardisation; and bringing science and policy closer to industry. Another major outcome of the BiCIKL project, the Biodiversity Knowledge Hub portal, is noted as central to many of these objectives and tasks in its role as a knowledge broker that will continue to be maintained and updated with additional FAIR data-compliant services as a living legacy of the collaborative efforts at BiCIKL.
The second policy brief, titled “Liberate the power of biodiversity literature as FAIR digital objects”, shares key actions that can liberate data published in non-machine actionable formats and non-interoperable platforms, so that those data can also be efficiently accessed and used; as well as ways to publish future data according to the best FAIR and linked data practices. The recommendations highlighted in the policy brief intend to support decision-making in Europe; expedite research by making biodiversity data immediately and globally accessible; provide curated data ready to use by AI applications; and bridge gaps in the life cycle of research data through digital-born data. Several new and innovative workflows, linkages and integrative mechanisms and services developed within BiCIKL are mentioned as key advancements created to access and disseminate data available from scientific literature.
While all policy briefs and factsheets – both primarily targeted at non-expert decision-makers who play a central role in biodiversity research and conservation efforts – are openly and freely available on the project’s website, the most important contributions were published as permanent scientific records in a BiCIKL-branded dedicated collection in the peer-reviewed open-science journal Research Ideas and Outcomes (RIO). There, the policy briefs are provided as both a ready-to-print document (available as supplementary material) and an extensive academic publication.
Currently, the collection “Towards interlinked FAIR biodiversity knowledge: The BiCIKL perspective” in the RIO journal contains 60 publications, including policy briefs, project reports, methods papers and conference abstracts, which demonstrate and highlight key milestones and project outcomes from BiCIKL’s journey over the last three years. The collection also features over 15 scientific publications authored by people not necessarily involved in BiCIKL, but whose research uses linked open data and tools created in BiCIKL. Their publications were published in a dedicated article collection in the Biodiversity Data Journal.
***
Visit the Biodiversity Community Integrated Knowledge Library (BiCIKL) project’s website at: https://bicikl-project.eu/.
The publications so far include the grant proposal, conference abstracts, a workshop report, guidelines papers and deliverables submitted to the Commission.
The dynamic open-science project collection of BiCIKL, titled “Towards interlinked FAIR biodiversity knowledge: The BiCIKL perspective” (doi: 10.3897/rio.coll.105), continues to grow as the project progresses into its third year and its results accumulate.
Following the publication of three important BiCIKL deliverables: the project’s Data Management Plan, its Visual identity package and a report, describing the newly built workflow and tools for data extraction, conversion and indexing and the user applications from OpenBiodiv, there are currently 30 research outcomes in the BiCIKL collection that have been shared publicly to the world, rather than merely submitted to the European Commission.
Shortly after the BiCIKL project started in 2021, a project-branded collection was launched in the open-science scholarly journal Research Ideas and Outcomes (RIO). There, the partners have been publishing – and thus preserving – conclusive research papers, as well as early and interim scientific outputs.
The publications so far also include the BiCIKL grant proposal, which earned the support of the European Commission in 2021; conference abstracts, submitted by the partners to two consecutive TDWG conferences; a project report that summarises recommendations on interoperability among infrastructures, as concluded from a hackathon organised by BiCIKL; and two Guidelines papers, aiming to trigger a culture change in the way data is shared, used and reused in the biodiversity field.
At the time of writing, the top three of the most read papers in the BiCIKL collection is completed by the grant proposal and the second Guidelines paper, where the partners – based on their extensive and versatile experience – present recommendations about the use of annotations and persistent identifiers in taxonomy and biodiversity publishing.
What one might find quite odd when browsing the BiCIKL collection is that each publication is marked with its own publication source, even though all contributions are clearly already accessible from RIO Journal.
This is because one of the unique features of RIO allows consortia to use their project collection as a one-stop access point for all scientific results, regardless of their publication venue, by means of linking to the original source via metadata. Additionally, projects may also upload their documents in their original format and layout, thanks to the integration between RIO and ARPHA Preprints. This is in fact how BiCIKL chose to share their latest deliverables using the very same files they submitted to the Commission.
“In line with the mission of BiCIKL and our consortium’s dedication to FAIRness in science, we wanted to keep our project’s progress and results fully transparent and easily accessible and reusable to anyone, anywhere,”
explains Prof Lyubomir Penev, BiCIKL’s Project Coordinator and founder and CEO of Pensoft.
“This is why we opted to collate the outcomes of BiCIKL in one place – starting from the grant proposal itself, and then progressively adding workshop reports, recommendations, research papers and what not. By the time BiCIKL concludes, not only will we be ready to refer back to any step along the way that we have just walked together, but also rest assured that what we have achieved and learnt remains at the fingertips of those we have done it for and those who come after them,” he adds.
Key figures from Naturalis Biodiversity Center, Plazi and Pensoft were amongst the first to sign the Declaration at the closing session of the First International Conference on FAIR Digital Objects (FDO2022).
Several of the BiCIKL partners signed the Leiden Declaration on FAIR Digital Objects, thereby committing to “a new environment that works as a truly meaningful data space,” as framed by the organisers of the conference, whose first instalment turned out to be the perfect occasion for the formal publication of the pact.
The conference brought together key international technical, scientific, industry and science-policy stakeholders with the aim of boosting the development and implementation of FAIR Digital Objects (FDOs) worldwide. It was organised by the FDO Forum, an initiative supported by major global initiatives and by a variety of regional and national initiatives with the shared goal of achieving better coherence amongst the increasing number of initiatives working on FDO-based designs and implementations.
By joining the Declaration’s signatories, the BiCIKL partners formally committed to:
Support the FAIR guiding principles to be applied (ultimately) to each digital object in a web of FAIR data and services;
Support open standards and protocols;
Support data and services to be as open as possible, and only as restricted as necessary;
Support distributed solutions where useful to achieve robustness and scalability, but recognise the need for centralised approaches where necessary;
Support the restriction of standards and protocols to the absolute minimum;
Support freedom to operate wherever possible;
Help to avoid monopolies and provider lock-in wherever possible.
***
During the event, Plazi and Pensoft held a presentation demonstrating how their Biodiversity Literature Repository turns taxonomic treatments ‘locked’ in legacy scientific literature into FAIR Digital Objects. As a result of the collaboration between Plazi and Pensoft – a partnership long preceding their involvement in BiCIKL – this workflow has also been adapted to modern-day publishing, in order to FAIRify data as soon as it is published.
***
Ahead of FDO2022, all submitted conference abstracts – including the one associated with Plazi’s presentation – were made publicly available in a collection of their own in Pensoft’s open-science journal Research Ideas and Outcomes (RIO). Thus, not only did the organisers make the conference outputs available to the participants early on, so that they could familiarise themselves with the upcoming talks and topics in advance, but they also ensured that the contributions are in turn permanently preserved and FAIR.
The conference collection, guest edited by Tina Loo (Naturalis Biodiversity Center), contains a total of 51 conference abstracts, each published in HTML, XML and PDF formats and assigned its own persistent identifier (DOI), just like the collection in its entirety (10.3897/rio.coll.190).
***
Read more about the declaration and sign it yourself from this link. You can also follow the FDO Forum on Twitter (@FAIRDOForum).