How to ensure biodiversity data are FAIR, linked, open and future-proof?

The now-concluded Horizon 2020-funded project BiCIKL shares lessons learned with policy-makers and research funders

Within the Biodiversity Community Integrated Knowledge Library (BiCIKL) project, 14 European institutions from ten countries spent the last three years developing services and high-tech digital tools to improve the findability, accessibility, interoperability and reusability (FAIR-ness) of various types of data about the world’s biodiversity. These types of data include peer-reviewed scientific literature, occurrence records, natural history collections, DNA data and more.

By ensuring all those data are readily available and efficiently interlinked, the consortium intends to provide better tools to the scientific community, so that it can more rapidly and effectively study, assess, monitor and preserve Earth’s biological diversity, in line with the objectives of the EU Biodiversity Strategy for 2030 and the European Green Deal. These targets, the BiCIKL consortium reminds us, require openly available, precise and harmonised data to underpin the design of effective restoration and conservation measures.

Since 2021, the BiCIKL partners have been working together to refine existing workflows and links, as well as to create brand-new ones, so that their data resources, platforms and tools can communicate with each other seamlessly, thereby taking the burden off the shoulders of scientists and letting them focus on their actual mission: paving the way to healthy and sustainable ecosystems across Europe and beyond.

Now that the three-year project is officially over, the wider scientific community is yet to reap the fruits of the consortium’s efforts. In fact, the end of the BiCIKL project marks the actual beginning of a Europe-wide and global shift in the way biodiversity scientists access, use and produce data. It is time for the research community, as well as all actors involved in the study of biodiversity and the implementation of the regulations necessary to protect and preserve it, to embrace the lessons learned, adopt the good practices identified and build on the existing knowledge.

This is why, amongst BiCIKL’s major final research outputs, there are two policy briefs meant to summarise and highlight important recommendations addressed to key policy-makers, research institutions and funders of research. After all, it is the regulatory bodies that are best equipped to share and implement best practices and guidelines.

Most recently, the BiCIKL consortium published two particularly important policy briefs, both addressed to the likes of the European Commission’s Directorate-General for Environment, the European Environment Agency and the Joint Research Centre; to science-policy interface platforms, such as the EU Biodiversity Platform; and to organisations and programmes engaged in biodiversity monitoring, protection and restoration, e.g. Biodiversa+ and EuropaBON. The policy briefs will also be of particular use to national research funds in the European Union.

One of the newly published policy briefs, titled “Uniting FAIR data through interlinked, machine-actionable infrastructures”, highlights the potential benefits derived from enhanced connectivity and interoperability among various types of biodiversity data. The publication includes a list of recommendations addressed to policy-makers, as well as nine key action points. Understandably, amongst the main themes are wider international cooperation; inclusivity and collaboration at scale; standardisation; and bringing science and policy closer to industry. The Biodiversity Knowledge Hub portal, another major outcome of the BiCIKL project, is noted as central to many of these objectives and tasks in its role of knowledge broker; it will continue to be maintained and updated with additional FAIR data-compliant services as a living legacy of the collaborative efforts at BiCIKL.

The second policy brief, titled “Liberate the power of biodiversity literature as FAIR digital objects”, shares key actions that can liberate data published in non-machine-actionable formats and non-interoperable platforms, so that those data can also be efficiently accessed and used, as well as ways to publish future data according to the best FAIR and linked-data practices. The recommendations in the policy brief are intended to support decision-making in Europe; expedite research by making biodiversity data immediately and globally accessible; provide curated data ready for use by AI applications; and bridge gaps in the life cycle of research data through born-digital data. Several new and innovative workflows, linkages, integrative mechanisms and services developed within BiCIKL are mentioned as key advancements created to access and disseminate data available from scientific literature.

While all policy briefs and factsheets – both primarily targeted at non-expert decision-makers who play a central role in biodiversity research and conservation efforts – are openly and freely available on the project’s website, the most important contributions were published as permanent scientific records in a dedicated BiCIKL collection in the peer-reviewed open-science journal Research Ideas and Outcomes (RIO). There, each policy brief is provided both as a ready-to-print document (available as supplementary material) and as an extensive academic publication.

Currently, the collection “Towards interlinked FAIR biodiversity knowledge: The BiCIKL perspective” in the RIO journal contains 60 publications, including policy briefs, project reports, methods papers and conference abstracts, demonstrating and highlighting key milestones and outcomes from the project’s three-year journey. The collection also features over 15 scientific publications authored by people not necessarily involved in BiCIKL, but whose research uses linked open data and tools created within the project. Those papers appeared in a dedicated article collection in the Biodiversity Data Journal.

***

Visit the Biodiversity Community Integrated Knowledge Library (BiCIKL) project’s website at: https://bicikl-project.eu/.

Don’t forget to also explore the Biodiversity Knowledge Hub (BKH) for yourself at: https://biodiversityknowledgehub.eu/ and to watch the BKH’s introduction video.

Highlights from the BiCIKL project are also accessible on Twitter/X from the project’s hashtag: #BiCIKL_H2020 and handle: @BiCIKL_H2020.

One Ecosystem selected for inclusion in the Web of Science

“Not only does it mean that content is persistent in merit and quality, but that innovative research outputs are already appreciated within academia,” says Editor-in-Chief Prof Dr Benjamin Burkhard

Seven years after its official launch in May 2016, the One Ecosystem journal has successfully completed the rigorous quality and integrity assessment at Web of Science.

Scientific papers published in One Ecosystem from 2021 onwards will be indexed in the Emerging Sources Citation Index (ESCI) and the Journal Citation Reports (JCR), revealed the Indexing team at ARPHA Platform.

The news means that One Ecosystem might see its very first Journal Impact Factor (JIF) as early as 2024, following the latest revision of the metric’s policies that Clarivate announced last July. According to the update, all journals in the Web of Science Core Collection are now featured in the Journal Citation Reports, and are thereby eligible for a JIF.

“Giving all quality journals a Journal Impact Factor will provide full transparency to articles and citations that have contributed to impact, and therefore will help them demonstrate their value to the research community. This decision is aligned to our position that publications in all quality journals, not just highly cited journals, should be eligible for inclusion in research assessment exercises,” said Dr Nandita Quaderi, Editor-in-Chief and Editorial Vice President at Web of Science, at the time.

“We are happy to learn that Web of Science has recognised the value and integrity of One Ecosystem in the scholarly landscape. Not only does it mean that the scientific content One Ecosystem has been publishing over the years is persistent in merit and quality, but that innovative research outputs are already widely accepted and appreciated within academia.

After all, one of the reasons why we launched One Ecosystem and why it has grown to be particularly distinguished in the field of ecology and sustainability is that it provides a scholarly publication venue for traditional research papers, as well as ‘unconventional’ scientific contributions,”

comments Prof Dr Benjamin Burkhard, Executive Director of the Institute of Physical Geography & Landscape Ecology at Leibniz University Hannover (Germany) and founding Editor-in-Chief of One Ecosystem.

“These ‘unconventional’ research outputs – like software descriptions, ecosystem inventories, ecosystem service mappings and monitoring schemes – do not normally see the light of day, let alone enjoy formal publication and efficient visibility. We believe that these outputs can be very useful to researchers, as well as to practitioners and public bodies in charge of, for example, setting up indicator frameworks for environmental reporting,”

says Prof Davide Geneletti, Department of Civil, Environmental and Mechanical Engineering, University of Trento, Italy, and Deputy Editor-in-Chief of One Ecosystem.

“In fact, last year, we also launched a new article type: the Ecosystem Accounting table, which follows the standards set by the System of Environmental-Economic Accounting Ecosystem Accounting (SEEA EA). This publication type provides scientists and statisticians with a platform to publish newly compiled accounting tables,”

adds Dr Joachim Maes, Policy analyst at the Directorate-General for Regional and Urban Policy of the European Commission and Deputy Editor-in-Chief of One Ecosystem.

***

Previously, One Ecosystem was accepted for indexing by over 60 major academic databases, including Scopus, DOAJ, Cabell’s Directory, CABI and ERIH PLUS. In June 2022, the journal received a Scopus CiteScore of 7.0, which placed it in Q1 in five categories: Earth and Planetary Sciences; Ecology; Nature and Landscape Conservation; Agricultural and Biological Sciences (miscellaneous); and Ecology, Evolution, Behavior and Systematics.

***

You can follow One Ecosystem on Twitter and Facebook.

BiCIKL Project supports article collection in Biodiversity Data Journal about use of linked data

Welcomed are taxonomic and other biodiversity-related research articles that demonstrate the advantages of, and novel approaches to, accessing and (re-)using linked biodiversity data

The EU-funded project BiCIKL (Biodiversity Community Integrated Knowledge Library) will support, free of charge*, publications submitted to the dedicated topical collection “Linking FAIR biodiversity data through publications: The BiCIKL approach” in the Biodiversity Data Journal, demonstrating advanced methods for publishing linked biodiversity data, so that they can be easily harvested, distributed and re-used to generate new knowledge.

BiCIKL is dedicated to building a new community of key research infrastructures, researchers and citizen scientists by using linked FAIR biodiversity data at all stages of the research lifecycle, from specimens through sequencing, imaging, identification of taxa, etc. to final publication in novel, re-usable, human-readable and machine-interpretable scholarly articles.

Achieving a culture change in how biodiversity data are identified, linked, integrated and re-used is the mission of the BiCIKL consortium. By doing so, BiCIKL aims to help increase the transparency, trustworthiness and efficiency of the entire research ecosystem.


The new article collection welcomes taxonomic and other biodiversity-related research articles, data papers, software descriptions, and methodological/theoretical papers. These should demonstrate the advantages of, and novel approaches to, accessing and (re-)using linked biodiversity data.

To be eligible for the collection, a manuscript must comply with at least two of the conditions listed below. In the submission form, the author needs to specify the condition(s) applicable to the manuscript. The author should provide the explanation in a cover letter, using the Notes to the editor field.

All submissions must abide by the community-agreed standards for terms, ontologies and vocabularies used in biodiversity informatics. 

The data used in the articles must comply with the Data Quality Checklist and FAIR Data Checklist available in the Authors’ instructions of the journal.


Conditions for publication in the article collection:

  • The authors are expected to use explicit Globally Unique Persistent and Resolvable Identifiers (GUPRIs) or other persistent identifiers (PIDs), where such are available, for the different types of data they use and/or cite in their manuscripts (specimen IDs, sequence accession numbers, taxon name and taxon treatment IDs, image IDs, etc.)

  • Global taxon reviews in the form of “cyber-catalogues” are welcome if they contain links of the key data elements (specimens, sequences, taxon treatments, images, literature references, etc.) to their respective records in external repositories. Taxon names in the text should not be hyperlinked. Instead, under each taxon name in the catalogue, the authors should add external links to, for example, Catalogue of Life, nomenclators (e.g. IPNI, MycoBank, Index Fungorum, ZooBank), taxon treatments in Plazi’s TreatmentBank or other relevant trusted resources.

  • Taxonomic papers (e.g. descriptions of new species or revisions) must contain persistent identifiers for the holotype, paratypes and at least most of the specimens used in the study.

  • Specimen records that are used for new taxon descriptions or taxonomic revisions and are associated with a particular Barcode Identification Number (BIN) or Species Hypothesis (SH) should be imported directly from BOLD or PlutoF, respectively, via the ARPHA Writing Tool data-import plugin.

  • More generally, individual specimen records used for various purposes in taxonomic descriptions and inventories should be imported directly into the manuscript from GBIF, iDigBio, or BOLD via the ARPHA Writing Tool data-import plugin. 

  • In-text citations of taxon treatments from Plazi’s TreatmentBank are highly welcome in any taxonomic revision or catalogue. The in-text citations should be hyperlinked to the original treatment data at TreatmentBank.

  • Hyperlinking other terms of importance in the article text to their original external data sources or external vocabularies is encouraged.

  • Tables that list gene accession numbers, specimens and taxon names should conform to the Biodiversity Data Journal’s linked data tables guidelines.

  • Theoretical or methodological papers on linking FAIR biodiversity data are eligible for the BiCIKL collection if they provide real examples and use cases.

  • Data papers or software descriptions are eligible if they use linked data from the BiCIKL’s partnering research infrastructures, or describe tools and services that facilitate access to and linking between FAIR biodiversity data.

  • Articles that contain nanopublications created or added during the authoring process in the Biodiversity Data Journal are eligible. A nanopublication is a scientifically meaningful assertion about anything that can be uniquely identified and attributed to its author; it serves to communicate a single statement, for example a biotic relationship between taxa or the habitat preference of a taxon (see the sketch after this list). The in-built workflow ensures the linkage and its persistence, while the information is simultaneously human-readable and machine-interpretable.
  • Manuscripts that contain or describe any other novel idea or feature related to linked or semantically enhanced biodiversity data will be considered too.
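
To make the nanopublication model concrete, here is a minimal sketch of how such a single-statement publication could be assembled with the Python rdflib library. The named-graph layout (assertion, provenance, publication info) follows the published nanopublication schema, but the URIs and the asserted biotic relationship are invented for illustration and do not reproduce the journal’s actual workflow output.

```python
# A nanopublication as four named graphs: head, assertion, provenance
# and publication info. URIs and the statement itself are invented.
from rdflib import Dataset, Literal, Namespace, URIRef
from rdflib.namespace import RDF

NP = Namespace("http://www.nanopub.org/nschema#")
PROV = Namespace("http://www.w3.org/ns/prov#")
EX = Namespace("https://example.org/np/1#")  # hypothetical base URI

ds = Dataset()

head = ds.graph(EX.head)
head.add((EX.nanopub, RDF.type, NP.Nanopublication))
head.add((EX.nanopub, NP.hasAssertion, EX.assertion))
head.add((EX.nanopub, NP.hasProvenance, EX.provenance))
head.add((EX.nanopub, NP.hasPublicationInfo, EX.pubinfo))

# The single scientifically meaningful statement.
assertion = ds.graph(EX.assertion)
assertion.add((EX.TaxonA, EX.preysOn, EX.TaxonB))  # a biotic relationship

# Where the statement comes from (a hypothetical article DOI).
provenance = ds.graph(EX.provenance)
provenance.add((EX.assertion, PROV.wasDerivedFrom,
                URIRef("https://doi.org/10.3897/example")))

# Who is responsible for the nanopublication itself.
pubinfo = ds.graph(EX.pubinfo)
pubinfo.add((EX.nanopub, PROV.wasAttributedTo, Literal("A. Author")))

print(ds.serialize(format="trig"))
```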

We recommend that authors get acquainted with these two papers before deciding to submit a manuscript to the collection:


Here are several examples of research questions that might be explored using semantically enriched and linked biodiversity data: 

(1) How does linking taxon names or Operational Taxonomic Units (OTUs) to related external data (e.g. specimen records, sequences, distributions, ecological & bionomic traits, images) contribute to a better understanding of the functions and regional/local processes within faunas/floras/mycotas or biotic communities?

(2) How could the production and publication of taxon descriptions and inventories – including those based mostly on genomic and barcoding data – be streamlined? 

(3) How could general conclusions, assertions and citations in biodiversity articles be expressed in formal, machine-actionable language, either to update prior work or express new facts (e.g. via nanopublications)? 

(4) How could research data and narratives be re-used to support more extensive and data-rich studies? 

(5) Are there other taxon- or topic-specific research questions that would benefit from richer, semantically enhanced FAIR biodiversity data?


All manuscripts submitted to the Biodiversity Data Journal have their data audited by data scientists prior to the peer review stage.

Once published, specimen record data are exported as Darwin Core Archives to GBIF.

The data and taxon treatments are also exported to several additional data aggregators, such as TreatmentBank, the Biodiversity Literature Repository and SiBILS, amongst others. The full-text articles are also converted to Linked Open Data and indexed in the OpenBiodiv Knowledge Graph.
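
For readers unfamiliar with the format, the sketch below shows roughly what the occurrence core of a Darwin Core Archive looks like: a delimited text file whose columns are Darwin Core terms. The terms used are standard Darwin Core fields; the record values and file name are invented for illustration.

```python
# Sketch of the occurrence core of a Darwin Core Archive: a delimited
# text file whose columns are Darwin Core terms (values invented).
import csv

fields = ["occurrenceID", "scientificName", "eventDate",
          "decimalLatitude", "decimalLongitude", "basisOfRecord"]

records = [
    {"occurrenceID": "urn:example:specimen:0001",  # hypothetical PID
     "scientificName": "Opadometa sarawakensis",
     "eventDate": "2017-08-14",
     "decimalLatitude": "5.4136",
     "decimalLongitude": "118.0378",
     "basisOfRecord": "PreservedSpecimen"},
]

with open("occurrence.txt", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fields)
    writer.writeheader()
    writer.writerows(records)
```

A complete archive additionally bundles a meta.xml descriptor and an EML metadata document, which aggregators such as GBIF rely on to interpret the columns.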


All articles will need to acknowledge the BiCIKL project, Grant No 101007492, in the Acknowledgements section.

* The publication fee (APC), normally charged by BDJ at €650, is waived for standard-sized manuscripts (up to 40,000 characters, including spaces). Authors of larger manuscripts will need to cover the surplus charge (€10 for each 1,000 characters above 40,000); a worked example follows below. See more about the APC policy at Biodiversity Data Journal, or contact the journal editorial team at: bdj@pensoft.net.
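
As a worked example of the fee schedule above, here is a minimal sketch in Python, assuming the surplus is charged per started block of 1,000 characters (our reading, not a confirmed policy detail):

```python
import math

def bdj_surplus_fee(characters: int) -> int:
    """Surplus fee in euros for manuscripts in the collection.

    Assumes €10 per started 1,000 characters above the 40,000-character
    waiver limit; the base APC itself is waived for the collection.
    """
    surplus_chars = max(0, characters - 40_000)
    return 10 * math.ceil(surplus_chars / 1_000)

print(bdj_surplus_fee(38_500))  # 0   -> fully waived
print(bdj_surplus_fee(55_250))  # 160 -> author covers a €160 surplus
```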

Follow the BiCIKL Project on Twitter and Facebook. Join the conversation via #BiCIKL_H2020.

You can also follow Biodiversity Data Journal on Twitter and Facebook.

Data checking for biodiversity collections and other biodiversity data compilers from Pensoft

Guest blog post by Dr Robert Mesibov

Proofreading the text of scientific papers isn’t hard, although it can be tedious. Are all the words spelled correctly? Is all the punctuation correct and in the right place? Is the writing clear and concise, with correct grammar? Are all the cited references listed in the References section, and vice-versa? Are the figure and table citations correct?

Proofreading of text is usually done first by the reviewers, and then finished by the editors and copy editors employed by scientific publishers. A similar kind of proofreading is also done with the small tables of data found in scientific papers, mainly by reviewers familiar with the management and analysis of the data concerned.

But what about proofreading the big volumes of data that are common in biodiversity informatics? Tables with tens or hundreds of thousands of rows and dozens of columns? Who does the proofreading?

Sadly, the answer is usually “No one”. Proofreading large amounts of data isn’t easy and requires special skills and digital tools. The people who compile biodiversity data often lack the skills, the software or the time to properly check what they’ve compiled.

The result is that a great deal of the data made available through biodiversity projects like GBIF is — to be charitable — “messy”. Biodiversity data often need a lot of patient cleaning by end-users before they are ready for analysis. To assist end-users, GBIF and other aggregators attach “flags” to each record in the database where an automated check has found a problem. These checks find the most obvious problems amongst the many possible data compilation errors. End-users often have much more work to do after the flags have been dealt with.
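
To illustrate the kind of automated check an aggregator runs, here is a minimal sketch of record-level flagging. The flag names and rules are simplified illustrations, not GBIF’s actual flag vocabulary.

```python
# Simplified record-level checks of the kind aggregators automate;
# flag names and rules are illustrative, not GBIF's actual vocabulary.
def flag_record(record: dict) -> list[str]:
    flags = []
    if not record.get("scientificName", "").strip():
        flags.append("MISSING_SCIENTIFIC_NAME")
    try:
        lat = float(record.get("decimalLatitude", ""))
        lon = float(record.get("decimalLongitude", ""))
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            flags.append("COORDINATE_OUT_OF_RANGE")
        elif (lat, lon) == (0.0, 0.0):
            flags.append("ZERO_COORDINATE")
    except ValueError:
        flags.append("COORDINATE_INVALID")
    return flags

record = {"scientificName": "Hipposideros caffer",
          "decimalLatitude": "95.2", "decimalLongitude": "34.8"}
print(flag_record(record))  # ['COORDINATE_OUT_OF_RANGE']
```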

In 2017, Pensoft employed a data specialist to proofread the online datasets that are referenced in manuscripts submitted to Pensoft’s journals as data papers. The results of the data-checking are sent to the data paper’s authors, who then edit the datasets. This process has substantially improved many datasets (including those already made available through GBIF) and made them more suitable for digital re-use. At blog publication time, more than 200 datasets have been checked in this way.

Note that a Pensoft data audit does not check the accuracy of the data, for example, whether the authority for a species name is correct, or whether the latitude/longitude for a collecting locality agrees with the verbal description of that locality. For a more or less complete list of what does get checked, see the Data checklist at the bottom of this blog post. These checks are aimed at ensuring that datasets are correctly organised, consistently formatted and easy to move from one digital application to another. The next reader of a digital dataset is likely to be a computer program, not a human. It is essential that the data are structured and formatted, so that they are easily processed by that program and by other programs in the pipeline between the data compiler and the next human user of the data.

Pensoft’s data-checking workflow was previously offered only to authors of data paper manuscripts. It is now available to data compilers generally, with three levels of service:

  • Basic: the compiler gets a detailed report on what needs fixing
  • Standard: minor problems are fixed in the dataset and reported
  • Premium: all detected problems are fixed in collaboration with the data compiler and a report is provided

Because datasets vary so much in size and content, it is not possible to set a price in advance for basic, standard and premium data-checking. To get a quote for a dataset, send an email with a small sample of the data to publishing@pensoft.net.


Data checklist

Minor problems:

  • dataset not UTF-8 encoded
  • blank or broken records
  • characters other than letters, numbers, punctuation and plain whitespace
  • more than one encoded version of the same character (the simplest or most correct one is retained)
  • unnecessary whitespace
  • Windows carriage returns (retained if required)
  • encoding errors (e.g. “Dum?ril” instead of “Duméril”)
  • missing data with a variety of representations (blank, “-”, “NA”, “?”, etc.)

Major problems:

  • unintended shifts of data items between fields
  • incorrect or inconsistent formatting of data items (e.g. dates)
  • different representations of the same data item (pseudo-duplication)
  • for Darwin Core datasets, incorrect use of Darwin Core fields
  • data items that are invalid or inappropriate for a field
  • data items that should be split between fields
  • data items referring to unexplained entities (e.g. “habitat is type A”)
  • truncated data items
  • disagreements between fields within a record
  • missing, but expected, data items
  • incorrectly associated data items (e.g. two country codes for the same country)
  • duplicate records, or partial duplicate records where not needed
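
A few of the checks above are straightforward to express in code. The sketch below tests a single field value for a handful of the minor problems (multiple encoded versions of a character, stray whitespace, suspected encoding errors, missing-data markers); a real audit runs such tests over every cell of a dataset.

```python
import unicodedata

MISSING_MARKERS = {"", "-", "na", "n/a", "?"}  # common stand-ins for "no data"

def check_value(value: str) -> list[str]:
    """Return the minor problems found in one field value (illustrative)."""
    problems = []
    if value != unicodedata.normalize("NFC", value):
        problems.append("more than one encoded version of a character")
    if value != value.strip() or "  " in value:
        problems.append("unnecessary whitespace")
    if "\ufffd" in value or "?" in value:  # e.g. "Dum?ril" for "Duméril"
        problems.append("possible encoding error")
    if value.strip().lower() in MISSING_MARKERS:
        problems.append("missing-data marker; agree on one representation")
    return problems

print(check_value("Dum?ril "))
# ['unnecessary whitespace', 'possible encoding error']
```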

For details of the methods used, see the author’s online resources:

***

Find out more about Pensoft’s data audit workflow for data papers submitted to Pensoft journals on Pensoft’s blog.

Novel research on African bats pilots new ways in sharing and linking published data

A colony of what is apparently a new species of the genus Hipposideros found in an abandoned gold mine in Western Kenya
Photo by B. D. Patterson / Field Museum

Newly published findings about the phylogenetics and systematics of some previously known, as well as other yet-to-be-identified, species of Old World leaf-nosed bats provide the first contribution to a recently launched collection of research articles intended to help scientists from across disciplines better understand potential hosts and vectors of zoonotic diseases, such as coronaviruses. Bats and pangolins are among the animals already identified as particularly potent vehicles of life-threatening viruses, including the infamous SARS-CoV-2.

The article, publicly available in the peer-reviewed scholarly journal ZooKeys, also pilots a new generation of Linked Open Data (LOD) publishing practices, invented and implemented to facilitate ongoing scientific collaborations in times of urgency like those we experience today, with the COVID-19 pandemic currently ravaging over 230 countries and territories around the globe.

In their study, an international team of scientists, led by Dr Bruce Patterson, the Field Museum’s MacArthur Curator of Mammals, point to the existence of numerous yet-to-be-described species of leaf-nosed bats inhabiting the biodiversity hotspots of East Africa and Southeast Asia. In order to expedite future discoveries about the identity, biology and ecology of those bats, they provide key insights into the genetics and relations within their higher groupings, as well as further information about their geographic distribution.

“Leaf-nosed bats carry coronaviruses–not the strain that’s affecting humans right now, but this is certainly not the last time a virus will be transmitted from a wild mammal to humans. If we have better knowledge of what these bats are, we’ll be better prepared if that happens,”

says Dr Terrence Demos, a post-doctoral researcher in Patterson’s lab and a principal author of the paper.

One of the possibly three bat species new to science, previously referred to as Hipposideros caffer or Sundevall’s leaf-nosed bat
Photo by B. D. Patterson / Field Museum

“With COVID-19, we have a virus that’s running amok in the human population. It originated in a horseshoe bat in China. There are 25 or 30 species of horseshoe bats in China, and no one can determine which one was involved. We owe it to ourselves to learn more about them and their relatives,”

comments Patterson.

In order to ensure that scientists from across disciplines, including biologists, but also virologists and epidemiologists, as well as health and policy officials and decision-makers, have the scientific data and evidence at hand, Patterson and his team supplemented their research publication with a particularly valuable appendix table. There, in a conveniently organised table format, everyone can access fundamental raw genetic data about each studied specimen, as well as its precise identification, its origin and the natural history collection where it is preserved. What makes those data particularly useful for researchers looking to make ground-breaking and potentially life-saving discoveries, however, is that all that information is linked to other types of data stored in various databases and repositories, contributed by scientists from anywhere in the world.

Furthermore, in this case, those linked and publicly available data, or Linked Open Data (LOD), are published in machine-readable formats, so that they are “understandable” to computers. Thus, when researchers seek data associated with a particular specimen they find in the table, they can immediately retrieve additional data stored in external repositories with a single query. Alternatively, a researcher might want to retrieve all pathogens extracted from tissues of specimens of a specific animal species, or from particular populations inhabiting a certain geographic range, and so on.
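
The “single query” idea can be pictured as a SPARQL lookup against a Linked Open Data endpoint. Here is a rough sketch using the Python SPARQLWrapper library; the endpoint URL and the property names are placeholders, since the real vocabulary depends on the knowledge graph being queried.

```python
# Sketch of a Linked Open Data lookup; endpoint and vocabulary are
# placeholders, not the actual graph used by the study.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://example.org/sparql")  # hypothetical endpoint
endpoint.setQuery("""
PREFIX ex: <https://example.org/vocab#>
SELECT ?specimen ?collection WHERE {
  ?specimen ex:identifiedAs ?taxon ;
            ex:heldIn       ?collection .
  ?taxon    ex:scientificName "Hipposideros caffer" .
}
""")
endpoint.setReturnFormat(JSON)

for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["specimen"]["value"], row["collection"]["value"])
```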

###

The data publication and dissemination approach piloted in this new study was developed by the science publisher and technology provider Pensoft and the digitisation company Plazi for the purposes of a special collection of research papers reporting novel findings on the biology of bats and pangolins in the scholarly journal ZooKeys. By targeting the two most likely ‘culprits’ at the roots of the coronavirus outbreak in 2020, bats and pangolins, the article collection aligns with the agenda of the COVID-19 Joint Task Force, a recent call for contributions made by the Consortium of European Taxonomic Facilities (CETAF), the Distributed System for Scientific Collections (DiSSCo) and the Integrated Digitized Biocollections (iDigBio).

###

Original source:

Patterson BD, Webala PW, Lavery TH, Agwanda BR, Goodman SM, Kerbis Peterhans JC, Demos TC (2020) Evolutionary relationships and population genetics of the Afrotropical leaf-nosed bats (Chiroptera, Hipposideridae). ZooKeys 929: 117-161. https://doi.org/10.3897/zookeys.929.50240

Data mining applied to scholarly publications to finally reveal Earth’s biodiversity

At a time when a million species are at risk of extinction, according to a recent UN report, we ironically do not know how many species there are on Earth, nor have we compiled all those we do know into a single list. In fact, we do not even know how many species such a list would contain.

The combined research of over 2,000 natural history institutions worldwide has produced an estimated 500 million pages of scholarly publications and tens of millions of illustrations and species descriptions, comprising all we currently know about the diversity of life. However, most of it is not digitally accessible. Even if it were digital, our current publishing systems would not be able to keep up: about 50 species are described as new to science every day, nearly all of them in plain-text and PDF formats, where the data cannot be mined by machines and must instead be extracted by a human. Furthermore, those publications often appear in subscription (closed-access) journals.

The Biodiversity Literature Repository (BLR), a joint project of Plazi, Pensoft and Zenodo at CERN, takes on the challenge of opening up access to the data trapped in scientific publications, and of finding out how many species we know so far, what their most important characteristics are (also referred to as descriptions or taxonomic treatments), and how they look in various images. To do so, BLR uses the highly standardised formats and terminology typical of scientific publications to discover and extract data from text written primarily for human consumption.

By relying on state-of-the-art data mining algorithms, BLR allows for the detection, extraction and enrichment of data, including DNA sequences, specimen collecting data or related descriptions, as well as providing implicit links to their sources: collections, repositories etc. As a result, BLR is the world’s largest public domain database of taxonomic treatments, images and associated original publications.
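
Because BLR deposits its extracted treatments, figures and publications on Zenodo, the repository can also be searched programmatically through Zenodo’s public REST API. A minimal sketch follows; the community identifier “biosyslit” and the example query are assumptions to verify against the live service.

```python
# Search Biodiversity Literature Repository deposits via Zenodo's REST API.
# The community slug "biosyslit" is our assumption; verify on zenodo.org.
import requests

resp = requests.get(
    "https://zenodo.org/api/records",
    params={"communities": "biosyslit", "q": "Hipposideros", "size": 5},
    timeout=30,
)
resp.raise_for_status()

for hit in resp.json()["hits"]["hits"]:
    print(hit["metadata"]["title"], "->", hit["links"]["self"])
```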

Once the data are available, they are immediately distributed to global biodiversity platforms, such as GBIF, the Global Biodiversity Information Facility. As of now, there are about 42,000 species whose original scientific descriptions are accessible only because of BLR.

The basic scientific principle of citing previous work allows us to trace back the history of a particular species, to understand how knowledge about it grew over time, and even whether and how its name has changed through the years. As a result, this service is one avenue towards uncovering the catalogue of life by means of simple lookups.

So far, the lessons learned have led to the development of TaxPub, an extension of the United States National Library of Medicine’s Journal Article Tag Suite (JATS), and its application in a new class of 26 scientific journals. As a result, the data associated with articles in these journals are machine-accessible from the beginning of the publishing process: as soon as a paper comes out, its data are automatically added to GBIF.
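
To give a flavour of what machine-accessible means here, the sketch below parses a TaxPub-style fragment to pull out the taxon name of a treatment. The element and attribute names follow our reading of the TaxPub tag set and should be treated as approximate rather than authoritative.

```python
# Parse a TaxPub-style XML fragment (element names approximate).
import xml.etree.ElementTree as ET

fragment = """
<tp:taxon-treatment xmlns:tp="http://www.plazi.org/taxpub">
  <tp:nomenclature>
    <tp:taxon-name>
      <tp:taxon-name-part taxon-name-part-type="genus">Hipposideros</tp:taxon-name-part>
      <tp:taxon-name-part taxon-name-part-type="species">caffer</tp:taxon-name-part>
    </tp:taxon-name>
  </tp:nomenclature>
</tp:taxon-treatment>
"""

ns = {"tp": "http://www.plazi.org/taxpub"}
root = ET.fromstring(fragment)
parts = root.findall(".//tp:taxon-name-part", ns)
print(" ".join(p.text for p in parts))  # Hipposideros caffer
```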

While BLR is expected to open up millions of scientific illustrations and descriptions, the system is unique in that it makes all the extracted data findable, accessible, interoperable and reusable (FAIR), as well as open to anybody, anywhere, at any time. Most of all, its purpose is to create a novel way to access scientific literature.

To date, BLR has extracted ~350,000 taxonomic treatments and ~200,000 figures from over 38,000 publications. This includes the descriptions of 55,800 new species, 3,744 new genera, and 28 new families. BLR has contributed to the discovery of over 30% of the ~17,000 species described annually.

Prof. Lyubomir Penev, founder and CEO of Pensoft, says:

“It is such a great satisfaction to see how the development process of the TaxPub standard, started by Plazi some 15 years ago and implemented as a routine publishing workflow at Pensoft’s journals in 2010, has now resulted in an entire infrastructure that allows automated extraction and distribution of biodiversity data from various journals across the globe. With the recent announcement from the Consortium of European Taxonomic Facilities (CETAF) that their European Journal of Taxonomy is joining the TaxPub club, we are even more confident that we are paving the right way to fully grasping the dimensions of the world’s biodiversity.”

Dr Donat Agosti, co-founder and president of Plazi, adds:

“Finally, information technology allows us to create a comprehensive, extended catalogue of life and bring to light this huge corpus of cultural and scientific heritage – the description of life on Earth – for everybody. The nature of taxonomic treatments as a network of citations and syntheses of what scientists have discovered about a species allows us to link distinct fields such as genomics and taxonomy to specimens in natural history museums.”

Dr Tim Smith, Head of Collaboration, Devices and Applications Group at CERN, comments:

“Moving the focus away from the papers, where concepts are communicated, to the concepts themselves is a hugely significant step. It enables BLR to offer a unique new interconnected view of the species of our world, where the taxonomic treatments, their provenance, histories and their illustrations are all linked, accessible and findable. This is inspirational for the digital liberation of other fields of study!”

###

Additional information:

BLR is a joint project led by Plazi in partnership with Pensoft and Zenodo at CERN.

Currently, BLR is supported by a grant from Arcadia, a charitable fund of Lisbet Rausing and Peter Baldwin.

Dispatch from the field II: Students describe an elusive spider while stationed in Borneo

A mystery has long shrouded the orb-weaving spider genus Opadometa, where males and females belonging to one and the same species look nothing alike. Furthermore, the males appear to be so elusive that scientists still doubt whether both sexes are correctly linked to each other even in the best-known species.

Such is the case for Opadometa sarawakensis – a species known only from female specimens. While remarkable with their striking red and blue colors and large size, the females could give no hint of the likely appearance of the male Opadometa sarawakensis.

The red and blue female Opadometa sarawakensis

Nevertheless, students taking part in a recent two-week tropical ecology field course organized by the Naturalis Biodiversity Center and Leiden University, and hosted by the Danau Girang Field Centre (DGFC) on the island of Borneo, Malaysia, found a mature male spider hanging on the web of a red and blue female, later identified as Opadometa sarawakensis. Still quite striking, the male was colored in a blend of orange, gray, black, and silver.

At the brink of a long-awaited discovery and eager to describe the male, the students, along with their lecturers and the field station’s scientific staff, encountered a problem: with problematic species like this orb weaver, they needed strong evidence to prove that the male matched the female from the web. Furthermore, molecular DNA-based analysis was not an option at the time, since the necessary equipment was not available at DGFC.

On the other hand, being at the center of the action turned out to have advantages no less persuasive than DNA evidence. Having conducted thorough field surveys in the area, the team concluded that the male’s presence on that particular female’s web, combined with the fact that no other Opadometa species were found in the area, was enough evidence that the two were indeed representatives of the same species.

Adapting to the quite basic conditions of the DGFC laboratory, the students and their mentors put to use various items they had on hand, including smartphones paired with headlamps mounted on gooseneck clips in place of sophisticated cameras.

In the end, they gathered all the necessary data to prepare the formal description of the newly identified male.

Once they had the observations and the data, there was only one question left to answer. How could they proceed with the submission of a manuscript to a scholarly journal, so that their finding is formally announced and recognised?


Thanks to the elaborate and highly automated workflow available in the peer-reviewed open-access Biodiversity Data Journal and its underlying ARPHA Writing Tool, the researchers managed to compile their manuscript, including all underlying data such as geolocations, and submit it from the field station. All in all, the authoring, peer review and publication – each step taking place within the ARPHA Platform’s singular environment – took less than a month to complete. In fact, the paper was published within a few days of being submitted.

This is the second publication in the series “Dispatch from the field”, resulting from an initiative led by spider taxonomist Dr Jeremy Miller. In 2014, another team of students and their mentors described a new species of curious one-millimetre-long spider from the Danau Girang Field Centre. Both papers showcase the feasibility of publishing and sharing biodiversity data that are easy to find, access and re-use.

“This has been a unique educational experience for the students,” says Jeremy. “They got to experience how tropical field biologists work, which is often from remote locations and without sophisticated equipment. This means that creativity and persistence are necessary to solve problems and complete a research objective. The fact that the students got to participate in advancing knowledge about this remarkable spider species by contributing to a manuscript was really exciting.”

###

Original source:

Miller J, Freund C, Rambonnet L, Koets L, Barth N, van der Linden C, Geml J, Schilthuizen M, Burger R, Goossens B (2018) Dispatch from the field II: the mystery of the red and blue Opadometa male (Araneae, Tetragnathidae, Opadometa sarawakensis). Biodiversity Data Journal 6: e24777. https://doi.org/10.3897/BDJ.6.e24777

Data Quality Checklist and Recommendations at Pensoft

As much as research data sharing and re-usability are staples of open science practice, their value would be hugely diminished if their quality were compromised.

At a time when machine-readability and the related software are becoming ever more crucial in science, while data pile up by the minute, it is essential that researchers format, structure and deposit their data efficiently, so that they remain accessible and re-usable for their successors.

Errors, in the sense of data that fail to be read by computer programs, can easily creep into any dataset. These errors are as diverse as invalid characters, missing brackets, blank fields and incomplete geolocations.

To summarise the lessons learnt from our extensive experience in biodiversity data audit at Pensoft, we have now included a Data Quality Checklist and Recommendations page in the About section of each of our data-publishing journals.

We are hopeful that these guidelines will help authors prepare and publish datasets of higher quality, so that their work can be fully utilised in subsequent research.

At the end of the day, proofreading your data is no different than running through your text looking for typos.

 

We would like to use the occasion to express our gratitude to Dr Robert Mesibov, who prepared the checklist and whose expertise in biodiversity data audit has contributed greatly to Pensoft through the years.