To avoid publication of openly accessible, yet unusable datasets, fated to result in irreproducible and inoperable biological diversity research at some point down the road, Pensoft takes care for auditing data described in data paper manuscripts upon their submission to applicable journals in the publisher’s portfolio, including Biodiversity Data Journal, ZooKeys, PhytoKeys, MycoKeys and many others.
Once the dataset is clean and the paper is published, biodiversity data, such as taxa, occurrence records, observations, specimens and related information, become FAIR (findable, accessible, interoperable and reusable), so that they can be merged, reformatted and incorporated into novel and visionary projects, regardless of whether they are accessed by a human researcher or a data-mining computation.
As part of the pre-review technical evaluation of a data paper submitted to a Pensoft journal, the associated datasets are subjected to data audit meant to identify any issues that could make the data inoperable. This check is conducted regardless of whether the dataset are provided as supplementary material within the data paper manuscript or linked from the Global Biodiversity Information Facility (GBIF) or another external repository. The features that undergo the audit can be found in a data quality checklist made available from the website of each journal alongside key recommendations for submitting authors.
Once the check is complete, the submitting author receives an audit report providing improvement recommendations, similarly to the commentaries he/she would receive following the peer review stage of the data paper. In case there are major issues with the dataset, the data paper can be rejected prior to assignment to a subject editor, but resubmitted after the necessary corrections are applied. At this step, authors who have already published their data via an external repository are also reminded to correct those accordingly.
“It all started back in 2010, when we joined forces with GBIF on a quite advanced idea in the domain of biodiversity: a data paper workflow as a means to recognise both the scientific value of rich metadata and the efforts of the the data collectors and curators. Together we figured that those data could be published most efficiently as citable academic papers,” says Pensoft’s founder and Managing director Prof. Lyubomir Penev.
“From there, with the kind help and support of Dr Robert Mesibov, the concept evolved into a data audit workflow, meant to ‘proofread’ the data in those data papers the way a copy editor would go through the text,” he adds.
“The data auditing we do is not a check on whether a scientific name is properly spelled, or a bibliographic reference is correct, or a locality has the correct latitude and longitude”, explains Dr Mesibov. “Instead, we aim to ensure that there are no broken or duplicated records, disagreements between fields, misuses of the Darwin Core recommendations, or any of the many technical issues, such as character encoding errors, that can be an obstacle to data processing.”
At Pensoft, the publication of openly accessible, easy to access, find, re-use and archive data is seen as a crucial responsibility of researchers aiming to deliver high-quality and viable scientific output intended to stand the test of time and serve the public good.
CASE STUDY: Data audit for the “Vascular plants dataset of the COFC herbarium (University of Cordoba, Spain)”, a data paper in PhytoKeys
To explain how and why biodiversity data should be published in full compliance with the best (open) science practices, the team behind Pensoft and long-year collaborators published a guidelines paper, titled “Strategies and guidelines for scholarly publishing of biodiversity data” in the open science journal Research Ideas and Outcomes (RIO Journal).