CASE STUDY: Data audit for the “Vascular plants dataset of the COFC herbarium (University of Cordoba, Spain)”, a data paper in PhytoKeys

After submitting their data paper manuscript, which describes the herbarium dataset of vascular plants at the University of Cordoba (Spain), to the open access journal PhytoKeys, Dr Gloria Martínez-Sagarra and Prof Juan Antonio Devesa received a data audit report prepared by data specialist Dr Robert Mesibov.

The dataset described in Dr Gloria Martínez-Sagarra’s and Prof Juan Antonio Devesa’s paper is registered and available from the GBIF portal.



As part of the routine workflow, which is mandatory for data papers submitted to the relevant Pensoft journals, their work underwent a technical evaluation against a checklist of data quality features. The checklist is designed to ensure that the data remain accessible, readable and interoperable, regardless of whether the next user is a human or a machine.

To that end, any issues with the structure and format of a dataset that could cause data loss down the line need to be identified and addressed before the data paper is published, and in fact before it is even assigned to a subject editor. Only after the data audit is performed can a manuscript proceed to peer review. If there are major issues with the dataset, the data paper can be rejected right away, and then resubmitted once the necessary corrections have been applied.

In the report, the authors found a list of identified issues along with recommendations from Dr Mesibov. As in a conventional peer review, these comments are meant to pinpoint any areas that need to be corrected straight away, as well as those that might only need a bit of further clarification. After receiving the data audit report, the authors take their turn to address the feedback.

Snapshot from the data auditing report received by Dr Gloria Martínez-Sagarra and Prof Juan Antonio Devesa for their data paper manuscript submitted to PhytoKeys.

In the present case, the report features a list of discrepancies between the counts of taxonomic records listed in the data paper and those in the original dataset, i.e. verbatim.txt. As it turned out, the disagreement is due to various taxonomic revisions that have taken place within the highlighted families since the dataset’s last update on GBIF.
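A check of this kind can be sketched in a few lines of Python. The snippet below is only an illustration, not the procedure used in the audit: it assumes that verbatim.txt is tab-separated with a header row containing a `family` column, and the sample rows and "paper" figures are invented for the example rather than taken from the COFC dataset. The function names are hypothetical.

```python
import csv
import io
from collections import Counter


def count_by_family(tsv_text, column="family"):
    """Count records per family in tab-separated text with a header row."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    counts = Counter()
    for row in reader:
        name = (row.get(column) or "").strip()
        # Keep blank entries visible instead of silently dropping them.
        counts[name or "(blank)"] += 1
    return counts


def find_discrepancies(paper_counts, dataset_counts):
    """Return {family: (count_in_paper, count_in_dataset)} where the two differ."""
    families = set(paper_counts) | set(dataset_counts)
    return {
        fam: (paper_counts.get(fam, 0), dataset_counts.get(fam, 0))
        for fam in sorted(families)
        if paper_counts.get(fam, 0) != dataset_counts.get(fam, 0)
    }


# Invented sample rows, NOT taken from the COFC dataset.
SAMPLE = "catalogNumber\tfamily\n1\tPoaceae\n2\tPoaceae\n3\tFabaceae\n4\t\n"
# Invented figures standing in for a summary table in a data paper.
PAPER = {"Poaceae": 2, "Fabaceae": 2}

if __name__ == "__main__":
    diffs = find_discrepancies(PAPER, count_by_family(SAMPLE))
    for fam, (paper, data) in diffs.items():
        print(f"{fam}: paper={paper}, dataset={data}")
```

A mismatch reported this way is not necessarily an error. As in the case described here, it may simply reflect taxonomic revisions made after the dataset was last updated.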

Dr Gloria Martínez-Sagarra and Prof Juan Antonio Devesa sent back their comments on the issues raised in the data audit report.

In other cases, however, data entry errors, such as inappropriately used fields and non-compliance with the Darwin Core recommendations, had to be corrected in order to prevent data loss and compromised interoperability.
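To give a sense of what such a check might look like, here is a minimal Python sketch that flags a few common problems in a single record. It is an assumption-laden illustration: the article does not say which checks were run in the audit, the function names are hypothetical, and the date pattern is deliberately simplified to `YYYY`, `YYYY-MM` or `YYYY-MM-DD`, so valid ISO 8601 intervals or date-times would be flagged too. The field names `eventDate`, `decimalLatitude` and `decimalLongitude` are Darwin Core terms.

```python
import re

# Simplified pattern: YYYY, YYYY-MM or YYYY-MM-DD only (an assumption;
# full ISO 8601 also allows intervals and times, which this would reject).
ISO_DATE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def _in_range(value, low, high):
    """True if value parses as a number within [low, high]."""
    try:
        return low <= float(value) <= high
    except ValueError:
        return False


def check_record(record):
    """Return a list of problem descriptions for one record (dict of strings)."""
    problems = []
    date = record.get("eventDate", "").strip()
    if date and not ISO_DATE.match(date):
        problems.append(f"eventDate not in the expected ISO 8601 form: {date!r}")
    limits = (("decimalLatitude", -90, 90), ("decimalLongitude", -180, 180))
    for field, low, high in limits:
        value = record.get(field, "").strip()
        if value and not _in_range(value, low, high):
            problems.append(f"{field} out of range or not numeric: {value!r}")
    return problems


if __name__ == "__main__":
    # Invented record for illustration only.
    record = {
        "eventDate": "12/05/1998",
        "decimalLatitude": "37.88",
        "decimalLongitude": "-4.78",
    }
    for problem in check_record(record):
        print(problem)
```

Empty fields are skipped rather than reported, since a missing value is a different kind of issue from a malformed one.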

With the problematic data corrected, the manuscript proceeded to peer review and was accepted for publication five days later.

Editor’s user interface on the PhytoKeys website showing the progress of the manuscript from submission to acceptance.

Having followed the strong recommendations from Pensoft, the authors also re-uploaded their revised data to GBIF.

Data audit workflow provided for data papers submitted to Pensoft journals.

As a result, both the data paper and the associated dataset are not only published in an open access, peer-reviewed journal and safely stored at GBIF, but also verified as Findable, Accessible, Interoperable and Reusable. 

Thanks to the thorough work and additional efforts of University of Cordoba’s Dr Gloria Martínez-Sagarra and Prof Juan Antonio Devesa, future researchers working on the Andalusian flora can already rely on a real head start.

Find out more about Pensoft’s mandatory data quality workflow in this blog post.
