Following the submission of their data paper manuscript, which serves to describe the herbarium dataset of vascular plants at the University of Cordoba (Spain), to the open access journal PhytoKeys, Dr Gloria Martínez-Sagarra and Prof Juan Antonio Devesa received a data audit report, prepared by data specialist Dr Robert Mesibov.
As part of the routine workflow, which is mandatory for data papers submitted across relevant Pensoft journals, their work underwent a technical evaluation against a checklist of data quality features, compiled in such a fashion that it ensures uncompromised accessibility, readability and interoperability of the data, regardless of whether its next user is a human or a machine.
To do so, it is crucial that any issues concerning the data structure and format within a dataset – which could potentially cause data loss down the line – need to be identified and addressed prior to the publication of the data paper, in fact, before it is even assigned to a subject editor. Only after the data audit is performed, can a manuscript proceed to peer review. In case there are major issues with the dataset, the data paper can be rejected right away, but resubmitted after the necessary corrections are applied.
In the report, the authors could find a list of identified issues as well as recommendations from Dr Mesibov. Similarly to a conventional peer review, these comments are meant to pinpoint any areas that need to be corrected straight away, as well as those that might only need a bit of further clarification. After receiving the data audit report, the authors take their turn to address the feedback.
In the present case, the report features a list of discrepancies between the counts of taxonomic records as listed in the data paper as opposed to those in the original dataset, i.e. verbatim.txt. Here, as it turned out, the disagreement is due to various taxonomic revisions that have taken place within the highlighted families since the dataset’s last update on GBIF.
In other cases, however, data entry errors, such as inappropriately used fields and non-compliance with the Darwin Core recommendations, had to be cleaned, in order to prevent data loss and compromised interoperability.
With the problematic data corrected, the manuscript proceeded to peer review and was accepted for publication five days later.
Having followed the strong recommendations from Pensoft, the authors also re-uploaded their revised data to GBIF.
As a result, both the data paper and the associated dataset are not only published in an open access, peer-reviewed journal and safely stored at GBIF, but also verified as Findable, Accessible, Interoperable and Reusable.
Thanks to the thorough work and additional efforts of University of Cordoba’s Dr Gloria Martínez-Sagarra and Prof Juan Antonio Devesa, future researchers working on the Andalusian flora can already rely on a real head start.
Find more about the Pensoft’s mandatory data quality workflow in this blog post.