The Biodiversity Data Journal launches its own data portal on GBIF

Designed to lower technical demands, this simple website lets data managers and other stakeholders focus on data exploration and reuse.

The Biodiversity Data Journal (BDJ) became the second open-access peer-reviewed scholarly title to make use of the hosted portals service provided by the Global Biodiversity Information Facility (GBIF): an international network and data infrastructure aimed at providing anyone, anywhere, open access to data about all types of life on Earth. 

The Biodiversity Data Journal portal, hosted on the GBIF platform, is designed to support biodiversity data use and engagement at national, institutional, regional and thematic scales by facilitating access to and reuse of data by users with varying expertise in data use and management.

Having piloted the GBIF hosted portal solution with arguably the most revolutionary biodiversity journal in its exclusively open-access scholarly portfolio, Pensoft will soon replicate the effort with at least 35 other journals in the field. In doing so, the publisher will more than double the number of currently existing GBIF-hosted portals.

As of the time of writing, the BDJ portal provides seamless access to nearly 300,000 occurrences of biological organisms from all over the world, extracted from the journal’s publications to date. In addition, the portal provides direct access to more than 800 datasets published alongside papers in BDJ, as well as over 1,000 citations of the journal articles associated with those datasets.


Using the search categories featured in the portal, users can narrow their query by geography, location, taxon, IUCN Global Red List Category, geological context and many other criteria. The dashboard also lets users access multiple statistics about the data, and even explore potentially related records with the help of the clustering feature (e.g. a specimen sequenced by another institution or type material deposited at different institutions). Additionally, the BDJ portal provides basic information about the journal itself and links to the news section on its website.

A video displaying an interactive map with occurrence data on the BDJ portal.

Launched in 2013 with the aim to bring together openly available data and narrative into a peer-reviewed scholarly paper, the Biodiversity Data Journal has remained at the forefront of scholarly publishing in the field of biodiversity research. Over the years, it has been amongst the first to adopt many novelties developed by Pensoft, including the entirely XML-based ARPHA Writing Tool (AWT) that has underpinned the journal’s submission and review process for several years now. Besides the convenience of an entirely online authoring environment, AWT provides multiple integrations with key databases, such as GBIF and BOLD, to allow direct export and import at the authoring stage, thereby further facilitating the publication and dissemination of biodiversity data. More recently, BDJ also piloted the “Nanopublications for Biodiversity” workflow and format as a novel solution to future-proof biodiversity knowledge by sharing “pixels” of machine-actionable scientific statements.   

“I am thrilled to see the Biodiversity Data Journal’s (BDJ) hosted portal active, ten years since it became the first journal to submit taxon treatments and Darwin Core occurrence records automatically to GBIF! Since its launch in 2013, BDJ has been unrivalled amongst taxonomy and biodiversity journals in its unique workflows that provide authors with import and export functions for structured biodiversity data to/from GBIF, BOLD, iDigBio and more. I am also glad to announce that more than 30 Pensoft biodiversity journals will soon be present as separate hosted portals on GBIF thanks to our long-time collaboration with Plazi, ensuring proper publication, dissemination and re-use of FAIR biodiversity data,” said Prof. Dr. Lyubomir Penev, founder and CEO of Pensoft, and founding editor of BDJ.

“The release of the BDJ portal and subsequent ones planned for other Pensoft journals should inspire other publishers to follow suit in advancing a more interconnected, open and accessible ecosystem for biodiversity research,” said Vince Smith, editor-in-chief of BDJ and head of digital, data and informatics at the Natural History Museum, London.

Data checking for biodiversity collections and other biodiversity data compilers from Pensoft

Guest blog post by Dr Robert Mesibov

Proofreading the text of scientific papers isn’t hard, although it can be tedious. Are all the words spelled correctly? Is all the punctuation correct and in the right place? Is the writing clear and concise, with correct grammar? Are all the cited references listed in the References section, and vice-versa? Are the figure and table citations correct?

Proofreading of text is usually done first by the reviewers, and then finished by the editors and copy editors employed by scientific publishers. A similar kind of proofreading is also done with the small tables of data found in scientific papers, mainly by reviewers familiar with the management and analysis of the data concerned.

But what about proofreading the big volumes of data that are common in biodiversity informatics? Tables with tens or hundreds of thousands of rows and dozens of columns? Who does the proofreading?

Sadly, the answer is usually “No one”. Proofreading large amounts of data isn’t easy and requires special skills and digital tools. The people who compile biodiversity data often lack the skills, the software or the time to properly check what they’ve compiled.

The result is that a great deal of the data made available through biodiversity projects like GBIF is — to be charitable — “messy”. Biodiversity data often needs a lot of patient cleaning by end-users before it’s ready for analysis. To assist end-users, GBIF and other aggregators attach “flags” to each record in the database where an automated check has found a problem. These checks find the most obvious problems amongst the many possible data compilation errors. End-users often have much more work to do after the flags have been dealt with.
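As an illustration of how an end-user might work with the flags described above, the sketch below tallies issue flags in an occurrence file. It assumes a tab-delimited download with an `issue` column containing semicolon-separated flag names, as in GBIF's simple download format; the delimiter and column name would need adjusting for other sources.

```python
import csv
from collections import Counter

def tally_issue_flags(path):
    """Count issue flags across records in an occurrence download.

    Assumes a tab-delimited file with an 'issue' column whose values
    are semicolon-separated flag names (as in GBIF's simple download
    format); adjust the delimiter and column name for other sources.
    """
    counts = Counter()
    with open(path, encoding="utf-8", newline="") as f:
        for rec in csv.DictReader(f, delimiter="\t"):
            # an empty 'issue' cell means no problems were flagged
            for flag in filter(None, (rec.get("issue") or "").split(";")):
                counts[flag] += 1
    return counts
```

A tally like this helps an end-user decide which flags are frequent enough in a download to need systematic cleaning before analysis.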

In 2017, Pensoft employed a data specialist to proofread the online datasets that are referenced in manuscripts submitted to Pensoft’s journals as data papers. The results of the data-checking are sent to the data paper’s authors, who then edit the datasets. This process has substantially improved many datasets (including those already made available through GBIF) and made them more suitable for digital re-use. At blog publication time, more than 200 datasets have been checked in this way.

Note that a Pensoft data audit does not check the accuracy of the data, for example, whether the authority for a species name is correct, or whether the latitude/longitude for a collecting locality agrees with the verbal description of that locality. For a more or less complete list of what does get checked, see the Data checklist at the bottom of this blog post. These checks are aimed at ensuring that datasets are correctly organised, consistently formatted and easy to move from one digital application to another. The next reader of a digital dataset is likely to be a computer program, not a human. It is essential that the data are structured and formatted so that they can be easily processed by that program and by other programs in the pipeline between the data compiler and the next human user of the data.

Pensoft’s data-checking workflow was previously offered only to authors of data paper manuscripts. It is now available to data compilers generally, with three levels of service:

  • Basic: the compiler gets a detailed report on what needs fixing
  • Standard: minor problems are fixed in the dataset and reported
  • Premium: all detected problems are fixed in collaboration with the data compiler and a report is provided

Because datasets vary so much in size and content, it is not possible to set a price in advance for basic, standard and premium data-checking. To get a quote for a dataset, send an email with a small sample of the data to publishing@pensoft.net.


Data checklist

Minor problems:

  • dataset not UTF-8 encoded
  • blank or broken records
  • characters other than letters, numbers, punctuation and plain whitespace
  • more than one version of the same character (the simplest or most correct one is retained)
  • unnecessary whitespace
  • Windows carriage returns (retained if required)
  • encoding errors (e.g. “Dum?ril” instead of “Duméril”)
  • missing data with a variety of representations (blank, “-”, “NA”, “?”, etc.)
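Several of the minor problems above lend themselves to automated scanning. The sketch below is a minimal illustration, not Pensoft's actual tooling: it opens a CSV as UTF-8 (a decoding error signals a non-UTF-8 file), then flags blank records, unnecessary whitespace, and a few illustrative missing-data tokens.

```python
import csv
import re

# Tokens often used to mean "no data"; this list is illustrative only
MISSING_TOKENS = {"", "-", "na", "n/a", "?", "null"}

def check_minor_problems(path):
    """Scan a CSV for a few of the 'minor' issues listed above.

    Returns a dict mapping problem type to row numbers (for blank
    records) or (row, column) locations (for cell-level problems).
    Opening with encoding='utf-8' raises UnicodeDecodeError if the
    file is not UTF-8 encoded.
    """
    problems = {"blank_record": [], "whitespace": [], "missing": []}
    with open(path, encoding="utf-8", newline="") as f:
        for row_num, row in enumerate(csv.reader(f), start=1):
            if not any(cell.strip() for cell in row):
                problems["blank_record"].append(row_num)
                continue
            for col_num, cell in enumerate(row, start=1):
                # leading/trailing or doubled internal whitespace
                if cell != cell.strip() or re.search(r"\s{2,}", cell):
                    problems["whitespace"].append((row_num, col_num))
                if cell.strip().lower() in MISSING_TOKENS:
                    problems["missing"].append((row_num, col_num))
    return problems
```

A real audit covers far more cases (character repertoires, encoding errors, carriage returns), but the pattern is the same: walk every cell and report locations rather than silently fixing them.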

Major problems:

  • unintended shifts of data items between fields
  • incorrect or inconsistent formatting of data items (e.g. dates)
  • different representations of the same data item (pseudo-duplication)
  • for Darwin Core datasets, incorrect use of Darwin Core fields
  • data items that are invalid or inappropriate for a field
  • data items that should be split between fields
  • data items referring to unexplained entities (e.g. “habitat is type A”)
  • truncated data items
  • disagreements between fields within a record
  • missing, but expected, data items
  • incorrectly associated data items (e.g. two country codes for the same country)
  • duplicate records, or partial duplicate records where not needed
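Two of the major problems above, inconsistent date formatting and pseudo-duplication, can be surfaced with simple grouping. The sketch below is illustrative, not Pensoft's actual method: the date patterns cover only a few common layouts, and the pseudo-duplicate check uses only case- and whitespace-normalisation.

```python
import re
from collections import defaultdict

# Regexes for a few common date layouts; a real audit covers many more
DATE_FORMATS = {
    "ISO (YYYY-MM-DD)": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "slashed (D/M/YYYY)": re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$"),
    "verbose (D Month YYYY)": re.compile(r"^\d{1,2} [A-Za-z]+ \d{4}$"),
}

def date_format_report(values):
    """Group a column of date strings by apparent format.

    More than one non-empty group signals inconsistent formatting.
    """
    groups = defaultdict(list)
    for v in values:
        for name, rx in DATE_FORMATS.items():
            if rx.match(v):
                groups[name].append(v)
                break
        else:
            groups["unrecognised"].append(v)
    return dict(groups)

def pseudo_duplicates(values):
    """Find different spellings of what is probably the same item,
    by comparing case- and whitespace-normalised forms."""
    seen = defaultdict(set)
    for v in values:
        key = " ".join(v.split()).lower()
        seen[key].add(v)
    return {k: vs for k, vs in seen.items() if len(vs) > 1}
```

Reports like these locate the problems; deciding which variant is correct still needs the data compiler, which is why the checking workflow returns reports to authors rather than rewriting their data unasked.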

For details of the methods used, see the author’s online resources:

***

Find out more about Pensoft’s data audit workflow for data papers submitted to Pensoft journals on Pensoft’s blog.