bioinformatics

Smithsonian’s Dr Torsten Dikow appointed Editor-in-Chief of ZooKeys

Dikow, an esteemed entomologist specialising in Diptera and cybertaxonomy, is the new Editor-in-Chief of the leading scholarly journal in systematic zoology and biodiversity

Esteemed entomologist specialising in true flies (order Diptera) and cybertaxonomy, Dr Torsten Dikow was appointed as the new Editor-in-Chief of the leading open-access peer-reviewed journal in systematic zoology and biodiversity ZooKeys.

Dikow is to step into the shoes of globally celebrated fellow entomologist and colleague at the Smithsonian and founding Editor-in-Chief of ZooKeys Dr Terry Erwin, who sadly passed away in May, 2020, leaving behind hefty scientific legacy and immeasurable admiration and fond memories.

Today, Dikow is a Research Entomologist and Curator of Diptera and Aquatic Insects at the Smithsonian National Museum of Natural History (Washington, DC, USA), where his research interests encompass the diversity and evolutionary history of the superfamily Asiloidea – or asiloid flies – comprising curious insect groups, such as the assassin flies / robber flies and the mydas flies. Amongst an extensive list of research publications, Dikow’s studies on the diversity, biology, distribution and systematics of asiloid flies include the description of 60 species of assassin flies alone, and the redescription of even more through comprehensive taxonomic revisions.

Dikow obtained his M.S. in Zoology from the Universität Rostock (Germany) and Ph.D. in Entomology from Cornell University (New York, USA) with three years of dissertation research conducted at the American Museum of Natural History (AMNH).

During his years as a postdoc at the Field Museum (Illinois, USA), Dikow was earnestly involved in the broader activities of the Encyclopedia of Life through its Biodiversity Synthesis Center (BioSynC) and the Biodiversity Heritage Library (BHL). There, he would personally establish contacts with smaller natural history museums and scientific societies, and encourage them to grant digitisation permissions to the BHL for in-copyright scientific publications. Dikow is a champion of cybertaxonomic tools and making biodiversity data accessible from both natural history collections and publications. He has been named a Biodiversity Open Data Ambassador by the Global Biodiversity Information Facility (GBIF).

Dikow is no stranger to ZooKeys and other journals published by the open-access scientific publisher and technology provider Pensoft. For the past 10 years, he has been amongst the most active editors and a regular author and reviewer at ZooKeys, Biodiversity Data Journal and African Invertebrates.

“Publishing taxonomic revisions and species descriptions in an open-access, innovative journal to make data digitally accessible is one way we taxonomists can and need to add to the biodiversity knowledge base. ZooKeys has been a journal in support of this goal since day one. I am excited to lend my expertise and enthusiasm to further this goal and continue the development to publish foundational biodiversity research, species discoveries, and much more in the zoological field,”

said Dikow.

Dikow took on his new role at ZooKeys at a time when the journal had just turned 15 years on the scholarly publishing scene. In late 2020, the scientific outlet also marked the publication of its 1000^th journal volume.

***

Visit the journal’s website and follow ZooKeys on X (formerly Twitter) and Facebook. You can also follow Torsten Dikow on X.

***

About ZooKeys:

ZooKeys is a peer-reviewed, open-access, rapidly disseminated journal launched to accelerate research and free information exchange in taxonomy, phylogeny, biogeography and evolution of animals. ZooKeys aims to apply the latest trends and methodologies in publishing and preservation of digital materials to meet the highest possible standards of the cybertaxonomy era.

ZooKeys publishes papers in systematic zoology containing taxonomic/faunistic data on any taxon of any geological age from any part of the world with no limit to manuscript size. To respond to the current trends in linking biodiversity information and synthesising the knowledge through technology advancements, ZooKeys also publishes papers across other taxon-based disciplines, such as ecology, molecular biology, genomics, evolutionary biology, palaeontology, behavioural science, bioinformatics, etc.

New way to browse interlinked biodiversity data: Biodiversity Knowledge Hub NOW ONLINE!

The Biodiversity Knowledge Hub is a one-stop portal that allows users to access FAIR and interlinked biodiversity data and services in a few clicks.

The Horizon 2020 BiCIKL Project is proud to announce that the Biodiversity Knowledge Hub (BKH) is now online.

BKH is a one-stop portal that allows users to access FAIR and interlinked biodiversity data and services in a few clicks. BKH was designed to support a new emerging community of users over time and across the entire biodiversity research cycle providing its services to anybody, anywhere and anytime.

The Knowledge Hub is the main product from our BiCIKL consortium, and we are delighted with the result!

BKH can easily be seen as the beginning of the major shift in the way we search interlinked biodiversity information.”

Biodiversity researchers, research infrastructures and publishers interested in fields ranging from taxonomy to ecology and bioinformatics can now freely use BKH as a compass to navigate the oceans of biodiversity data. BKH will do the linkages.
says Prof. Lyubomir Penev, BiCIKL’s Project coordinator and Founder of Pensoft Publishers.

The BKH is designed to serve a new emerging community of users over time and across the entire biodiversity research cycle.

We have invested our best energies and resources in the development of BKH and the Fair Data Place (FDP), which is the beating heart of the portal,”

BKH has been designed to support a new emerging community of users across the entire biodiversity research cycle.

Its purpose goes beyond the BiCIKL project itself: we are thrilled to say that BKH is meant to stay, aiming to reshape the way biodiversity knowledge is accessed and used.
says Dr Christos Arvanitidis, CEO of LifeWatch ERIC.

The BKH outlines how users can navigate and access the linked data, tools and services of the infrastructures cooperating in BiCIKL.

By revealing how they harvest, liberate and reuse data, these increasingly integrated sources enable researchers in the natural sciences to move more seamlessly between specimens and material samples, genomic and metagenomic data, scientific literature, and taxonomic names and units.
said Dr Joe Miller, Executive Secretary of GBIF—the Global Biodiversity Information Facility.

A training programme on how to best utilise the platform is currently being developed by the Consortium of European Taxonomic Facilities (CETAF), Pensoft Publishers, Plazi, Meise Botanic Garden, EMBL’s European Bioinformatics Institute (EMBL-EBI), ELIXIR Hub, GBIF – the Global Biodiversity Information Facility, and LifeWatch ERIC and will be finalised in the coming months.

***

A detailed description of the BKH tools and services provided by its contributing organisations is available at: https://biodiversityknowledgehub.eu.

***

Find more information about the BiCIKL consortium partners on the project’s website.

***

Follow BiCIKL Project on Twitter and Facebook. Join the conversation on Twitter at #BiCIKL_H2020.

New BiCIKL project to build a freeway between pieces of biodiversity knowledge

Within Biodiversity Community Integrated Knowledge Library (BiCIKL), 14 key research and natural history institutions commit to link infrastructures and technologies to provide flawless access to biodiversity data.

In a recently started Horizon 2020-funded project, 14 European institutions from 10 countries, representing both the continent’s and global key players in biodiversity research and natural history, deploy and improve their own and partnering infrastructures to bridge gaps between each other’s biodiversity data types and classes. By linking their technologies, they are set to provide flawless access to data across all stages of the research cycle.

Three years in, BiCIKL (abbreviation for Biodiversity Community Integrated Knowledge Library) will have created the first-of-its-kind Biodiversity Knowledge Hub, where a researcher will be able to retrieve a full set of linked and open biodiversity data, thereby accessing the complete story behind an organism of interest: its name, genetics, occurrences, natural history, as well as authors and publications mentioning any of those.

Ultimately, the project’s products will solidify Open Science and FAIR (Findable, Accessible, Interoperable and Reusable) data practices by empowering and streamlining biodiversity research.

Together, the project partners will redesign the way biodiversity data is found, linked, integrated and re-used across the research cycle. By the end of the project, BiCIKL will provide the community with a more transparent, trustworthy and efficient highly automated research ecosystem, allowing for scientists to access, explore and put into further use a wide range of data with only a few clicks.

“In recent years, we’ve made huge progress on how biodiversity data is located, accessed, shared, extracted and preserved, thanks to a vast array of digital platforms, tools and projects looking after the different types of data, such as natural history specimens, species descriptions, images, occurrence records and genomics data, to name a few. However, we’re still missing an interconnected and user-friendly environment to pull all those pieces of knowledge together. Within BiCIKL, we all agree that it’s only after we puzzle out how to best bridge our existing infrastructures and the information they are continuously sourcing that future researchers will be able to realise their full potential,”
explains BiCIKL’s project coordinator Prof. Lyubomir Penev, CEO and founder of Pensoft, a scholarly publisher and technology provider company.

Continuously fed with data sourced by the partnering institutions and their infrastructures, BiCIKL’s key final output: the Biodiversity Knowledge Hub, is set to persist with time long after the project has concluded. On the contrary, by accelerating biodiversity research that builds on – rather than duplicates – existing knowledge, it will in fact be providing access to exponentially growing contextualised biodiversity data.

***

Learn more about BiCIKL on the project’s website at: bicikl-project.eu

Follow BiCIKL Project on Twitter and Facebook. Join the conversation on Twitter at #BiCIKL_H2020.

***

The project partners:

Pensoft Publishers, Bulgaria
Naturalis Biodiversity Center, Netherlands
Plazi, Switzerland
Meise Botanic Garden, Belgium
European Molecular Biology Laboratory and Elixir/EMBL-EBI, Germany
European Organization for Nuclear Research, Switzerland
Consortium of European Taxonomic Facilities (Belgium) and Muséum national d’Histoire naturelle (MNHN, France, associated party to CETAF)
Institut Suisse de bioinformatique (SIB), Switzerland
Tartu Ülikool (UTartu), Estonia
E-Science European Infrastructure for Biodiversity and Ecosystem Research (LifeWatch), Spain
Freie Universitaet Berlin (FUB-BGBM), Germany
Global Biodiversity Information Facility (GBIF), Denmark
SPECIES 2000 (sp2000) and The Catalogue of Life (COL), United Kingdom
Taxonomic Databases Working Group, today’s Biodiversity Information Standards (TDWG), Netherlands

Data checking for biodiversity collections and other biodiversity data compilers from Pensoft

***Guest blog post by* *Dr Robert Mesibov***

Proofreading the text of scientific papers isn’t hard, although it can be tedious. Are all the words spelled correctly? Is all the punctuation correct and in the right place? Is the writing clear and concise, with correct grammar? Are all the cited references listed in the References section, and vice-versa? Are the figure and table citations correct?

Proofreading of text is usually done first by the reviewers, and then finished by the editors and copy editors employed by scientific publishers. A similar kind of proofreading is also done with the small tables of data found in scientific papers, mainly by reviewers familiar with the management and analysis of the data concerned.

But what about proofreading the big volumes of data that are common in biodiversity informatics? Tables with tens or hundreds of thousands of rows and dozens of columns? Who does the proofreading?

Sadly, the answer is usually “No one”. Proofreading large amounts of data isn’t easy and requires special skills and digital tools. The people who compile biodiversity data often lack the skills, the software or the time to properly check what they’ve compiled.

The result is that a great deal of the data made available through biodiversity projects like GBIF is — to be charitable — “messy”. Biodiversity data often needs a lot of patient cleaning by end-users before it’s ready for analysis. To assist end-users, GBIF and other aggregators attach “flags” to each record in the database where an automated check has found a problem. These checks find the most obvious problems amongst the many possible data compilation errors. End-users often have much more work to do after the flags have been dealt with.

In 2017, Pensoft employed a data specialist to proofread the online datasets that are referenced in manuscripts submitted to Pensoft’s journals as data papers. The results of the data-checking are sent to the data paper’s authors, who then edit the datasets. This process has substantially improved many datasets (including those already made available through GBIF) and made them more suitable for digital re-use. At blog publication time, more than 200 datasets have been checked in this way.

Note that a Pensoft data audit does not check the accuracy of the data, for example, whether the authority for a species name is correct, or whether the latitude/longitude for a collecting locality agrees with the verbal description of that locality. For a more or less complete list of what does get checked, see the Data checklist at the bottom of this blog post. These checks are aimed at ensuring that datasets are correctly organised, consistently formatted and easy to move from one digital application to another. The next reader of a digital dataset is likely to be a computer program, not a human. It is essential that the data are structured and formatted, so that they are easily processed by that program and by other programs in the pipeline between the data compiler and the next human user of the data.

Pensoft’s data-checking workflow was previously offered only to authors of data paper manuscripts. It is now available to data compilers generally, with three levels of service:

Basic: the compiler gets a detailed report on what needs fixing
Standard: minor problems are fixed in the dataset and reported
Premium: all detected problems are fixed in collaboration with the data compiler and a report is provided

Because datasets vary so much in size and content, it is not possible to set a price in advance for basic, standard and premium data-checking. To get a quote for a dataset, send an email with a small sample of the data topublishing@pensoft.net.

—

Data checklist

Minor problems:

dataset not UTF-8 encoded
blank or broken records
characters other than letters, numbers, punctuation and plain whitespace
more than one version (the simplest or most correct one) for each character
unnecessary whitespace
Windows carriage returns (retained if required)
encoding errors (e.g. “Dum?ril” instead of “Duméril”)
missing data with a variety of representations (blank, “-“, “NA”, “?” etc)

Major problems:

unintended shifts of data items between fields
incorrect or inconsistent formatting of data items (e.g. dates)
different representations of the same data item (pseudo-duplication)
for Darwin Core datasets, incorrect use of Darwin Core fields
data items that are invalid or inappropriate for a field
data items that should be split between fields
data items referring to unexplained entities (e.g. “habitat is type A”)
truncated data items
disagreements between fields within a record
missing, but expected, data items
incorrectly associated data items (e.g. two country codes for the same country)
duplicate records, or partial duplicate records where not needed

For details of the methods used, see the author’s online resources:

A Data Cleaner’s Cookbook
BASHing data (a weekly data blog)

***

Find more for Pensoft’s data audit workflow provided for data papers submitted to Pensoft journals on Pensoft’s blog.

FAIR biodiversity data in Pensoft journals thanks to a routine data auditing workflow

The EU not ready for the release of Gene drive organisms into the environment

Gene drive organisms (GDOs) have been suggested as an approach to solve some of the most pressing environmental and public health issues. Currently, it remains unclear what kind of regulations are to be used to cover the potential risks. In their study, published in the open-access journal BioRisk, scientists evaluate the options for an operational risk assessment of GDOs before their release into environments across the EU.

EU scientists are taking a closer look into the CRISPR/Cas-9-induced population-wide genetic modifications before introducing it into practice

Within the last decades, new genetic engineering tools for manipulating genetic material in plants, animals and microorganisms are getting large attention from the international community, bringing new challenges and possibilities. While genetically modified organisms (GMO) have been known and used for quite a while now, gene drive organisms (GDO) are yet at the consideration and evaluation stage.

The difference between these two technologies, where both are meant to replace certain characters in animals or plants with ones that are more favourable for the human population, is that, even though in GDO there is also foreign “synthetic” DNA being introduced, the inheritance mode differs. In GDO, the genome’s original base arrangements are changed, using CRISPR/Cas-9 genome editing. Once the genome is changed, its alterations are carried down the organism’s offspring and subsequent generations.

In their study, published in the open-access journal Biorisk, an international group of scientists led by Marion Dolezel from the Environment Agency Austria, discuss the potential risks and impacts on the environment.

The research team also points to current regulations addressing invasive alien species and biocontrol agents, and finds that the GMO regulations are, in principle, also a useful starting point for GDO.

There are three main areas suggested to benefit from gene drive systems: public health (e.g. vector control of human pathogens), agriculture (e.g. weed and pest control), environmental protection and nature conservation (e.g. control of harmful non-native species).

In recent years, a range of studies have shown the feasibility of synthetic CRISPR-based gene drives in different organisms, such as yeast, the common fruit fly, mosquitoes and partly in mammals.

Given the results of previous research, the gene drive approach can even be used as prevention for some zoonotic diseases and, hence, possible future pandemics. For example, laboratory tests showed that release of genetically modified mosquitoes can drastically reduce the number of malaria vectors. Nevertheless, potential environment and health implications, related to the release of GDO, remain unclear. Only a few potential applications have so far progressed to the research and development stage.

“The potential of GDOs for unlimited spread throughout wild populations, once released, and the apparently inexhaustible possibilities of multiple and rapid modifications of the genome in a vast variety of organisms, including higher organisms such as vertebrates, pose specific challenges for the application of adequate risk assessment methodologies”,
shares the lead researcher Mrs. Dolezel.

In the sense of genetic engineering being a fastly developing science, every novel feature must be taken into account, while preparing evaluations and guidances, and each of them provides extra challenges.

Today, the scientists present three key differences of gene drives compared to the classical GMO:

1. Introducing novel modifications to wild populations instead of “familiar” crop species, which is a major difference between “classic” GMOs and GDOs.

“The goal of gene drive applications is to introduce a permanent change in the ecosystem, either by introducing a phenotypic change or by drastically reducing or eradicating a local population or a species. This is a fundamental difference to GM crops for which each single generation of hybrid seed is genetically modified, released and removed from the environment after a relatively short period”,
shares Dolezel.

2. Intentional and potentially unlimited spread of synthetic genes in wild populations and natural ecosystems.

Gene flow of synthetic genes to wild organisms can have adverse ecological impact on the genetic diversity of the targeted population. It could change the weediness or invasiveness of certain plants, but also threaten with extinction the species in the wild.

3. Possibility for long-term risks to populations and ecosystems.

Key and unique features of GDOs are the potential long-term changes in populations and large-scale spread across generations.

In summary, the research team points out that, most of all, gene drive organisms must be handled extremely carefully, and that the environmental risks related to their release must be assessed under rigorous scrutiny. The standard requirements before the release of GDOs need to also include close post-release monitoring and risk management measures.

It is still hard to assess with certainty the potential risks and impact of gene drive applications on the environment, human and animal health. That’s why highly important questions need to be addressed, and the key one is whether genetically driven organisms are to be deliberately released into the environment in the European Union. The High Level Group of the European Commission’s Scientific Advice Mechanism highlights that within the current regulatory frameworks those risks may not be covered.

The research group recommends the institutions to evaluate whether the regulatory oversight of GMOs in the EU is accomodate to cover the novel risks and challenges posed by gene drive applications.

“The final decision to release GDOs into the environment will, however, not be a purely scientific question, but will need some form of broader stakeholder engagement and the commitment to specific protection goals for human health and the environment”,
concludes Dolezel.

***

Original source:
Dolezel M, Lüthi C, Gaugitsch H (2020) Beyond limits – the pitfalls of global gene drives for environmental risk assessment in the European Union. BioRisk 15: 1-29. https://doi.org/10.3897/biorisk.15.49297

Contact:
Marion Dolezel
Email: marion.dolezel@umweltbundesamt.at

FAIR biodiversity data in Pensoft journals thanks to a routine data auditing workflow

*Data audit workflow provided for data papers submitted to Pensoft journals.*

To avoid publication of openly accessible, yet unusable datasets, fated to result in irreproducible and inoperable biological diversity research at some point down the road, Pensoft takes care for auditing data described in data paper manuscripts upon their submission to applicable journals in the publisher’s portfolio, including Biodiversity Data Journal, ZooKeys, PhytoKeys, MycoKeys and many others.

Once the dataset is clean and the paper is published, biodiversity data, such as taxa, occurrence records, observations, specimens and related information, become FAIR (findable, accessible, interoperable and reusable), so that they can be merged, reformatted and incorporated into novel and visionary projects, regardless of whether they are accessed by a human researcher or a data-mining computation.

As part of the pre-review technical evaluation of a data paper submitted to a Pensoft journal, the associated datasets are subjected to data audit meant to identify any issues that could make the data inoperable. This check is conducted regardless of whether the dataset are provided as supplementary material within the data paper manuscript or linked from the Global Biodiversity Information Facility (GBIF) or another external repository. The features that undergo the audit can be found in a data quality checklist made available from the website of each journal alongside key recommendations for submitting authors.

Once the check is complete, the submitting author receives an audit report providing improvement recommendations, similarly to the commentaries he/she would receive following the peer review stage of the data paper. In case there are major issues with the dataset, the data paper can be rejected prior to assignment to a subject editor, but resubmitted after the necessary corrections are applied. At this step, authors who have already published their data via an external repository are also reminded to correct those accordingly.

“It all started back in 2010, when we joined forces with GBIF on a quite advanced idea in the domain of biodiversity: a data paper workflow as a means to recognise both the scientific value of rich metadata and the efforts of the the data collectors and curators. Together we figured that those data could be published most efficiently as citable academic papers,” says Pensoft’s founder and Managing director Prof. Lyubomir Penev.

“From there, with the kind help and support of Dr Robert Mesibov, the concept evolved into a data audit workflow, meant to ‘proofread’ the data in those data papers the way a copy editor would go through the text,” he adds.

“The data auditing we do is not a check on whether a scientific name is properly spelled, or a bibliographic reference is correct, or a locality has the correct latitude and longitude”, explains Dr Mesibov. “Instead, we aim to ensure that there are no broken or duplicated records, disagreements between fields, misuses of the Darwin Core recommendations, or any of the many technical issues, such as character encoding errors, that can be an obstacle to data processing.”

At Pensoft, the publication of openly accessible, easy to access, find, re-use and archive data is seen as a crucial responsibility of researchers aiming to deliver high-quality and viable scientific output intended to stand the test of time and serve the public good.

CASE STUDY: Data audit for the “Vascular plants dataset of the COFC herbarium (University of Cordoba, Spain)”, a data paper in PhytoKeys

To explain how and why biodiversity data should be published in full compliance with the best (open) science practices, the team behind Pensoft and long-year collaborators published a guidelines paper, titled “Strategies and guidelines for scholarly publishing of biodiversity data” in the open science journal Research Ideas and Outcomes (RIO Journal).

Sir Charles Lyell’s historical fossils kept at London’s Natural History Museum accessible online

The Lyell Project team: First row, seated from left to right: Martha Richter (Principal Curator in Charge of Vertebrates), Consuelo Sendino (with white coat, curator of bryozoans holding a Lyell fossil gastropod from Canaries), Noel Morris (Scientific Associate of Invertebrates), Claire Mellish (Senior Curator of arthropods), Sandra Chapman (curator of reptiles) and Emma Bernard (curator of fishes, holding the lectotype of Cephalaspis lyelli). Second row, standing on from left to right: Jill Darrell (curator of cnidarians), Zoe Hughes (curator of brachiopods) and Kevin Webb (science photographer). Photo by Nelly Perez-Larvor.

More than 1,700 animal and plant specimens from the collection of eminent British geologist Sir Charles Lyell – known as the pioneer of modern geology – were organised, digitised and made openly accessible via the NHM Data Portal in a pilot project, led by Dr Consuelo Sendino, curator at the Department of Earth Sciences (Natural History Museum, London). They are described in a data paper published in the open-access Biodiversity Data Journal.

*Curator of plants Peta Hayes (left) and curator of bryozoans Consuelo Sendino (right) looking at a Lyell fossil plant from Madeira in the collection area. Photo by Mark Lewis.*

The records contain the data from the specimens’ labels (species name, geographical details, geological age and collection details), alongside high-resolution photographs, most of which were ‘stacked’ with the help of specialised software to re-create a 3D model.

Sir Charles Lyell’s fossil collection comprises a total of 1,735 specimens of fossil molluscs, filter-feeding moss animals and fish, as well as 51 more recent shells, including nine specimens originally collected by Charles Darwin from Tierra del Fuego or Galapagos, and later gifted to the geologist. The first specimen of the collection was deposited in distant 1846 by Charles Lyell himself, while the last one – in 1980 by one of his heirs.

With as much as 95% of the specimens having been found at the Macaronesian archipelagos of the Canaries and Madeira and dating to the Cenozoic era, the collection provides a key insight into the volcano formation and palaeontology of Macaronesia and the North Atlantic Ocean. By digitising the collection and making it easy to find and access for researchers from around the globe, the database is to serve as a stepping stone for studies in taxonomy, stratigraphy and volcanology at once.

*Sites where the Earth Sciences’ Lyell Collection specimens originate.*

“The display of this data virtually eliminates the need for specimen handling by researchers and will greatly speed up response time to collection enquiries,” explains Dr Sendino.

Furthermore, the pilot project and its workflow provide an invaluable example to future digitisation initiatives. In her data paper, Dr Sendino lists the limited resources she needed to complete the task in just over a year.

In terms of staff, the curator was joined by MSc student Teresa Máñez (University of Valencia, Spain) for six weeks while locating the specimens and collecting all the information about them; volunteer Jane Barnbrook, who re-boxed 1,500 specimens working one day per week for a year; NHM’s science photographer Kevin Webb and University of Lisbon’s researcher Carlos Góis-Marques, who imaged the specimens; and a research associate, who provided broad identification of the specimens, working one day per week for two months. Each of the curators for the collections, where the Lyell specimens were kept, helped Dr Sendino for less than a day. On the other hand, the additional costs comprised consumables such as plastazote, acid-free trays, archival pens, and archival paper for new labels.

“The success of this was due to advanced planning and resource tracking,” comments Dr Sendino.

“This is a good example of reduced cost for digitisation infrastructure creation maintaining a high public profile for digitisation,” she concludes.

###

Original source:

Sendino C (2019) The Lyell Collection at the Earth Sciences Department, Natural History Museum, London (UK). Biodiversity Data Journal 7: e33504. https://doi.org/10.3897/BDJ.7.e33504

###

About NHM Data Portal:

Committed to open access and open science, the Natural History Museum (London, UK) has launched the Data Portal to make its research and collections datasets available online. It allows anyone to explore, download and reuse the data for their own research.

The portal’s main dataset consists of specimens from the Museum’s collection database, with 4,224,171 records from the Museum’s Palaeontology, Mineralogy, Botany, Entomology and Zoology collections.

Recipe for Reusability: Biodiversity Data Journal integrated with Profeza’s CREDIT Suite

Through their new collaboration, the partners encourage publication of dynamic additional research outcomes to support reusability and reproducibility in science

In a new partnership between open-access Biodiversity Data Journal (BDJ) and workflow software development platform Profeza, authors submitting their research to the scholarly journal will be invited to prepare a Reuse Recipe Document via CREDIT Suite to encourage reusability and reproducibility in science. Once published, their articles will feature a special widget linking to additional research output, such as raw, experimental repetitions, null or negative results, protocols and datasets.

A Reuse Recipe Document is a collection of additional research outputs, which could serve as a guidelines to another researcher trying to reproduce or build on the previously published work. In contrast to a research article, it is a dynamic ‘evolving’ research item, which can be later updated and also tracked back in time, thanks to a revision history feature.

Both the Recipe Document and the Reproducible Links, which connect subsequent outputs to the original publication, are assigned with their own DOIs, so that reuse instances can be easily captured, recognised, tracked and rewarded with increased citability.

With these events appearing on both the original author’s and any reuser’s ORCID, the former can easily gain further credibility for his/her work because of his/her work’s enhanced reproducibility, while the latter increases his/her own by showcasing how he/she has put what he/she has cited into use.

Furthermore, the transparency and interconnectivity between the separate works allow for promoting intra- and inter-disciplinary collaboration between researchers.

“At BDJ, we strongly encourage our authors to use CREDIT Suite to submit any additional research outputs that could help fellow scientists speed up progress in biodiversity knowledge through reproducibility and reusability,” says Prof. Lyubomir Penev, founder of the journal and its scholarly publisher – Pensoft. “Our new partnership with Profeza is in itself a sign that collaboration and integrity in academia is the way to good open science practices.”

“Our partnership with Pensoft is a great step towards gathering crucial feedback and insight concerning reproducibility and continuity in research. This is now possible with Reuse Recipe Documents, which allow for authors and reusers to engage and team up with each other,” says Sheevendra, Co-Founder of Profeza.

Plazi and the Biodiversity Literature Repository (BLR) awarded EUR 1.1 million from Arcadia Fund to grant free access to biodiversity data

Plazi has received a grant of EUR 1.1 million from Arcadia – the charitable fund of Lisbet Rausing and Peter Baldwin – to liberate data, such as taxonomic treatments and images, trapped in scholarly biodiversity publications.

The project will expand the existing corpus of the Biodiversity Literature Repository (BLR), a joint venture of Plazi and Pensoft, hosted on Zenodo at CERN. The project aims to add hundreds of thousands of figures and taxonomic treatments extracted from publications, and further develop and hone the tools to search through the corpus.

The BLR is an open science community platform to make the data contained in scholarly publications findable, accessible, interoperable and reusable (FAIR). BLR is hosted on Zenodo, the open science repository at CERN, and maintained by the Switzerland-based Plazi association and the open access publisher Pensoft.

In its short existence, BLR has already grown to a considerate size: 35,000+ articles have been added, and extracted from 600+ journals. From these articles, more than 180,000 images have also been extracted and uploaded to BLR, and 225,000+ sub-article components, including biological names, taxonomic treatments or equivalent defined blocks of text have been deposited at Plazi’s TreatmentBank. Additionally, over a million bibliographic references have been extracted and added to Refbank.

The articles, images and all other sub-article elements are fully FAIR compliant and citable. In case an article is behind a paywall, a user can still access its underlying metadata, the link to the original article, and use the DOI assigned to it by BLR for persistent citation.

“Generally speaking, scientific illustrations and taxonomic treatments, such as species descriptions, are one of the best kept ‘secrets’ in science as they are neither indexed, nor are they citable or accessible. At best, they are implicitly referenced,” said Donat Agosti, president of Plazi. “Meanwhile, their value is undisputed, as shown by the huge effort to create them in standard, comparative ways. From day one, our project has been an eye-opener and a catalyst for the open science scene,” he concluded.

Though the target scientific domain is biodiversity, the Plazi workflow and tools are open source and can be applied to other domains – being a catalyst is one of the project’s goals.

While access to biodiversity images has already proven useful to scientists, but also inspirational to artists, for example, the people behind Plazi are certain that such a well-documented, machine-readable interface is sure to lead to many more innovative uses.

To promote BLR’s approach to make these important data accessible, Plazi seeks collaborations with the community and publishers, to remove hurdles in liberating the data contained in scholarly publications and make them FAIR.

The robust legal aspects of the project are a core basis of BLR’s operation. By extracting the non-copyrightable elements from the publications and making them findable, accessible and re-usable for free, the initiative drives the move beyond the PDF and HTML formats to structured data.

###

To participate in the project or for further questions, please contact Donat Agosti, President at Plazi at info@plazi.org

Additional information:

About Plazi:

Plazi is an association supporting and promoting the development of persistent and openly accessible digital taxonomic literature. To this end, Plazi maintains TreatmentBank, a digital taxonomic literature repository to enable archiving of taxonomic treatments; develops and maintains TaxPub, an extension of the National Library of Medicine / National Center for Biotechnology Informatics Journal Article Tag Suite for taxonomic treatments; is co-founder of the Biodiversity Literature Repository at Zenodo, participates in the development of new models for publishing taxonomic treatments in order to maximize interoperability with other relevant cyberinfrastructure components such as name servers and biodiversity resources; and advocates and educates about the vital importance of maintaining free and open access to scientific discourse and data. Plazi is a major contributor to the Global Biodiversity Information Facility.

About Arcadia Fund:

Arcadia is a charitable fund of Lisbet Rausing and Peter Baldwin. It supports charities and scholarly institutions that preserve cultural heritage and the environment. Arcadia also supports projects that promote open access and all of its awards are granted on the condition that any materials produced are made available for free online. Since 2002, Arcadia has awarded more than $500 million to projects around the world.

Audit finds biodiversity data aggregators ‘lose and confuse’ data

In an effort to improve the quality of biodiversity records, the Atlas of Living Australia (ALA) and the Global Biodiversity Information Facility (GBIF) use automated data processing to check individual data items. The records are provided to the ALA and GBIF by museums, herbaria and other biodiversity data sources.

However, an independent analysis of such records reports that ALA and GBIF data processing also leads to data loss and unjustified changes in scientific names.

The study was carried out by Dr Robert Mesibov, an Australian millipede specialist who also works as a data auditor. Dr Mesibov checked around 800,000 records retrieved from the Australian Museum, Museums Victoria and the New Zealand Arthropod Collection. His results are published in the open access journal ZooKeys, and also archived in a public data repository.

“I was mainly interested in changes made by the aggregators to the genus and species names in the records,” said Dr Mesibov.

“I found that names in up to 1 in 5 records were changed, often because the aggregator couldn’t find the name in the look-up table it used.”

Another worrying result concerned type specimens – the reference specimens upon which scientific names are based. On a number of occasions, the aggregators were found to have replaced the name of a type specimen with a name tied to an entirely different type specimen.

The biggest surprise, according to Dr Mesibov, was the major disagreement on names between aggregators.

“There was very little agreement,” he explained. “One aggregator would change a name and the other wouldn’t, or would change it in a different way.”

Furthermore, dates, names and locality information were sometimes lost from records, mainly due to programming errors in the software used by aggregators to check data items. In some data fields the loss reached 100%, with no original data items surviving the processing.

“The lesson from this audit is that biodiversity data aggregation isn’t harmless,” said Dr Mesibov. “It can lose and confuse perfectly good data.”

“Users of aggregated data should always download both original and processed data items, and should check for data loss or modification, and for replacement of names,” he concluded.

###