Novel research on African bats pilots new ways of sharing and linking published data

A colony of what is apparently a new species of the genus Hipposideros found in an abandoned gold mine in Western Kenya
Photo by B. D. Patterson / Field Museum

Newly published findings about the phylogenetics and systematics of Old World leaf-nosed bats, covering some previously known species as well as others yet to be identified, provide the first contribution to a recently launched collection of research articles. The collection aims to help scientists from across disciplines better understand potential hosts and vectors of zoonotic diseases, such as those caused by coronaviruses. Bats and pangolins are among the animals already identified as particularly potent vehicles of life-threatening viruses, including the infamous SARS-CoV-2.

The article, publicly available in the peer-reviewed scholarly journal ZooKeys, also pilots a new generation of Linked Open Data (LOD) publishing practices, devised and implemented to facilitate ongoing scientific collaborations in times of urgency, such as the current COVID-19 pandemic, which is ravaging more than 230 countries around the globe.

In their study, an international team of scientists, led by Dr Bruce Patterson, the Field Museum’s MacArthur curator of mammals, points to the existence of numerous, yet to be described species of leaf-nosed bats inhabiting the biodiversity hotspots of East Africa and Southeast Asia. In order to expedite future discoveries about the identity, biology and ecology of those bats, they provide key insights into the genetics and relations within their higher groupings, as well as further information about their geographic distribution.

“Leaf-nosed bats carry coronaviruses–not the strain that’s affecting humans right now, but this is certainly not the last time a virus will be transmitted from a wild mammal to humans. If we have better knowledge of what these bats are, we’ll be better prepared if that happens,”

says Dr Terrence Demos, a post-doctoral researcher in Patterson’s lab and a principal author of the paper.

One of possibly three bat species new to science, previously referred to as Hipposideros caffer or Sundevall’s leaf-nosed bat
Photo by B. D. Patterson / Field Museum

“With COVID-19, we have a virus that’s running amok in the human population. It originated in a horseshoe bat in China. There are 25 or 30 species of horseshoe bats in China, and no one can determine which one was involved. We owe it to ourselves to learn more about them and their relatives,”

comments Patterson.

In order to ensure that scientists from across disciplines, including biologists, virologists and epidemiologists, as well as health and policy officials and decision-makers, have the scientific data and evidence at hand, Patterson and his team supplemented their research publication with a particularly valuable appendix table. There, in a conveniently organized format, everyone can access fundamental raw genetic data about each studied specimen, as well as its precise identification, its origin and the natural history collection in which it is preserved. However, what makes those data particularly useful for researchers looking to make ground-breaking and potentially life-saving discoveries is that all that information is linked to other types of data stored in various databases and repositories, contributed by scientists from anywhere in the world.

Furthermore, in this case, those linked and publicly available data, or Linked Open Data (LOD), are published in specific machine-readable formats, so that they are “understandable” for computers. Thus, when researchers look up a particular specimen in the table, they can immediately access additional data about it stored in external data repositories by means of a single algorithm. Alternatively, another researcher might want to retrieve all pathogens extracted from tissues of specimens of a specific animal species, or of particular populations inhabiting a certain geographical range, and so on.
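As a rough illustration of what such machine-readable linking makes possible, the Python sketch below treats a few rows of a specimen table as records, filters them by species and country, and turns their identifiers into resolvable links. The record fields, the helper functions and all sample values are assumptions made for this example; they are placeholders and do not reproduce the appendix table of the ZooKeys article. The two URL patterns are the public landing-page patterns of GenBank and GBIF.

```python
from dataclasses import dataclass

# Landing-page URL patterns of two public repositories.
GENBANK_URL = "https://www.ncbi.nlm.nih.gov/nuccore/{accession}"
GBIF_URL = "https://www.gbif.org/occurrence/{key}"


@dataclass(frozen=True)
class SpecimenRecord:
    """One row of a hypothetical specimen table (all values are placeholders)."""
    catalog_number: str
    taxon: str
    country: str
    genbank_accession: str    # placeholder, not a real accession number
    gbif_occurrence_key: str  # placeholder, not a real GBIF key


def external_links(record):
    """Build the links to external data that a record points to."""
    return {
        "sequence": GENBANK_URL.format(accession=record.genbank_accession),
        "occurrence": GBIF_URL.format(key=record.gbif_occurrence_key),
    }


def select(records, taxon=None, country=None):
    """Filter records by taxon and/or country; None means 'any'."""
    return [
        r for r in records
        if (taxon is None or r.taxon == taxon)
        and (country is None or r.country == country)
    ]


# Made-up sample rows, for illustration only.
SPECIMENS = [
    SpecimenRecord("EX-0001", "Hipposideros caffer", "Kenya", "XX000001", "1000000001"),
    SpecimenRecord("EX-0002", "Hipposideros caffer", "Tanzania", "XX000002", "1000000002"),
    SpecimenRecord("EX-0003", "Hipposideros ruber", "Kenya", "XX000003", "1000000003"),
]

for rec in select(SPECIMENS, taxon="Hipposideros caffer", country="Kenya"):
    print(rec.catalog_number, external_links(rec))
```

Because every row carries stable identifiers, the same few lines work whether a query concerns one specimen or all specimens of a species from a given region.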

###

The data publication and dissemination approach piloted in this new study was developed by the science publisher and technology provider Pensoft and the digitisation company Plazi for the purposes of a special collection of research papers in the scholarly journal ZooKeys, reporting on novel findings concerning the biology of bats and pangolins. By targeting bats and pangolins, the two most likely ‘culprits’ at the roots of the Coronavirus outbreak in 2020, the article collection aligns with the agenda of the COVID-19 Joint Task Force, a recent call for contributions made by the Consortium of European Taxonomic Facilities (CETAF), the Distributed System of Scientific Collections (DiSSCo) and the Integrated Digitized Biocollections (iDigBio).

###

Original source:

Patterson BD, Webala PW, Lavery TH, Agwanda BR, Goodman SM, Kerbis Peterhans JC, Demos TC (2020) Evolutionary relationships and population genetics of the Afrotropical leaf-nosed bats (Chiroptera, Hipposideridae). ZooKeys 929: 117-161. https://doi.org/10.3897/zookeys.929.50240

Plazi and Pensoft join forces to let biodiversity knowledge of coronavirus hosts out

Pensoft’s flagship journal ZooKeys invites free-to-publish research on key biological traits of potential hosts and vectors of SARS-like viruses; Plazi harvests and brings together all relevant data from legacy literature in a reliable FAIR-data repository

To bridge the huge knowledge gaps in the understanding of how and which animal species successfully transmit life-threatening diseases to humans, thereby paving the way for global health emergencies, scholarly publisher Pensoft and literature digitisation provider Plazi join efforts, expertise and high-tech infrastructure. 

By using the advanced text- and data-mining tools and semantic publishing workflows they have developed, the long-standing partners are to rapidly publish easy-to-access and reusable biodiversity research findings and data related to hosts or vectors of SARS-CoV-2 or other coronaviruses, in order to provide the stepping stones needed to manage and prevent similar crises in the future.

Already, there’s plenty of evidence pointing to certain animals, including pangolins, bats, snakes and civets, as hosts of coronaviruses like SARS-CoV-2 and, hence, as potential triggers of global health crises, such as the currently ravaging Coronavirus pandemic. However, scientific research on what biological and behavioural specifics of those species make them particularly successful vectors of zoonotic diseases is surprisingly scarce. Even worse, the little that science ‘knows’ today is often locked behind paywalls and copyright laws, or simply ‘trapped’ in formats inaccessible to text- and data-mining performed by search algorithms.

This is why Pensoft’s flagship open-access, peer-reviewed zoological journal ZooKeys recently announced its upcoming special issue, titled “Biology of pangolins and bats”, to invite research papers on the biological traits and behavioural features of bats and pangolins that are or could be making them efficient vectors of zoonotic diseases. Another open-science innovation champion in Pensoft’s portfolio, Research Ideas and Outcomes (RIO Journal), launched a further free-to-publish collection of early and/or brief outcomes of research devoted to SARS-like viruses.

Due to the expedited peer review and publication processes at ZooKeys, the articles will rapidly be made public and accessible to scientists, decision-makers and other experts, who could then build on the findings and eventually come up with effective measures for the prevention and mitigation of future zoonotic epidemics. To further facilitate the availability of such critical research, ZooKeys is waiving the publication charges for accepted papers.

Meanwhile, the literature digitisation provider Plazi is deploying its text- and data-mining expertise and tools to locate and acquire publications related to hosts of coronaviruses, such as those expected in the upcoming “Biology of pangolins and bats” special issue in ZooKeys, and deposit them in a newly formed Coronavirus-Host Community, a repository hosted on the Zenodo platform. There, all publications will be granted persistent open access and enhanced with taxonomy-specific data derived from their sources. Contributions to Plazi can be made at various levels: from sending suggestions of articles to be added to the Zotero bibliographic public libraries on virus-host associations and hosts’ taxonomy, to helping with the conversion of those articles into findable, accessible, interoperable and reusable (FAIR) knowledge.

Pensoft’s and Plazi’s collaboration once again aligns with the efforts of the biodiversity community, after the natural science collections consortium DiSSCo (Distributed System of Scientific Collections) and the Consortium of European Taxonomic Facilities (CETAF) recently announced the COVID-19 Task Force with the aim of creating a network of taxonomists, collection curators and other experts from around the globe.

FAIR biodiversity data in Pensoft journals thanks to a routine data auditing workflow

Data audit workflow provided for data papers submitted to Pensoft journals.

To avoid publication of openly accessible, yet unusable datasets, fated to result in irreproducible and inoperable biological diversity research at some point down the road, Pensoft takes care to audit the data described in data paper manuscripts upon their submission to applicable journals in the publisher’s portfolio, including Biodiversity Data Journal, ZooKeys, PhytoKeys, MycoKeys and many others.

Once the dataset is clean and the paper is published, biodiversity data, such as taxa, occurrence records, observations, specimens and related information, become FAIR (findable, accessible, interoperable and reusable), so that they can be merged, reformatted and incorporated into novel and visionary projects, regardless of whether they are accessed by a human researcher or a data-mining computation.

As part of the pre-review technical evaluation of a data paper submitted to a Pensoft journal, the associated datasets are subjected to a data audit meant to identify any issues that could make the data inoperable. This check is conducted regardless of whether the datasets are provided as supplementary material within the data paper manuscript or linked from the Global Biodiversity Information Facility (GBIF) or another external repository. The features that undergo the audit can be found in a data quality checklist made available on the website of each journal alongside key recommendations for submitting authors.

Once the check is complete, the submitting author receives an audit report with recommendations for improvement, similar to the comments they would receive following the peer review stage of the data paper. If there are major issues with the dataset, the data paper can be rejected prior to assignment to a subject editor, but resubmitted after the necessary corrections are applied. At this step, authors who have already published their data via an external repository are also reminded to correct those data accordingly.

“It all started back in 2010, when we joined forces with GBIF on a quite advanced idea in the domain of biodiversity: a data paper workflow as a means to recognise both the scientific value of rich metadata and the efforts of the data collectors and curators. Together we figured that those data could be published most efficiently as citable academic papers,” says Pensoft’s founder and Managing Director Prof. Lyubomir Penev.
“From there, with the kind help and support of Dr Robert Mesibov, the concept evolved into a data audit workflow, meant to ‘proofread’ the data in those data papers the way a copy editor would go through the text,” he adds.
“The data auditing we do is not a check on whether a scientific name is properly spelled, or a bibliographic reference is correct, or a locality has the correct latitude and longitude”, explains Dr Mesibov. “Instead, we aim to ensure that there are no broken or duplicated records, disagreements between fields, misuses of the Darwin Core recommendations, or any of the many technical issues, such as character encoding errors, that can be an obstacle to data processing.”
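The kinds of technical checks Dr Mesibov mentions can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not Pensoft’s actual audit workflow or checklist: the sample records are made up, the three checks (duplicated `occurrenceID` values, disagreement between the `eventDate` and `year` fields, and the Unicode replacement character as a sign of encoding damage) are chosen for the example, and the date check assumes that `eventDate` starts with a four-digit year. The column names are Darwin Core terms.

```python
import csv
import io
from collections import Counter

# Made-up sample data, for illustration only. Row 1 is the header.
SAMPLE_CSV = """occurrenceID,scientificName,eventDate,year
occ-1,Quercus ilex,2001-05-14,2001
occ-2,Olea europaea,1998-03-02,1999
occ-2,Olea europaea,1998-03-02,1999
occ-3,Pinus pinea\ufffd,2010-07-30,2010
"""


def find_duplicate_ids(rows):
    """Return occurrenceID values that appear more than once."""
    counts = Counter(row["occurrenceID"] for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def find_year_mismatches(rows):
    """Return row numbers where eventDate and year disagree.

    Assumes eventDate starts with a four-digit year (a simplification).
    """
    return [
        number for number, row in enumerate(rows, start=2)
        if row["eventDate"][:4] != row["year"]
    ]


def find_encoding_damage(rows):
    """Return row numbers containing the Unicode replacement character."""
    return [
        number for number, row in enumerate(rows, start=2)
        if any("\ufffd" in value for value in row.values())
    ]


def audit(csv_text):
    """Run all checks and collect the findings in one report."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        "duplicate_ids": find_duplicate_ids(rows),
        "year_mismatches": find_year_mismatches(rows),
        "encoding_damage": find_encoding_damage(rows),
    }


for check, findings in audit(SAMPLE_CSV).items():
    print(check, findings)
```

Row numbers in the report count the header as row 1, so that findings can be located directly in a spreadsheet view of the file.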

At Pensoft, the publication of data that are openly accessible and easy to find, re-use and archive is seen as a crucial responsibility of researchers aiming to deliver high-quality and viable scientific output intended to stand the test of time and serve the public good.

CASE STUDY: Data audit for the “Vascular plants dataset of the COFC herbarium (University of Cordoba, Spain)”, a data paper in PhytoKeys

To explain how and why biodiversity data should be published in full compliance with the best (open) science practices, the team behind Pensoft and long-time collaborators published a guidelines paper, titled “Strategies and guidelines for scholarly publishing of biodiversity data”, in the open science journal Research Ideas and Outcomes (RIO Journal).

Recipe for Reusability: Biodiversity Data Journal integrated with Profeza’s CREDIT Suite

Through their new collaboration, the partners encourage publication of dynamic additional research outcomes to support reusability and reproducibility in science

In a new partnership between the open-access Biodiversity Data Journal (BDJ) and the workflow software development platform Profeza, authors submitting their research to the scholarly journal will be invited to prepare a Reuse Recipe Document via CREDIT Suite to encourage reusability and reproducibility in science. Once published, their articles will feature a special widget linking to additional research output, such as raw, experimental repetitions, null or negative results, protocols and datasets.

A Reuse Recipe Document is a collection of additional research outputs, which could serve as guidelines for another researcher trying to reproduce or build on the previously published work. In contrast to a research article, it is a dynamic, ‘evolving’ research item, which can be updated later and also tracked back in time, thanks to a revision history feature.

Both the Recipe Document and the Reproducible Links, which connect subsequent outputs to the original publication, are assigned their own DOIs, so that reuse instances can be easily captured, recognised, tracked and rewarded with increased citability.

With these events appearing on both the original author’s and any reuser’s ORCID, original authors can easily gain further credibility thanks to the enhanced reproducibility of their work, while reusers increase their own by showcasing how they have put what they have cited into use.

Furthermore, the transparency and interconnectivity between the separate works allow for promoting intra- and inter-disciplinary collaboration between researchers.

“At BDJ, we strongly encourage our authors to use CREDIT Suite to submit any additional research outputs that could help fellow scientists speed up progress in biodiversity knowledge through reproducibility and reusability,” says Prof. Lyubomir Penev, founder of the journal and its scholarly publisher – Pensoft. “Our new partnership with Profeza is in itself a sign that collaboration and integrity in academia is the way to good open science practices.”

“Our partnership with Pensoft is a great step towards gathering crucial feedback and insight concerning reproducibility and continuity in research. This is now possible with Reuse Recipe Documents, which allow for authors and reusers to engage and team up with each other,” says Sheevendra, Co-Founder of Profeza.