Entangled “her”stories – How to create an open multi-linked dynamic dataset of plant genera named for women

Which plant genera do you know that are named for women? Who were/are they?

Guest blog post by Siobhan Leachman, Sabine von Mering, Heather Lindon & Carmen Ulloa Ulloa

How it all began

A post on social media asked about plant genera named for women and sparked a lively discussion with many contributors. This simple question was not as easily answered as initially thought. The resulting informal working group tackled this topic remotely during the COVID-19 pandemic and beyond. The team was motivated by the desire to amplify the contribution of women to botany through eponymy. The work of this team has so far resulted in a paper in Biodiversity Data Journal, presentations at several conferences, and a linked open dataset.

Prior to our international collaboration, no dataset was available to answer these simple questions and the required information was scattered in many different data sources. We set out to bring these data together and in doing so developed and refined our workflow. Our data paper documents this innovative workflow bringing together the various data elements needed to answer our research questions. Ultimately we created a Linked Open Data (LOD) dataset that amplified the names of women and female mythological beings celebrated through generic names of flowering plants.

🙋🏻‍♀️Inspired by the melastome plant genus 𝘔𝘦𝘳𝘪𝘢𝘯𝘪𝘢: which plant genera do you know that honor women? Who were/are they? 🌺 ¿Qué géneros de plantas dedicados a mujeres conoces? ¿Quienes fueron/son? 🧵👇🏽 #WomenInSTEM https://t.co/QWpyaMfihT pic.twitter.com/XBCYD6hmx1
— Carmen Ulloa (@meriania) May 14, 2021

Linking the Data

During our research process we focused on pulling data from a wide variety of sources while at the same time proactively sharing the data generated as widely as possible. This was done by adding and linking it to multiple public databases and sources (push-pull) including the International Plant Name Index (hereafter IPNI), Tropicos®, Wikidata, Bionomia and the Biodiversity Heritage Library (hereafter BHL).

*Visualisation of our workflow to create a working list of flowering plant genera named for women.*

For our list of genera, each protologue was reviewed to confirm the etymology or eponymy. To find the generic protologues, we searched botanical databases such as IPNI and Tropicos, openly accessible providers of digital publications, and other digital libraries and websites that provide free access to such publications. Here the BHL was invaluable, as the majority of protologues and many other relevant publications were openly accessible through this digital library. Where no digital publication was available, we accessed scientific literature through our affiliated institutions.

For the women, our starting point was the “Index of Eponymic Plant Names – Extended Edition” by Lotte Burkhardt (2018). We manually extracted all genera honouring women. This dataset was supplemented with other sources including IPNI (2023), Mari Mut (2017-2021), a 2022 updated version of Burkhardt’s document (Burkhardt 2022), as well as suggestions received from colleagues and generated from our own research.

We collected the following information as structured data: information on the woman honoured, the genera named in honour of the woman, the year and place of the protologue or original publication (the nomenclatural reference), the author(s) of the genus name, and the link to the protologue or original publication if available online.
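
As a rough illustration of the structured data described above, one record could be modelled as follows. This is a minimal sketch: the class name `EponymRecord` and its field names are assumptions made for this example, not the schema of the published dataset, and fields not stated in this post are deliberately left empty.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class EponymRecord:
    """One genus and the woman it honours (illustrative field names only)."""
    genus: str                                   # generic name
    honoree: str                                 # woman or female being honoured
    genus_authors: List[str] = field(default_factory=list)  # author(s) of the genus name
    publication_year: Optional[int] = None       # year of the protologue
    publication_place: Optional[str] = None      # nomenclatural reference
    protologue_url: Optional[str] = None         # link to the protologue, if online

    def has_online_protologue(self) -> bool:
        """True when a link to the protologue has been recorded."""
        return bool(self.protologue_url)

    def missing_fields(self) -> List[str]:
        """Names of fields that still need to be researched."""
        return [name for name, value in vars(self).items() if value in (None, [], "")]


# Only details given in this post are filled in; the rest stay empty on purpose.
record = EponymRecord(genus="Monsonia", honoree="Ann Monson", genus_authors=["L."])
print(record.missing_fields())
```

A helper such as `missing_fields` makes it easy to see which records still need a year, a nomenclatural reference or a protologue link.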

Wikidata

Wikidata was the central data repository and linking mechanism for this project as it provided structured data that can be read and edited by humans and machines and it acts as a hub for other identifiers. As such Wikidata played a central role in semantic linking and enriching of our data.

Wikidata items for the plant genera were created or enriched with information about the name, the author(s) of the genus and the year of publication. Those statements were referenced using the original publication. If the protologue was available on BHL, the BHL bibliographic or page number was added to that reference, thus creating a digital link improving access to the protologue. While undertaking this work we also collated a list of all those public domain publications that appeared to be absent from BHL. We passed on this list to BHL and requested these texts be scanned and added to BHL for the benefit of everyone.

We then added a named after statement to the Wikidata item for the appropriate plant genera linking that item to the Wikidata item for the woman honoured. Wikidata items for the women honoured were newly created or enriched. We researched each person and her contributions, plus information on mythological figures where necessary, and added this information to Wikidata items. Our work also included disambiguating the woman from other people with identical or similar names.

To amplify the women’s contributions to science and to enrich the wider (biodiversity) data ecosystem, we linked to other Wikidata items and websites or databases by adding other relevant identifiers. For example, if the women were botanists, botanical collectors or other naturalists, we used the author property to link the women to publications written by them. In addition, we added the women to Bionomia and attributed specimens collected or identified by them to their profiles.

Our work also included enriching Wikidata items of taxon authors. IPNI and Tropicos were searched for these author names, and websites such as BHL, the Global Biodiversity Information Facility (GBIF) or other specialist databases were consulted. Corrections and newly researched information on taxon authors were added to Wikidata and also sent, together with the corresponding references, to IPNI and Tropicos. Those organisations then used this information to update their databases accordingly.

As a result of our data being placed in Wikidata it is available to be queried via the Wikidata Query Service.
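
To give an idea of what such a query might look like, here is a minimal sketch in Python that builds a SPARQL query and a request URL for the Wikidata Query Service. It is not the query used by the project. It assumes the Wikidata properties taxon rank (P105), named after (P138) and sex or gender (P21); it does not restrict results to plants, and it would miss mythological beings that lack a sex or gender statement. The function names are made up for this example.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(limit: int = 50) -> str:
    """Return a SPARQL query for genera named after female honorees."""
    return f"""
SELECT ?genus ?genusLabel ?honoree ?honoreeLabel WHERE {{
  ?genus wdt:P105 wd:Q34740 ;      # taxon rank: genus
         wdt:P138 ?honoree .       # named after
  ?honoree wdt:P21 wd:Q6581072 .   # sex or gender: female
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}
""".strip()


def build_request_url(query: str) -> str:
    """Encode the query as a GET request asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = build_query(limit=10)
url = build_request_url(query)
print(query)
```

The resulting URL can be fetched with any HTTP client, or the query text can be pasted into the Wikidata Query Service web interface.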

Our Goal Achieved

As a result of our project, we published a dataset of 728 genera honouring women or female beings. This was a nearly twenty-fold increase in the number of genera linked to women in Wikidata. Our analysis paper on this data is forthcoming.

Notable Women

Monsonia L.

All of us came away from this research with a favourite story. One that stood out was Ann Monson, for whom Linnaeus named Monsonia. Linnaeus wrote a delightful letter to her about their creating, platonically of course, a kind of plant love-child between them, in the form of this new genus.

Translated from Latin: “….Lock these [seeds] in a pot, and place them in the window of the chamber towards the sun, when it bursts forth in February, and in the first summer the sun blooms and lasts the most beautiful Alstromeria, which no one has seen in England, and you bring forth no flowers. If it should come to pass, as I wish, if you offer our flames, I would only wish to beget with you an only child, as a pledge of my love, little Monsonia, by which you may perpetuate the fame of Lady in the kingdom of Flora, who was the Queen of Women.”

Fittonia Coem.

Two eponymous women with an interesting story are Sarah Mary Fitton and her sister Elizabeth. They wrote Conversations on Botany in 1817 accompanied by colour engravings of flowers which popularised botany with women. The genus Fittonia was named in their honour.

Chanekia Lundell

Another woman honoured in a plant genus was Mercedes Chanek, a Mayan plant collector who worked in the 1930s for Cyrus Longworth Lundell and collected for the University of Michigan in British Honduras, today Belize. Very little is known about her life and work. However, her collections are detailed in Tropicos and Bionomia, and you can see the genus named for her by Lundell in IPNI under Chanekia.

Medusa Lour. and other genera

An example of a mythological female being honoured in several plant names is that of Medusa, who has the most genera named after her, six, more than any real woman!

We hope that our data paper inspires others to use the methodology and workflow described to create other linked open datasets, e.g. celebrating and amplifying the contributions of underrepresented or marginalised groups in science.

Data paper:

von Mering S, Gardiner LM, Knapp S, Lindon H, Leachman S, Ulloa Ulloa C, Vincent S, Vorontsova MS (2023) Creating a multi-linked dynamic dataset: a case study of plant genera named for women. Biodiversity Data Journal 11: e114408. https://doi.org/10.3897/BDJ.11.e114408

Maximising the impact of standardised biodiversity data: Pensoft’s role in the EU project B-Cubed

In line with its commitment to providing open-access biodiversity data, Pensoft has joined forces with 12 organisations to form the B-Cubed project.

The problem at hand

Measuring the extent and dynamics of the global biodiversity crisis is a challenging task that demands rapid, reliable and repeatable biodiversity monitoring data. Such data is essential for policymakers to be able to assess policy options effectively and accurately. To achieve this, however, there is a need to enhance the integration of biodiversity data from various sources, including citizen scientists, museums, herbaria, and researchers.

B-Cubed’s response

B-Cubed (Biodiversity Building Blocks for policy) hopes to tackle this challenge by reimagining the process of biodiversity monitoring, making it more adaptable and responsive.

B-Cubed’s approach rests on six pillars:

Improved alignment between policy and biodiversity data. Working closely with existing biodiversity initiatives to identify and meet policy needs.
Evidence base. Leveraging data cubes to standardise access to biodiversity data using the Essential Biodiversity Variables framework. These cubes are the basis for models and indicators of biodiversity.
Cloud computing environment. Providing users with access to the models in real-time and on demand.
Automated workflows. Developing exemplary automated workflows for modelling using biodiversity data cubes and for calculating change indicators.
Case studies. Demonstrating the effectiveness of B-Cubed’s tools.
Capacity building. Ensuring that the solutions meet openness standards and training end-users to employ them.
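
To make the idea of a data cube more concrete, the sketch below aggregates occurrence records into counts per species, year and grid cell. It is a simplified illustration under our own assumptions (a plain latitude/longitude grid, made-up records and hypothetical names such as `build_cube`), not B-Cubed's actual software or cube specification.

```python
import math
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Occurrence(NamedTuple):
    species: str
    year: int
    latitude: float
    longitude: float


def grid_cell(latitude: float, longitude: float, size_deg: float = 1.0) -> Tuple[int, int]:
    """Index of the grid cell (of size_deg degrees) containing the point."""
    return (math.floor(latitude / size_deg), math.floor(longitude / size_deg))


def build_cube(occurrences: Iterable[Occurrence], size_deg: float = 1.0) -> Counter:
    """Count occurrences along three dimensions: species, year and grid cell."""
    cube = Counter()
    for occ in occurrences:
        cell = grid_cell(occ.latitude, occ.longitude, size_deg)
        cube[(occ.species, occ.year, cell)] += 1
    return cube


# Made-up records, for illustration only.
records = [
    Occurrence("Species A", 2020, 50.2, 4.1),
    Occurrence("Species A", 2020, 50.9, 4.8),
    Occurrence("Species A", 2021, 50.2, 4.1),
    Occurrence("Species B", 2020, -1.5, 30.2),
]
cube = build_cube(records)
print(cube[("Species A", 2020, (50, 4))])
```

Once records from different sources are reduced to the same dimensions, models and indicators can work on the counts without needing to know where each record came from.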

Pensoft’s role

Harnessing its experience in the communication, dissemination and exploitation of numerous EU projects, Pensoft focuses on maximising B-Cubed’s impact and ensuring the adoption and long-term legacy of its results. This encompasses a wide array of activities, ranging all the way from building the project’s visual and online presence to translating its results into policy recommendations. Pensoft also oversees B-Cubed’s data management by developing a Data Management Plan which ensures the implementation of the FAIR data principles and maximises the access to and re-use of the project’s research outputs.

Full list of partners

Visit B-Cubed’s website at https://b-cubed.eu/. You can also follow the project on X @BCubedProject and LinkedIn /B-Cubed Project, as well as by subscribing to its newsletter here.

Data checking for biodiversity collections and other biodiversity data compilers from Pensoft

***Guest blog post by Dr Robert Mesibov***

Proofreading the text of scientific papers isn’t hard, although it can be tedious. Are all the words spelled correctly? Is all the punctuation correct and in the right place? Is the writing clear and concise, with correct grammar? Are all the cited references listed in the References section, and vice-versa? Are the figure and table citations correct?

Proofreading of text is usually done first by the reviewers, and then finished by the editors and copy editors employed by scientific publishers. A similar kind of proofreading is also done with the small tables of data found in scientific papers, mainly by reviewers familiar with the management and analysis of the data concerned.

But what about proofreading the big volumes of data that are common in biodiversity informatics? Tables with tens or hundreds of thousands of rows and dozens of columns? Who does the proofreading?

Sadly, the answer is usually “No one”. Proofreading large amounts of data isn’t easy and requires special skills and digital tools. The people who compile biodiversity data often lack the skills, the software or the time to properly check what they’ve compiled.

The result is that a great deal of the data made available through biodiversity projects like GBIF is — to be charitable — “messy”. Biodiversity data often needs a lot of patient cleaning by end-users before it’s ready for analysis. To assist end-users, GBIF and other aggregators attach “flags” to each record in the database where an automated check has found a problem. These checks find the most obvious problems amongst the many possible data compilation errors. End-users often have much more work to do after the flags have been dealt with.

In 2017, Pensoft employed a data specialist to proofread the online datasets that are referenced in manuscripts submitted to Pensoft’s journals as data papers. The results of the data-checking are sent to the data paper’s authors, who then edit the datasets. This process has substantially improved many datasets (including those already made available through GBIF) and made them more suitable for digital re-use. At blog publication time, more than 200 datasets have been checked in this way.

Note that a Pensoft data audit does not check the accuracy of the data, for example, whether the authority for a species name is correct, or whether the latitude/longitude for a collecting locality agrees with the verbal description of that locality. For a more or less complete list of what does get checked, see the Data checklist at the bottom of this blog post. These checks are aimed at ensuring that datasets are correctly organised, consistently formatted and easy to move from one digital application to another. The next reader of a digital dataset is likely to be a computer program, not a human. It is essential that the data are structured and formatted, so that they are easily processed by that program and by other programs in the pipeline between the data compiler and the next human user of the data.

Pensoft’s data-checking workflow was previously offered only to authors of data paper manuscripts. It is now available to data compilers generally, with three levels of service:

Basic: the compiler gets a detailed report on what needs fixing
Standard: minor problems are fixed in the dataset and reported
Premium: all detected problems are fixed in collaboration with the data compiler and a report is provided

Because datasets vary so much in size and content, it is not possible to set a price in advance for basic, standard and premium data-checking. To get a quote for a dataset, send an email with a small sample of the data to publishing@pensoft.net.

—

Data checklist

Minor problems:

dataset not UTF-8 encoded
blank or broken records
characters other than letters, numbers, punctuation and plain whitespace
more than one version (the simplest or most correct one) for each character
unnecessary whitespace
Windows carriage returns (retained if required)
encoding errors (e.g. “Dum?ril” instead of “Duméril”)
missing data with a variety of representations (blank, “-”, “NA”, “?” etc)
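
Several of the minor problems listed above can be detected with a few lines of code. The following is a minimal sketch only, assuming a tab-separated text file read as raw bytes; the function name, the list of missing-value tokens and the sample data are our own assumptions, and it is not the tooling used in Pensoft's data audits.

```python
MISSING_TOKENS = {"", "-", "NA", "N/A", "?", "null"}  # assumed list, extend as needed


def check_minor_problems(raw: bytes, delimiter: str = "\t") -> dict:
    """Report encoding, blank-record, whitespace and missing-value issues."""
    report = {}
    try:
        text = raw.decode("utf-8")
        report["utf8"] = True
    except UnicodeDecodeError:
        text = raw.decode("utf-8", errors="replace")
        report["utf8"] = False

    report["carriage_returns"] = text.count("\r")
    lines = text.replace("\r\n", "\n").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty string after the final newline

    report["blank_records"] = sum(1 for line in lines if not line.strip())

    padded = 0
    missing_styles = set()
    for line in lines:
        if not line.strip():
            continue  # blank records are counted separately
        for item in line.split(delimiter):
            if item != item.strip() or "  " in item:
                padded += 1  # leading, trailing or doubled whitespace
            if item.strip() in MISSING_TOKENS:
                missing_styles.add(item.strip())
    report["items_with_extra_whitespace"] = padded
    report["missing_value_styles"] = sorted(missing_styles)
    return report


sample = "id\tname\tnote\r\n1\tDuméril \tNA\r\n\r\n2\tSmith\t-\r\n3\tJones\t?\r\n".encode("utf-8")
report = check_minor_problems(sample)
print(report)
```

More than one entry in `missing_value_styles` suggests that missing data has been represented inconsistently and should be standardised.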

Major problems:

unintended shifts of data items between fields
incorrect or inconsistent formatting of data items (e.g. dates)
different representations of the same data item (pseudo-duplication)
for Darwin Core datasets, incorrect use of Darwin Core fields
data items that are invalid or inappropriate for a field
data items that should be split between fields
data items referring to unexplained entities (e.g. “habitat is type A”)
truncated data items
disagreements between fields within a record
missing, but expected, data items
incorrectly associated data items (e.g. two country codes for the same country)
duplicate records, or partial duplicate records where not needed
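
Some of the major problems can also be screened for automatically. The sketch below covers three of them: records with an unexpected number of fields (a common sign of data items shifting between fields), pseudo-duplicated values and exact duplicate records. The function names, the normalisation rule and the sample rows are illustrative assumptions; they are not the methods documented in the resources listed below.

```python
from collections import Counter, defaultdict
from typing import List


def rows_with_wrong_field_count(rows: List[List[str]]) -> List[int]:
    """Line numbers (header is line 1) of rows whose field count differs from the header."""
    expected = len(rows[0])
    return [n for n, row in enumerate(rows[1:], start=2) if len(row) != expected]


def pseudo_duplicates(values: List[str]) -> List[List[str]]:
    """Group values that differ only in case, full stops or spacing."""
    groups = defaultdict(set)
    for value in values:
        key = " ".join(value.lower().replace(".", "").split())
        groups[key].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def duplicate_records(rows: List[List[str]]) -> List[List[str]]:
    """Records that occur more than once (header excluded)."""
    counts = Counter(tuple(row) for row in rows[1:])
    return [list(row) for row, n in counts.items() if n > 1]


# Made-up rows, for illustration only.
rows = [
    ["id", "collector", "country"],
    ["1", "A. Smith", "AU"],
    ["2", "A Smith", "AU"],
    ["3", "a. smith", "AU", "extra"],
    ["4", "B. Jones", "NZ"],
    ["4", "B. Jones", "NZ"],
]
print(rows_with_wrong_field_count(rows))
print(pseudo_duplicates([row[1] for row in rows[1:]]))
print(duplicate_records(rows))
```

Checks like these only flag candidates; deciding which representation is correct still needs a human who knows the data.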

For details of the methods used, see the author’s online resources:

A Data Cleaner’s Cookbook
BASHing data (a weekly data blog)

***

Find out more about Pensoft’s data audit workflow for data papers submitted to Pensoft journals on Pensoft’s blog.

FAIR biodiversity data in Pensoft journals thanks to a routine data auditing workflow

Streamlined import of omics metadata from the European Nucleotide Archive (ENA) into an OMICS Data Paper manuscript

Pensoft creates a specialised data paper article type for the omics community within Biodiversity Data Journal to reflect the specific nature of omics data. The scholarly publisher and technology provider established a manuscript template to help standardise the description of such datasets and their most important features.

By Mariya Dimitrova, Raïssa Meyer, Pier Luigi Buttigieg, Lyubomir Penev

Data papers are scientific papers which describe a dataset rather than present and discuss research results. The concept was introduced to the biodiversity community by Chavan and Penev in 2011 as the result of a joint project of GBIF and Pensoft.

Since then, Pensoft has implemented the data paper in several of its journals (Fig. 1). The recognition gained through data papers is an important incentive for researchers and data managers to author better quality metadata and to make it Findable, Accessible, Interoperable and Re-usable (FAIR). High quality and FAIRness of (meta)data are promoted through providing peer review, data audit, permanent scientific record and citation credit as for any other scholarly publication. One can read more on the different types of data papers and how they help to achieve these goals in the Strategies and guidelines for scholarly publishing of biodiversity data (https://doi.org/10.3897/rio.3.e12431).

**Fig. 1** Number of data papers published in Pensoft’s journals since 2011.

The data paper concept was initially based on the standard metadata descriptions, using the Ecological Metadata Language (EML). Apart from distinguishing a specialised place for dataset descriptions by creating a data paper article type, Pensoft has developed multiple workflows for streamlined import of metadata from various repositories and their conversion into data paper manuscripts in Pensoft’s ARPHA Writing Tool (AWT). You can read more about the EML workflow in this blog post.

Similarly, we decided to create a specialised data paper article type for the omics community within Pensoft’s Biodiversity Data Journal to reflect the specific nature of omics data. We established a manuscript template to help standardise the description of such datasets and their most important features. This initiative was supported in part by the IGNITE project.

How can authors publish omics data papers?

There are two ways to publish omics data papers – (1) to write a data paper manuscript following the respective template in the ARPHA Writing Tool (AWT) or (2) to convert metadata describing a project or study deposited in EMBL-EBI’s European Nucleotide Archive (ENA) into a manuscript within the AWT.

The first method is straightforward but the second one deserves more attention. We focused on metadata published in ENA, which is part of the International Nucleotide Sequence Database Collaboration (INSDC) and synchronises its records with those of the other two members (DDBJ and NCBI). ENA is linked to the ArrayExpress and BioSamples databases, which describe sequencing experiments and samples, and follow the community-accepted metadata standards MINSEQE and MIxS. To auto-populate a manuscript with a click of a button, authors can provide the accession number of the relevant ENA Study or Project and our workflow will automatically retrieve all metadata from ENA, as well as any available ArrayExpress or BioSamples records linked to it (Fig. 2). After that, authors can edit any of the article sections in the manuscript by filling in the relevant template fields or creating new sections, adding text, figures, citations and so on.

An important component of the OMICS data paper manuscript is a supplementary table containing MIxS-compliant metadata imported from BioSamples. When available, BioSamples metadata is automatically converted to a long table format and attached to the manuscript. The authors are not permitted to edit or delete it inside the ARPHA Writing Tool. Instead, if desired, they should correct the associated records in the sourced BioSamples database. We have implemented a feature allowing the automatic re-import of corrected BioSamples records inside the supplementary table. In this way, we ensure data integrity and provide a reliable and trusted source for accessing these metadata.
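
For readers unfamiliar with the term, a long table holds one row per sample and attribute rather than one column per attribute. The sketch below shows the general idea only; it is not the code used in the ARPHA Writing Tool, and the function name, sample identifiers and values are placeholders. The attribute names `collection_date` and `depth` are borrowed from the MIxS checklists.

```python
from typing import Dict, List, Tuple


def wide_to_long(samples: List[Dict[str, str]], id_field: str = "sample_id") -> List[Tuple[str, str, str]]:
    """Turn one dict per sample into (sample, attribute, value) rows."""
    long_rows = []
    for sample in samples:
        sample_id = sample[id_field]
        for attribute, value in sample.items():
            if attribute == id_field:
                continue  # the identifier becomes the first column of every row
            long_rows.append((sample_id, attribute, value))
    return long_rows


# Placeholder samples, not real BioSamples records.
samples = [
    {"sample_id": "SAMPLE_1", "collection_date": "2020-01-01", "depth": "10 m"},
    {"sample_id": "SAMPLE_2", "collection_date": "2020-01-02", "depth": "25 m"},
]
long_table = wide_to_long(samples)
for row in long_table:
    print("\t".join(row))
```

The long format copes well with samples that carry different sets of attributes, because a missing attribute simply has no row.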

**Fig. 2** Automated generation of omics data paper manuscripts through import and conversion of metadata associated with the Project ID or Study ID at ENA

Here is a step-by-step guide for conversion of ENA metadata into a data paper manuscript:

The author has published a dataset to any of the INSDC databases. They copy its ENA Study or Project accession number.
The author goes to the Biodiversity Data Journal (BDJ) webpage, clicks the “Start a manuscript” button and selects the OMICS Data Paper template in the ARPHA Writing Tool (AWT). Alternatively, the author can start from the AWT website, click “Create a manuscript” and select “OMICS Data Paper” as the article type; the Biodiversity Data Journal will then be automatically selected by the system. The author clicks the “Import a manuscript” button at the bottom of the webpage.
The author pastes the ENA Study or Project accession number inside the relevant text box (“Import an European Nucleotide Archive (ENA) Study ID or Project ID”) and clicks “Import”.
The Project or Study metadata is converted into an OMICS data paper manuscript along with the metadata from ArrayExpress and BioSamples if available. The author can start making changes to the manuscript, invite co-authors and then submit it for technical evaluation, peer review and publication.
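
Before pasting an accession number into the import box, it can help to check that it looks like a Project or Study accession rather than, say, a sample or run accession. The sketch below is a hypothetical helper, not part of the AWT. The patterns reflect our understanding of INSDC accession conventions (Project accessions beginning with PRJ, Study accessions with ERP, SRP or DRP) and should be checked against ENA's own documentation; the accessions in the example are made-up placeholders.

```python
import re

# Assumed patterns; verify against the ENA accession documentation.
PROJECT_PATTERN = re.compile(r"^PRJ[EDN][A-Z]\d+$")
STUDY_PATTERN = re.compile(r"^[EDS]RP\d{6,}$")


def classify_accession(accession: str) -> str:
    """Return 'project', 'study' or 'unknown' for an accession string."""
    cleaned = accession.strip().upper()
    if PROJECT_PATTERN.match(cleaned):
        return "project"
    if STUDY_PATTERN.match(cleaned):
        return "study"
    return "unknown"


# Placeholder accessions, not real records.
for candidate in ["PRJEB00000", " erp000000 ", "ABC123"]:
    print(repr(candidate), "->", classify_accession(candidate))
```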

For a detailed description of authoring an OMICS data paper, please refer to the Author Guidelines: https://bdj.pensoft.net/about#OmicsDataPapers

Our innovative workflow makes authoring omics data papers much easier and saves authors time and effort when inserting metadata into the manuscript. It takes advantage of existing links between data repositories to unify biodiversity and omics knowledge into a single narrative. This workflow demonstrates the importance of standardisation and interoperability to integrate data and metadata from different scientific fields.

We have established a special collection for OMICS data papers in the Biodiversity Data Journal. Authors are invited to describe their omics datasets by using the novel streamlined workflow for creating a manuscript at a click of a button from metadata deposited in ENA or by following the template to create their manuscript via the non-automated route.

To stimulate omics data paper publishing, the first 10 papers will be published free of charge. Upon submission of an omics data paper manuscript, do not forget to assign it to the collection Next-generation publishing of omics data.

Integration of Freshwater Biodiversity Information for Decision-Making in Rwanda

Teams from Ghana, Malawi, Namibia and Rwanda during the inception meeting of the African Biodiversity Challenge Project in Kigali, Rwanda. Photo by Yvette Umurungi.

The establishment and implementation of a long-term strategy for freshwater biodiversity data mobilisation, sharing, processing and reporting in Rwanda is intended to support environmental monitoring and the implementation of Rwanda’s National Biodiversity Strategy and Action Plan (NBSAP). It should also help us understand how economic transformation and environmental change are affecting freshwater biodiversity and the ecosystem services it provides.

As part of this strategy, the Center of Excellence in Biodiversity and Natural Resource Management (CoEB) at the University of Rwanda, jointly with the Rwanda Environment Management Authority (REMA) and the Albertine Rift Conservation Society (ARCOS), is implementing the African Biodiversity Challenge (ABC) project “Integration of Freshwater Biodiversity Information for Decision-Making in Rwanda.”

The conference abstract for this project has been published in the open access journal Biodiversity Information Science and Standards (BISS).

The CoEB has a national mandate to lead on biodiversity data mobilisation and implementation of the NBSAP in collaboration with REMA. This includes digitising data from reports, conducting analyses and reporting for policy and research, as indicated in Rwanda’s NBSAP.

The data will be collated following international standards and made available online, so that they can be accessed and reused from around the world. In fact, CoEB aspires to become a Global Biodiversity Information Facility (GBIF) node, thereby strengthening its capacity for biodiversity data mobilisation.

Data use training for the African Biodiversity Challenge at the South African National Biodiversity Institute (SANBI), South Africa. Photo by Yvette Umurungi.

The mobilised data will be organised using GBIF standards, and the project will leverage the tools developed by GBIF to facilitate data publication. It will also provide an opportunity for ARCOS to strengthen its collaboration with CoEB as part of its endeavour to establish a regional network for biodiversity data management in the Albertine Rift Region.

The project is expected to conclude with at least six datasets, which will be published through the ARCOS Biodiversity Information System. These are to include three datasets for the Kagera River Basin; one on freshwater macro-invertebrates from the Congo and Nile Basins; one for the Rwanda Development Board archive of research reports from protected areas; and one from thesis reports from master’s and bachelor’s students at the University of Rwanda.

The project will also produce and release the first “Rwandan State of Freshwater Biodiversity”, a document which will describe the status of biodiversity in freshwater ecosystems in Rwanda and present socio-economic conditions affecting human interactions with this biodiversity.

The page of Center of Excellence in Biodiversity and Natural Resource Management (CoEB) at University of Rwanda on the Global Biodiversity Information Facility portal. Image by Yvette Umurungi.

***

The ABC project is a competition coordinated by the South African National Biodiversity Institute (SANBI) and funded by the JRS Biodiversity Foundation. The competition is part of the JRS-funded project, “Mobilising Policy and Decision-making Relevant Biodiversity Data,” and supports the Biodiversity Information Management activities of the GBIF Africa network.

Original source:

Umurungi Y, Kanyamibwa S, Gashakamba F, Kaplin B (2018) African Biodiversity Challenge: Integrating Freshwater Biodiversity Information to Guide Informed Decision-Making in Rwanda. Biodiversity Information Science and Standards 2: e26367. https://doi.org/10.3897/biss.2.26367

Audit finds biodiversity data aggregators ‘lose and confuse’ data

In an effort to improve the quality of biodiversity records, the Atlas of Living Australia (ALA) and the Global Biodiversity Information Facility (GBIF) use automated data processing to check individual data items. The records are provided to the ALA and GBIF by museums, herbaria and other biodiversity data sources.

However, an independent analysis of such records reports that ALA and GBIF data processing also leads to data loss and unjustified changes in scientific names.

The study was carried out by Dr Robert Mesibov, an Australian millipede specialist who also works as a data auditor. Dr Mesibov checked around 800,000 records retrieved from the Australian Museum, Museums Victoria and the New Zealand Arthropod Collection. His results are published in the open access journal ZooKeys, and also archived in a public data repository.

“I was mainly interested in changes made by the aggregators to the genus and species names in the records,” said Dr Mesibov.

“I found that names in up to 1 in 5 records were changed, often because the aggregator couldn’t find the name in the look-up table it used.”

Another worrying result concerned type specimens – the reference specimens upon which scientific names are based. On a number of occasions, the aggregators were found to have replaced the name of a type specimen with a name tied to an entirely different type specimen.

The biggest surprise, according to Dr Mesibov, was the major disagreement on names between aggregators.

“There was very little agreement,” he explained. “One aggregator would change a name and the other wouldn’t, or would change it in a different way.”

Furthermore, dates, names and locality information were sometimes lost from records, mainly due to programming errors in the software used by aggregators to check data items. In some data fields the loss reached 100%, with no original data items surviving the processing.

“The lesson from this audit is that biodiversity data aggregation isn’t harmless,” said Dr Mesibov. “It can lose and confuse perfectly good data.”

“Users of aggregated data should always download both original and processed data items, and should check for data loss or modification, and for replacement of names,” he concluded.
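
A check of the kind Dr Mesibov recommends could be sketched as follows, comparing each original data item with its processed counterpart and counting losses and changes. This is an illustration under our own assumptions, not the method used in the audit: the field names, the function name and the records are hypothetical, and real downloads from an aggregator would first need to be mapped onto such pairs of fields.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def compare_fields(
    records: List[Dict[str, Optional[str]]],
    pairs: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, int]]:
    """For each (original, processed) field pair, count lost and changed items."""
    report = {}
    for original_field, processed_field in pairs:
        lost = changed = 0
        for record in records:
            original = (record.get(original_field) or "").strip()
            processed = (record.get(processed_field) or "").strip()
            if original and not processed:
                lost += 1      # the item existed but did not survive processing
            elif original and processed != original:
                changed += 1   # the item was modified or replaced
        report[original_field] = {"lost": lost, "changed": changed}
    return report


# Made-up records with placeholder names, for illustration only.
records = [
    {"original_name": "Aus bus", "processed_name": "Aus bus",
     "original_date": "1999-01-02", "processed_date": ""},
    {"original_name": "Aus cus", "processed_name": "Aus",
     "original_date": "2001-05-06", "processed_date": "2001-05-06"},
    {"original_name": "Xus yus", "processed_name": None,
     "original_date": "2003-03-03", "processed_date": "2003-03-03"},
]
summary = compare_fields(
    records,
    [("original_name", "processed_name"), ("original_date", "processed_date")],
)
print(summary)
```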

***

Original source:

Mesibov R (2018) An audit of some filtering effects in aggregated occurrence records. ZooKeys 751: 129-146. https://doi.org/10.3897/zookeys.751.24791

Five new Pensoft journals integrated with Dryad to improve data discoverability

Academic publisher Pensoft strengthens partnership with Dryad by adding its latest five journals to the list integrated with the digital repository. From now on, all authors who choose any of the journals published under Pensoft’s imprint will be able to opt for uploading their datasets on Dryad. At the click of a button, the authors will have their data additionally discoverable, reusable, and citable.

Started in 2011 as one of the first ever integrated data deposition workflows between a repository (Dryad) and a publisher (Pensoft), the partnership has now been reinforced to cover publications submitted to any of Pensoft’s 21 journals, including recently launched Research Ideas and Outcomes (RIO) and One Ecosystem, as well as BioDiscovery, African Invertebrates and Zoologia, which all moved to Pensoft within the last year.

By agreeing to deposit their datasets to Dryad, authors take advantage of a specialised and highly acknowledged platform to easily showcase and, hence, take credit for their data. On the other hand, the science community, including educators and students, can readily access the data, facilitating verification, citability and even potential collaborations.

“Dedicated to open and reproducible science, at Pensoft we have always strived to encourage our authors to make their research as transparent and, hence, trustworthy as possible, by providing the right infrastructure and support,” says Pensoft’s founder and CEO Prof. Lyubomir Penev. “By strengthening our long-year partnership with Dryad, I envision more and more authors, who publish in our journals, adding open data to their list of best practices.”

“Dryad works to promote data that are openly available, integrated with the scholarly literature, and routinely re-used to create knowledge,” said Dryad’s Executive Director, Meredith Morovati. “We are encouraged by the growth of our partnership with Pensoft, one of our earliest supporters. We are honored to provide services to Pensoft authors to ensure their data is openly available, linked to the article, and preserved for future use and for the future of science.”

LifeWatchGreece launches a Special Paper Collection for Greek biodiversity research

Developed in the 1990s and early 2000s, LifeWatch is one of the large-scale European Research Infrastructures (ESFRI) created to support biodiversity science and its developments. Its ultimate goal is to model Earth’s biodiversity based on large-scale data, to build a vast network of partners, and to liaise with other high-quality and viable research infrastructures (RI).

Being one of the founding LifeWatch member states, Greece has not only implemented LifeWatchGreece, but it is all set and ready to “fulfill the vision of the Greek LifeWatch RI and establish it as the biodiversity Centre of Excellence for South-eastern Europe”, according to the authors of the latest Biodiversity Data Journal’s Editorial: Dr Christos Arvanitidis, Dr Eva Chatzinikolaou, Dr Vasilis Gerovasileiou, Emmanouela Panteri, Dr Nicolas Bailly, all affiliated with the Hellenic Centre for Marine Research (HCMR) and part of the LifeWatchGreece Core Team, together with Nikos Minadakis, Foundation for Research and Technology Hellas (FORTH), Alex Hardisty, Cardiff University, and Dr Wouter Los, University of Amsterdam.

Making use of the technologically advanced open access Biodiversity Data Journal and its Collections feature, the LifeWatchGreece team is publishing a vast collection of peer-reviewed scientific outputs, including software descriptions, data papers, taxonomic checklists and research articles, along with the accompanying datasets and supporting material. Their intention is to demonstrate the availability and applicability of the developed e-Services and Virtual Laboratories (vLabs) to both the scientific community and the broader domain of biodiversity management.

The LifeWatchGreece Special Collection is now available in Biodiversity Data Journal, with a series of articles highlighting key contributions to the large-scale European LifeWatch RI. The Software Description papers explain the LifeWatchGreece Portal, where all the e-Services and the vLabs provided by LifeWatchGreece RI are hosted; the Data Services based on semantic web technologies, which provide detailed and specialized search paths to facilitate data mining; the R vLab which can be used for a series of statistical analyses in ecology, based on an integrated and optimized online R environment; and the Micro-CT vLab, which allows the online exploration, dissemination and interactive manipulation of micro-tomography datasets.

The LifeWatchGreece Special Collection also includes a series of taxonomic checklists (preliminary, updated and/or annotated); a series of data papers presenting historical and original datasets; and a selection of research articles reporting on the outcomes, methodologies and citizen science initiatives developed by collaborating research projects, which have shared human, hardware and software resources with LifeWatchGreece RI.

LifeWatchGreece relies on a multidisciplinary approach, involving several subsidiary initiatives; collaborations with Greek, European and World scientific communities; specialised staff, responsible for continuous updates and developments; and, of course, innovative online tools and already established IT infrastructure.

###

Original source:

Arvanitidis C, Chatzinikolaou E, Gerovasileiou V, Panteri E, Bailly N, Minadakis N, Hardisty A, Los W (2016) LifeWatchGreece: Construction and operation of the National Research Infrastructure (ESFRI). Biodiversity Data Journal 4: e10791. https://doi.org/10.3897/BDJ.4.e10791

Additional information:

This work has been supported by the LifeWatchGreece infrastructure (MIS 384676), funded by the Greek Government under the General Secretariat of Research and Technology (GSRT), ESFRI Projects, National Strategic Reference Framework (NSRF).

Streamlining the Import of Specimen or Occurrence Records Into Taxonomic Manuscripts

Repositories and data indexing platforms, such as GBIF, BOLD Systems, or iDigBio, hold documented specimen or occurrence records along with their record IDs. In order to streamline the authoring process, save taxonomists’ time, and provide a workflow for peer-review and quality checks of raw occurrence data, the ARPHA team has introduced an innovative feature that makes it possible to easily import specimen occurrence records into a taxonomic manuscript (see Fig. 1).

For the remainder of this post we will refer to specimen data as occurrence records, since an occurrence can be either an observation in the wild or a museum specimen.

Fig. 1: Workflow for directly importing occurrence records into a taxonomic manuscript.

Until now, users of the ARPHA Writing Tool who wanted to include occurrence records as materials in a manuscript had to format the occurrences as an Excel sheet and upload it to the Biodiversity Data Journal, or enter the data manually. While the “upload from Excel” approach significantly simplifies the process of importing materials, it still requires a transposition step: the data stored in a database need to be reformatted to the specific Excel format. With the introduction of the new import feature, occurrence data stored at GBIF, BOLD Systems, or iDigBio can be inserted directly into the manuscript by simply entering a relevant record identifier.
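To make the idea of "entering a relevant record identifier" more concrete, here is a minimal sketch of how an identifier might be turned into a lookup URL for each portal. It is not the ARPHA implementation. The function name `lookup_url`, the source labels, and the endpoint templates are assumptions for illustration; the templates reflect our understanding of the portals' public APIs and should be checked against their current documentation.

```python
from urllib.parse import quote, urlencode

# Endpoint templates are assumptions based on the portals' public APIs;
# verify them against each portal's current documentation before use.
GBIF_OCCURRENCE = "https://api.gbif.org/v1/occurrence/{key}"
IDIGBIO_RECORD = "https://search.idigbio.org/v2/view/records/{uuid}"
BOLD_SPECIMEN = "http://www.boldsystems.org/index.php/API_Public/specimen"


def lookup_url(source: str, identifier: str) -> str:
    """Build the URL a client could request for one record (or one BIN).

    `source` is a hypothetical label: "gbif", "idigbio", "bold_record"
    or "bold_bin". No network request is made here.
    """
    identifier = identifier.strip()
    if not identifier:
        raise ValueError("identifier must not be empty")
    if source == "gbif":
        return GBIF_OCCURRENCE.format(key=quote(identifier, safe=""))
    if source == "idigbio":
        return IDIGBIO_RECORD.format(uuid=quote(identifier, safe=""))
    if source == "bold_record":
        query = urlencode({"ids": identifier, "format": "json"})
        return BOLD_SPECIMEN + "?" + query
    if source == "bold_bin":
        query = urlencode({"bin": identifier, "format": "json"})
        return BOLD_SPECIMEN + "?" + query
    raise ValueError(f"unknown source: {source!r}")


if __name__ == "__main__":
    # The UUID used in the iDigBio example later in this post.
    print(lookup_url("idigbio", "b9ff7774-4a5d-47af-a2ea-bdf3ecc78885"))
```

The sketch only builds URLs, so it can be run offline; fetching and parsing the responses would be a separate step.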

The functionality shows up when one creates a new “Taxon treatment” in a taxonomic manuscript prepared in the ARPHA Writing Tool. The import functions as follows:

the author locates an occurrence record or records in one of the supported data portals;
the author notes the ID(s) of the records that ought to be imported into the manuscript (see Fig. 2, 3, and 4 for examples);
the author enters the ID(s) of the occurrence records in the form shown in the materials section of the species treatment, selects the relevant database from a list, and then clicks ‘Add’ to import the occurrence directly into the manuscript.

In the case of BOLD Systems, the author may also select a given Barcode Index Number (BIN; for a treatment of BINs read below), which then pulls all occurrences in the corresponding BIN (see Fig. 5).

Fig. 2: (Left) An occurrence record in iDigBio. The UUID is highlighted; Fig. 3: (Right) An occurrence record in GBIF. The GBIF ID and the Occurrence ID are highlighted.

Fig. 4: (Left) An occurrence record in BOLD Systems. The record ID is highlighted; Fig. 5: (Right) All occurrence records corresponding to an OTU. The BIN is highlighted.

We will illustrate this workflow by creating a fictitious treatment of the red moss, Sphagnum capillifolium, in a test manuscript. Let’s assume we have started a taxonomic manuscript in ARPHA and know that the occurrence records belonging to S. capillifolium can be found in iDigBio. What we need to do is to locate the ID of the occurrence record in the iDigBio webpage. In the case of iDigBio, the ARPHA system supports import via a Universally Unique Identifier (UUID). We have already created a treatment for S. capillifolium and clicked on the pencil to edit materials (Fig. 6). When we scroll all the way down in the pop-up window, we see the form which is displayed in the middle of Fig. 1.

Fig. 6: Edit materials.

From here, the following actions are possible:

insert (an) occurrence record(s) from iDigBio by specifying their UUIDs (universally unique identifiers) (Fig. 2);
insert (an) occurrence record(s) from GBIF by entering their GBIF IDs (Fig. 3);
insert (an) occurrence record(s) from GBIF by entering their occurrence IDs (note that unfortunately not all GBIF records have an occurrence ID, which is meant to serve as a kind of universal identifier) (Fig. 3);
insert (an) occurrence record(s) from BOLD by entering their record IDs (Fig. 4);
insert a set of occurrence records from BOLD belonging to a BIN (Barcode Index Number) (Fig. 5).

In this example, select the fifth option (iDigBio), type or paste the UUID b9ff7774-4a5d-47af-a2ea-bdf3ecc78885, and click Add. This will pull the occurrence record for S. capillifolium from iDigBio and insert it as a material in the current paper (Fig. 7). The same workflow also applies to the aforementioned GBIF and BOLD portals.
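Since the identifier types listed above differ in shape, a quick check before importing can catch pasting mistakes. The following is a minimal sketch, not part of ARPHA: the function name `classify_identifier` and its return labels are hypothetical, and the BIN pattern ("BOLD:" followed by three letters and four digits) is an assumption about how BINs are usually written.

```python
import re
import uuid

# Assumed BIN shape: "BOLD:" + three capital letters + four digits.
BIN_PATTERN = re.compile(r"^BOLD:[A-Z]{3}\d{4}$")


def classify_identifier(text: str) -> str:
    """Guess which kind of identifier was pasted (labels are hypothetical)."""
    text = text.strip()
    if BIN_PATTERN.match(text):
        return "bold_bin"
    if text.isascii() and text.isdigit():
        # GBIF IDs are plain integers.
        return "gbif_id"
    try:
        uuid.UUID(text)  # raises ValueError if the text is not a valid UUID
        return "uuid"
    except ValueError:
        # BOLD record IDs and free-form occurrence IDs end up here.
        return "other"


if __name__ == "__main__":
    print(classify_identifier("b9ff7774-4a5d-47af-a2ea-bdf3ecc78885"))
```

Note that an occurrence ID may itself be a UUID, so the "uuid" label alone does not say which portal a record comes from; the database still has to be chosen explicitly, as in the form described above.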

Fig. 7: Materials after they have been imported.

This workflow can be used for a number of purposes, but one of its most exciting future applications is the rapid re-description of Linnaean species, or new morphological descriptions of species together with DNA barcode sequences (a barcode is a taxon-specific, highly conserved gene region that provides enough inter-species variation for statistical classification to take place), using the Barcode Index Number (BIN) underlying an Operational Taxonomic Unit (OTU). If a taxonomist is convinced that a species hypothesis corresponding to an OTU defined algorithmically at BOLD Systems clearly represents a new species, then they can import all specimen records associated with that OTU by entering that OTU’s BIN in the respective field.

Having imported the specimen occurrence records, the author needs to designate one specimen as the holotype of the new species, others as paratypes, and so on. The author can also edit the records in the ARPHA tool, delete some, or add new ones.

By not having to retype or copy and paste species occurrence records, authors save considerable effort. Moreover, the records are imported automatically in a structured Darwin Core format, so anyone who needs the data for reuse can easily download them from the article text as structured data.
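As an illustration of what "structured Darwin Core format" can look like in practice, the sketch below writes a few materials to CSV using Darwin Core term names as column headers. It is not ARPHA's export code: the function name `materials_to_csv`, the choice of columns, and every record value (including the placeholder species name) are made up for the example.

```python
import csv
import io

# A small, arbitrary selection of Darwin Core terms used as column names.
DWC_FIELDS = [
    "occurrenceID",
    "scientificName",
    "typeStatus",
    "basisOfRecord",
    "institutionCode",
    "catalogNumber",
]


def materials_to_csv(materials):
    """Serialise a list of record dicts to CSV text with Darwin Core headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in materials:
        # Missing terms become empty cells; unknown keys are left out.
        writer.writerow({field: record.get(field, "") for field in DWC_FIELDS})
    return buffer.getvalue()


# Placeholder records: all values below are invented for illustration.
example_materials = [
    {
        "occurrenceID": "example-occurrence-1",
        "scientificName": "Aus bus",
        "typeStatus": "holotype",
        "basisOfRecord": "PreservedSpecimen",
    },
    {
        "occurrenceID": "example-occurrence-2",
        "scientificName": "Aus bus",
        "typeStatus": "paratype",
        "basisOfRecord": "PreservedSpecimen",
    },
]

if __name__ == "__main__":
    print(materials_to_csv(example_materials))
```

Because the column names are standard terms, a table like this can be read by other tools without a custom mapping step, which is the practical benefit of the structured import.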

Another important aspect of the workflow is that it will serve as a platform for peer-review, publication and curation of raw data, that is, of unpublished individual data records coming from collections or observations stored at GBIF, BOLD and iDigBio. Taxonomists are used to publishing only records of specimens they or their co-authors have personally studied. In a sense, the workflow will serve as a “cleaning filter” for portions of data that are passed through the publishing process. Thereafter, the published records can be used to curate raw data at collections, e.g. to correct identifications, assign newly described species names to specimens belonging to the respective BIN, and so on.

Additional Information:

The work has been partially supported by the EC-FP7 EU BON project (ENV 308454, Building the European Biodiversity Observation Network) and the ITN Horizon 2020 project BIG4 (Biosystematics, informatics and genomics of the big 4 insect groups: training tomorrow’s researchers and entrepreneurs), under Marie Skłodowska-Curie grant agreement No. 542241.

Novel cybercatalog of flower-loving flies suggests the digital future of taxonomy

Charting Earth’s biodiversity is the goal of taxonomy, and to do so scientists need to create an extensive citation network based on several hundred million pages of scientific literature. By providing a novel taxonomic ‘cybercatalog’ of southern African flower-loving (apiocerid) flies, Drs. Torsten Dikow and Donat Agosti demonstrate how the network of taxonomic knowledge can be made available through links provided to online data providers. Their work is available in the open-access Biodiversity Data Journal.

The present research shows that the information can be made available not only to the reader who follows the links, but also to machines that use the growing number of digital, online resources that are linked through persistent identifiers.

Primary data providers for taxonomic information such as species names (ZooBank), specimen images (Morphbank), species descriptions (Plazi), and digitized literature (BHL, Biodiversity Heritage Library; BioStor; and BLR, Biodiversity Literature Repository) play an important role in making data on species available in electronic form. Aggregators such as the Global Biodiversity Information Facility (GBIF) and the Encyclopedia of Life (EoL) gather this information automatically to distribute it even further to audiences beyond the reach of the life sciences.

In contrast to previous species catalogs, cybercatalogs provide access to information through links to open-access, online data repositories such as the ones listed above. Taxonomists and other users can now access this literature, species descriptions, and specimen records immediately, without a search in a natural history library or collection. The cybercatalog takes advantage of a new publishing platform within the Biodiversity Data Journal that makes it easy to upload species information and links to data about these species through a CheckList template. Furthermore, the Biodiversity Data Journal now allows future updates and re-publications of the cybercatalog, each with a new unique persistent identifier (DOI, Digital Object Identifier), whenever a new species is described or other taxonomic changes take place.

The authors argue that cybercatalogs are indeed the future of taxonomic catalogs since the online data in them are easily accessible to anyone.

“It is a taxonomist’s dream to have online access to all previously published information on a species and through this step the discipline of taxonomy can (re-)position itself as a central resource within the life sciences and beyond to the public and society at large,” add the authors. “Online access will also help to narrow the gap between the South and the North as a fantastic example of unhindered access to our knowledge of the global biological diversity, which is increasingly under pressure from human populations.”

###

For the realization of this project Plazi and Pensoft were partially supported by the EC-FP7 EU BON project (ENV 308454) (Building the European Biodiversity Observation Network).

###