Entangled “her”stories – How to create an open multi-linked dynamic dataset of plant genera named for women

Which plant genera do you know that are named for women? Who were/are they?

Guest blog post by Siobhan Leachman, Sabine von Mering, Heather Lindon & Carmen Ulloa Ulloa

How it all began

A post on social media asked about plant genera named for women and sparked a lively discussion with many contributors. This simple question was not as easily answered as initially thought. The resulting informal working group tackled this topic remotely during the COVID-19 pandemic and beyond. The team was motivated by the desire to amplify the contribution of women to botany through eponymy. The work of this team has so far resulted in a paper in Biodiversity Data Journal, presentations at several conferences, and a linked open dataset.

Prior to our international collaboration, no dataset was available to answer these simple questions and the required information was scattered in many different data sources. We set out to bring these data together and in doing so developed and refined our workflow. Our data paper documents this innovative workflow bringing together the various data elements needed to answer our research questions. Ultimately we created a Linked Open Data (LOD) dataset that amplified the names of women and female mythological beings celebrated through generic names of flowering plants.

Linking the Data

During our research process we focused on pulling data from a wide variety of sources while at the same time proactively sharing the data generated as widely as possible. This was done by adding and linking it to multiple public databases and sources (push-pull) including the International Plant Names Index (hereafter IPNI), Tropicos®, Wikidata, Bionomia and the Biodiversity Heritage Library (hereafter BHL).

Visualisation of our workflow to create a working list of flowering plant genera named for women. 

For our list of genera, each of the protologues was reviewed to confirm the etymology or eponymy. To find the generic protologues, we searched botanical databases such as IPNI and Tropicos, openly accessible providers of digital publications, and other digital libraries and websites that provide free access to such publications. Here the BHL was invaluable, as the majority of protologues and many other relevant publications were openly accessible through this digital library. Where no digital publication was available, we accessed scientific literature through our affiliated institutions.

For the women, our starting point was the “Index of Eponymic Plant Names – Extended Edition” by Lotte Burkhardt (2018). We manually extracted all genera honouring women. This dataset was supplemented with other sources including IPNI (2023), Mari Mut (2017-2021), a 2022 updated version of Burkhardt’s document (Burkhardt 2022), as well as suggestions received from colleagues and generated from our own research.

We collected the following information as structured data: information on the woman honoured, the genera named in honour of the woman, the year and place of the protologue or original publication (the nomenclatural reference), the author(s) of the genus name, and the link to the protologue or original publication if available online.
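As an illustration only, the fields listed above could be held in a small record type like the one below. This is a minimal sketch in Python: the class name, the field names and the `missing_fields` helper are our own assumptions for this post, not the structure used in the published dataset, and the values in the usage example are deliberately made-up placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EponymRecord:
    """One row of a working list: a genus and the woman it honours.

    Field names are hypothetical; they mirror the items listed in the text.
    """
    genus: str                            # genus name
    genus_authors: list[str]              # author(s) of the genus name
    honoree: str                          # woman or female being honoured
    protologue_year: int                  # year of the original publication
    protologue_place: str                 # nomenclatural reference
    protologue_url: Optional[str] = None  # link, if available online

    def missing_fields(self) -> list[str]:
        """Return the names of fields that still need research."""
        missing = [name for name in ("genus", "honoree", "protologue_place")
                   if not getattr(self, name).strip()]
        if not self.genus_authors:
            missing.append("genus_authors")
        if self.protologue_url is None:
            missing.append("protologue_url")
        return missing


# Placeholder values, not a real genus or publication.
record = EponymRecord("Examplea", ["A.Author"], "Jane Example",
                      1900, "Hypothetical J. Bot. 1: 1")
print(record.missing_fields())
```

Keeping the protologue link optional reflects the point made above: a link is recorded only if the publication is available online.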

Wikidata

Wikidata was the central data repository and linking mechanism for this project as it provided structured data that can be read and edited by humans and machines and it acts as a hub for other identifiers. As such Wikidata played a central role in semantic linking and enriching of our data.

Wikidata items for the plant genera were created or enriched with information about the name, the author(s) of the genus and the year of publication. Those statements were referenced using the original publication. If the protologue was available on BHL, the BHL bibliography or page identifier was added to that reference, thus creating a digital link that improves access to the protologue. While undertaking this work we also collated a list of all those public domain publications that appeared to be absent from BHL. We passed on this list to BHL and requested these texts be scanned and added to BHL for the benefit of everyone.

We then added a “named after” statement to the Wikidata item for the appropriate plant genus, linking that item to the Wikidata item for the woman honoured. Wikidata items for the women honoured were newly created or enriched. We researched each person and her contributions, plus information on mythological figures where necessary, and added this information to Wikidata items. Our work also included disambiguating the woman from other people with identical or similar names.

To amplify the women’s contributions to science and to enrich the wider (biodiversity) data ecosystem, we linked to other Wikidata items and websites or databases by adding other relevant identifiers. For example if the women were botanists, botanical collectors or other naturalists, we used the author property to link the women to publications written by them. In addition, we added the women to Bionomia and attributed specimens collected or identified by them to their profiles.

Our work also included enriching Wikidata items of taxon authors. IPNI and Tropicos were searched for these author names, and websites such as BHL, the Global Biodiversity Information Facility (GBIF) or other specialist databases were consulted. Corrections or newly researched information on taxon authors was placed not just in Wikidata but was also sent together with the corresponding references to IPNI and Tropicos. This information was then used by those organizations to update these databases accordingly. 

As a result of our data being placed in Wikidata it is available to be queried via the Wikidata Query Service.  
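For readers who want to try this, here is a minimal sketch in Python of what such a query might look like. It is not the query used for the paper. It assumes the usual Wikidata identifiers for taxon rank (P105), genus (Q34740), taxon name (P225), named after (P138), sex or gender (P21) and female (Q6581072). As written it is not restricted to flowering plants, and it will miss honorees without a sex or gender statement. The function names and the User-Agent string are our own.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# Genera whose "named after" value is an item marked as female.
QUERY = """
SELECT ?genus ?genusName ?honoree ?honoreeLabel WHERE {
  ?genus wdt:P105 wd:Q34740 ;      # taxon rank: genus
         wdt:P225 ?genusName ;     # taxon name
         wdt:P138 ?honoree .       # named after
  ?honoree wdt:P21 wd:Q6581072 .   # sex or gender: female
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?genusName
"""


def run_query(query: str = QUERY) -> dict:
    """Send the query to the endpoint and return the parsed JSON results."""
    url = ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"})
    request = urllib.request.Request(
        url, headers={"User-Agent": "eponym-sketch/0.1 (example script)"})
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def honorees_by_genus(results: dict) -> dict[str, list[str]]:
    """Group honoree labels by genus name (a genus may honour several women)."""
    grouped: dict[str, list[str]] = {}
    for row in results["results"]["bindings"]:
        genus = row["genusName"]["value"]
        grouped.setdefault(genus, []).append(row["honoreeLabel"]["value"])
    return grouped


if __name__ == "__main__":
    for genus, women in honorees_by_genus(run_query()).items():
        print(genus, "->", ", ".join(women))
```

Grouping by genus rather than building a one-to-one mapping matters for cases such as Fittonia, which honours two sisters.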

Our Goal Achieved

As a result of our project, we published a dataset of 728 genera honouring women or female beings. This was a nearly twenty-fold increase in the number of genera linked to women in Wikidata. Our analysis paper on this data is forthcoming.

Notable Women 

Monsonia L.

All of us came away from this research with a favourite story. One that stood out was Ann Monson, for whom Linnaeus named Monsonia. Linnaeus wrote a delightful letter to her about their creating, platonically of course, a kind of plant love-child between them, in the form of this new genus.

Translated from Latin: “… Lock these [seeds] in a pot, and place them in the window of the chamber towards the sun, when it bursts forth in February, and in the first summer the sun blooms and lasts the most beautiful Alstromeria, which no one has seen in England, and you bring forth no flowers. If it should come to pass, as I wish, if you offer our flames, I would only wish to beget with you an only child, as a pledge of my love, little Monsonia, by which you may perpetuate the fame of Lady in the kingdom of Flora, who was the Queen of Women.”

Fittonia Coem.

Two eponymous women with an interesting story are Sarah Mary Fitton and her sister Elizabeth. In 1817 they wrote Conversations on Botany, a book illustrated with colour engravings of flowers that popularised botany among women. The genus Fittonia was named in their honour.

Chanekia Lundell

Another woman honoured in a plant genus was Mercedes Chanek, a Mayan plant collector who worked in the 1930s for Cyrus Longworth Lundell and collected for the University of Michigan in British Honduras, today Belize. Very little is known about her life and work. However, her collections are detailed in Tropicos and Bionomia, and you can see the genus named for her by Lundell in IPNI under Chanekia.

Medusa Lour. and other genera

Medusa (c. 1597), by Caravaggio

An example of a mythological female being honoured in several plant names is Medusa. Six genera are named after her, more than after any real woman!

We hope that our data paper inspires others to use the methodology and workflow described to create other linked open datasets, e.g. celebrating and amplifying the contributions of underrepresented or marginalised groups in science.

Data paper: 

von Mering S, Gardiner LM, Knapp S, Lindon H, Leachman S, Ulloa Ulloa C, Vincent S, Vorontsova MS (2023) Creating a multi-linked dynamic dataset: a case study of plant genera named for women. Biodiversity Data Journal 11: e114408. https://doi.org/10.3897/BDJ.11.e114408

Maximising the impact of standardised biodiversity data: Pensoft’s role in the EU project B-Cubed

In line with its commitment to providing open-access biodiversity data, Pensoft has joined forces with 12 organisations to form the B-Cubed project.

The problem at hand

Measuring the extent and dynamics of the global biodiversity crisis is a challenging task that demands rapid, reliable and repeatable biodiversity monitoring data. Such data is essential for policymakers to be able to assess policy options effectively and accurately. To achieve this, however, there is a need to enhance the integration of biodiversity data from various sources, including citizen scientists, museums, herbaria, and researchers.

B-Cubed’s response

B-Cubed (Biodiversity Building Blocks for policy) hopes to tackle this challenge by reimagining the process of biodiversity monitoring, making it more adaptable and responsive. 

B-Cubed’s approach rests on six pillars: 

  • Improved alignment between policy and biodiversity data. Working closely with existing biodiversity initiatives to identify and meet policy needs.
  • Evidence base. Leveraging data cubes to standardise access to biodiversity data using the Essential Biodiversity Variables framework. These cubes are the basis for models and indicators of biodiversity.
  • Cloud computing environment. Providing users with access to the models in real-time and on demand.
  • Automated workflows. Developing exemplary automated workflows for modelling using biodiversity data cubes and for calculating change indicators.
  • Case studies. Demonstrating the effectiveness of B-Cubed’s tools.
  • Capacity building. Ensuring that the solutions meet openness standards and training end-users to employ them.
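To give a rough idea of what a data cube is, the sketch below (Python) counts occurrence records per species, year and grid cell. It is a simplified illustration under our own assumptions. The `Occurrence` type, the one-degree cell size and the function names are hypothetical and are not taken from B-Cubed's software or specifications.

```python
import math
from collections import Counter
from typing import Iterable, NamedTuple


class Occurrence(NamedTuple):
    species: str
    year: int
    latitude: float
    longitude: float


def grid_cell(latitude: float, longitude: float,
              cell_size: float = 1.0) -> tuple[int, int]:
    """Return integer (row, column) indices of the cell containing a point."""
    return (math.floor(latitude / cell_size),
            math.floor(longitude / cell_size))


def build_cube(occurrences: Iterable[Occurrence],
               cell_size: float = 1.0) -> Counter:
    """Count records along three dimensions: species, year and grid cell."""
    cube: Counter = Counter()
    for occ in occurrences:
        cell = grid_cell(occ.latitude, occ.longitude, cell_size)
        cube[(occ.species, occ.year, cell)] += 1
    return cube


# Made-up records for illustration.
records = [
    Occurrence("Species a", 2020, -1.95, 30.06),
    Occurrence("Species a", 2020, -1.20, 30.90),
    Occurrence("Species a", 2021, -1.95, 30.06),
    Occurrence("Species b", 2020, 0.50, 29.50),
]
for key, count in sorted(build_cube(records).items()):
    print(key, count)
```

Once records from different sources are reduced to the same dimensions, models and indicators can work on the counts without needing to know where each record came from.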

Pensoft’s role

Harnessing its experience in the communication, dissemination and exploitation of numerous EU projects, Pensoft focuses on maximising B-Cubed’s impact and ensuring the adoption and long-term legacy of its results. This encompasses a wide array of activities, ranging all the way from building the project’s visual and online presence to translating its results into policy recommendations. Pensoft also oversees B-Cubed’s data management by developing a Data Management Plan which ensures the implementation of the FAIR data principles and maximises the access to and re-use of the project’s research outputs.

Full list of partners

Visit B-Cubed’s website at https://b-cubed.eu/. You can also follow the project on X (@BCubedProject) and LinkedIn (/B-Cubed Project), or subscribe to its newsletter.

Data checking for biodiversity collections and other biodiversity data compilers from Pensoft

Guest blog post by Dr Robert Mesibov

Proofreading the text of scientific papers isn’t hard, although it can be tedious. Are all the words spelled correctly? Is all the punctuation correct and in the right place? Is the writing clear and concise, with correct grammar? Are all the cited references listed in the References section, and vice-versa? Are the figure and table citations correct?

Proofreading of text is usually done first by the reviewers, and then finished by the editors and copy editors employed by scientific publishers. A similar kind of proofreading is also done with the small tables of data found in scientific papers, mainly by reviewers familiar with the management and analysis of the data concerned.

But what about proofreading the big volumes of data that are common in biodiversity informatics? Tables with tens or hundreds of thousands of rows and dozens of columns? Who does the proofreading?

Sadly, the answer is usually “No one”. Proofreading large amounts of data isn’t easy and requires special skills and digital tools. The people who compile biodiversity data often lack the skills, the software or the time to properly check what they’ve compiled.

The result is that a great deal of the data made available through biodiversity projects like GBIF is — to be charitable — “messy”. Biodiversity data often needs a lot of patient cleaning by end-users before it’s ready for analysis. To assist end-users, GBIF and other aggregators attach “flags” to each record in the database where an automated check has found a problem. These checks find the most obvious problems amongst the many possible data compilation errors. End-users often have much more work to do after the flags have been dealt with.
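A first step for an end-user is often simply to tally the flags. The Python sketch below assumes a tab-separated download in which flags sit in a single column named `issue`, separated by semicolons. The column name, the separator and the sample rows are assumptions made for illustration, so check the format of your own download.

```python
import csv
import io
from collections import Counter


def count_flags(rows, flag_field: str = "issue",
                separator: str = ";") -> Counter:
    """Count how often each flag occurs across all records."""
    counts: Counter = Counter()
    for row in rows:
        raw = row.get(flag_field) or ""
        for flag in raw.split(separator):
            flag = flag.strip()
            if flag:
                counts[flag] += 1
    return counts


# A made-up three-record sample; record 2 carries no flags.
SAMPLE = ("id\tissue\n"
          "1\tCOORDINATE_ROUNDED;TAXON_MATCH_FUZZY\n"
          "2\t\n"
          "3\tCOORDINATE_ROUNDED\n")

rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
flag_counts = count_flags(rows)
unflagged = sum(1 for row in rows if not (row.get("issue") or "").strip())

print(flag_counts.most_common())
print("records without flags:", unflagged)
```

A record without flags has only passed the automated checks; as noted above, it may still contain problems that those checks do not look for.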

In 2017, Pensoft employed a data specialist to proofread the online datasets that are referenced in manuscripts submitted to Pensoft’s journals as data papers. The results of the data-checking are sent to the data paper’s authors, who then edit the datasets. This process has substantially improved many datasets (including those already made available through GBIF) and made them more suitable for digital re-use. At blog publication time, more than 200 datasets have been checked in this way.

Note that a Pensoft data audit does not check the accuracy of the data, for example, whether the authority for a species name is correct, or whether the latitude/longitude for a collecting locality agrees with the verbal description of that locality. For a more or less complete list of what does get checked, see the Data checklist at the bottom of this blog post. These checks are aimed at ensuring that datasets are correctly organised, consistently formatted and easy to move from one digital application to another. The next reader of a digital dataset is likely to be a computer program, not a human. It is essential that the data are structured and formatted, so that they are easily processed by that program and by other programs in the pipeline between the data compiler and the next human user of the data.

Pensoft’s data-checking workflow was previously offered only to authors of data paper manuscripts. It is now available to data compilers generally, with three levels of service:

  • Basic: the compiler gets a detailed report on what needs fixing
  • Standard: minor problems are fixed in the dataset and reported
  • Premium: all detected problems are fixed in collaboration with the data compiler and a report is provided

Because datasets vary so much in size and content, it is not possible to set a price in advance for basic, standard and premium data-checking. To get a quote for a dataset, send an email with a small sample of the data to publishing@pensoft.net.


Data checklist

Minor problems:

  • dataset not UTF-8 encoded
  • blank or broken records
  • characters other than letters, numbers, punctuation and plain whitespace
  • more than one version of the same character (the simplest or most correct one is kept)
  • unnecessary whitespace
  • Windows carriage returns (retained if required)
  • encoding errors (e.g. “Dum?ril” instead of “Duméril”)
  • missing data with a variety of representations (blank, “-”, “NA”, “?” etc)
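Several of the minor checks above can be approximated in a few lines of Python. The sketch below is our own illustration, not the tooling used in the Pensoft audit. It assumes a tab-delimited table with a header line, and the set of "missing" tokens is just an example.

```python
MISSING_TOKENS = {"-", "NA", "N/A", "?"}  # example set, extend as needed


def check_minor(raw: bytes, delimiter: str = "\t") -> dict:
    """Report a few 'minor' problems in a delimited text table."""
    report: dict = {}
    try:
        text = raw.decode("utf-8")
        report["utf8"] = True
    except UnicodeDecodeError:
        # Keep going so the other checks still run; bad bytes become U+FFFD.
        text = raw.decode("utf-8", errors="replace")
        report["utf8"] = False

    report["carriage_returns"] = text.count("\r")
    lines = text.replace("\r\n", "\n").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a final newline is not a blank record
    if not lines:
        return report

    n_fields = len(lines[0].split(delimiter))
    blank, broken, untidy, missing = [], [], 0, set()
    for number, line in enumerate(lines[1:], start=2):  # line 1 is the header
        if not line.strip():
            blank.append(number)
            continue
        fields = line.split(delimiter)
        if len(fields) != n_fields:
            broken.append(number)  # wrong number of fields for this table
        for value in fields:
            if value != " ".join(value.split()):
                untidy += 1  # leading, trailing or repeated whitespace
            if value.strip() in MISSING_TOKENS:
                missing.add(value.strip())
    report.update(blank_records=blank, broken_records=broken,
                  untidy_whitespace=untidy, missing_variants=sorted(missing))
    return report


sample = (b"id\tname\tcountry\r\n"
          b"1\tDum\xc3\xa9ril \tNA\r\n"
          b"\r\n"
          b"2\tSmith\t-\r\n"
          b"3\tJones\r\n")
print(check_minor(sample))
```

The function only reports. Whether to fix something, for example whether Windows carriage returns should stay, remains a decision for the data compiler.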

Major problems:

  • unintended shifts of data items between fields
  • incorrect or inconsistent formatting of data items (e.g. dates)
  • different representations of the same data item (pseudo-duplication)
  • for Darwin Core datasets, incorrect use of Darwin Core fields
  • data items that are invalid or inappropriate for a field
  • data items that should be split between fields
  • data items referring to unexplained entities (e.g. “habitat is type A”)
  • truncated data items
  • disagreements between fields within a record
  • missing, but expected, data items
  • incorrectly associated data items (e.g. two country codes for the same country)
  • duplicate records, or partial duplicate records where not needed
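Three of the major problems (inconsistent date formats, pseudo-duplication and duplicate records) lend themselves to simple screening. The Python sketch below is a rough illustration under our own assumptions. The date patterns and the normalisation key are deliberately crude, the function names are hypothetical, and none of it is the audit's actual method.

```python
import re
from collections import Counter, defaultdict

# A few date layouts to recognise; anything else is counted as "other".
DATE_PATTERNS = {
    "YYYY-MM-DD": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "D/M/YYYY or M/D/YYYY": re.compile(r"^\d{1,2}/\d{1,2}/\d{4}$"),
    "YYYY": re.compile(r"^\d{4}$"),
}


def date_formats(values) -> Counter:
    """Count how many values follow each known date layout."""
    counts: Counter = Counter()
    for value in values:
        label = next((name for name, pattern in DATE_PATTERNS.items()
                      if pattern.match(value)), "other")
        counts[label] += 1
    return counts


def pseudo_duplicates(values) -> list[list[str]]:
    """Group values that differ only in case, spacing or punctuation."""
    groups: dict[str, set[str]] = defaultdict(set)
    for value in values:
        # Crude key: lower-case, keep only a-z and 0-9 (accents are dropped).
        key = re.sub(r"[^a-z0-9]", "", value.lower())
        groups[key].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def duplicate_records(records) -> list[list[str]]:
    """Return each record (a list of field values) that occurs more than once."""
    counts = Counter(tuple(record) for record in records)
    return [list(record) for record, n in counts.items() if n > 1]


print(date_formats(["2018-04-01", "01/04/2018", "2018", "April 2018"]))
print(pseudo_duplicates(["New Zealand", "new zealand",
                         "New  Zealand", "Australia"]))
print(duplicate_records([["1", "a"], ["2", "b"], ["1", "a"]]))
```

More than one label in the `date_formats` result for a single field is a sign that the field needs standardising before the data can be processed reliably by a program.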

For details of the methods used, see the author’s online resources:

***

Find out more about Pensoft’s data audit workflow for data papers submitted to Pensoft journals on Pensoft’s blog.

Integration of Freshwater Biodiversity Information for Decision-Making in Rwanda

Teams from Ghana, Malawi, Namibia and Rwanda during the inception meeting of the African Biodiversity Challenge Project in Kigali, Rwanda. Photo by Yvette Umurungi.

The establishment and implementation of a long-term strategy for freshwater biodiversity data mobilisation, sharing, processing and reporting in Rwanda is intended to support environmental monitoring and the implementation of Rwanda’s National Biodiversity Strategy and Action Plan (NBSAP). It should also help us understand how economic transformation and environmental change are affecting freshwater biodiversity and the ecosystem services it provides.

As part of this strategy, the Center of Excellence in Biodiversity and Natural Resource Management (CoEB) at the University of Rwanda, jointly with the Rwanda Environment Management Authority (REMA) and the Albertine Rift Conservation Society (ARCOS), is implementing the African Biodiversity Challenge (ABC) project “Integration of Freshwater Biodiversity Information for Decision-Making in Rwanda.”

The conference abstract for this project has been published in the open access journal Biodiversity Information Science and Standards (BISS). 

The CoEB has a national mandate to lead on biodiversity data mobilisation and implementation of the NBSAP in collaboration with REMA. This includes digitising data from reports, conducting analyses and reporting for policy and research, as indicated in Rwanda’s NBSAP.

The data will be collated following international standards and made available online, so that they can be accessed and reused from around the world. In fact, CoEB aspires to become a Global Biodiversity Information Facility (GBIF) node, thereby strengthening its capacity for biodiversity data mobilisation.

Data use training for the African Biodiversity Challenge at the South African National Biodiversity Institute (SANBI), South Africa. Photo by Yvette Umurungi.

The mobilised data will be organised using GBIF standards, and the project will leverage the tools developed by GBIF to facilitate data publication. Additionally, it will also provide an opportunity for ARCOS to strengthen its collaboration with CoEB as part of its endeavor to establish a regional network for biodiversity data management in the Albertine Rift Region.

The project is expected to conclude with at least six datasets, which will be published through the ARCOS Biodiversity Information System. These are to include three datasets for the Kagera River Basin; one on freshwater macro-invertebrates from the Congo and Nile Basins; one for the Rwanda Development Board archive of research reports from protected areas; and one from thesis reports from master’s and bachelor’s students at the University of Rwanda.

The project will also produce and release the first “Rwandan State of Freshwater Biodiversity”, a document which will describe the status of biodiversity in freshwater ecosystems in Rwanda and present socio-economic conditions affecting human interactions with this biodiversity.

The page of the Center of Excellence in Biodiversity and Natural Resource Management (CoEB) at the University of Rwanda on the Global Biodiversity Information Facility portal. Image by Yvette Umurungi.

***

The ABC project is a competition coordinated by the South African National Biodiversity Institute (SANBI) and funded by the JRS Biodiversity Foundation. The competition is part of the JRS-funded project, “Mobilising Policy and Decision-making Relevant Biodiversity Data,” and supports the Biodiversity Information Management activities of the GBIF Africa network.

 

Original source:

Umurungi Y, Kanyamibwa S, Gashakamba F, Kaplin B (2018) African Biodiversity Challenge: Integrating Freshwater Biodiversity Information to Guide Informed Decision-Making in Rwanda. Biodiversity Information Science and Standards 2: e26367. https://doi.org/10.3897/biss.2.26367

Audit finds biodiversity data aggregators ‘lose and confuse’ data

In an effort to improve the quality of biodiversity records, the Atlas of Living Australia (ALA) and the Global Biodiversity Information Facility (GBIF) use automated data processing to check individual data items. The records are provided to the ALA and GBIF by museums, herbaria and other biodiversity data sources.

However, an independent analysis of such records reports that ALA and GBIF data processing also leads to data loss and unjustified changes in scientific names.

The study was carried out by Dr Robert Mesibov, an Australian millipede specialist who also works as a data auditor. Dr Mesibov checked around 800,000 records retrieved from the Australian Museum, Museums Victoria and the New Zealand Arthropod Collection. His results are published in the open access journal ZooKeys, and also archived in a public data repository.

“I was mainly interested in changes made by the aggregators to the genus and species names in the records,” said Dr Mesibov.

“I found that names in up to 1 in 5 records were changed, often because the aggregator couldn’t find the name in the look-up table it used.”

Another worrying result concerned type specimens – the reference specimens upon which scientific names are based. On a number of occasions, the aggregators were found to have replaced the name of a type specimen with a name tied to an entirely different type specimen.

The biggest surprise, according to Dr Mesibov, was the major disagreement on names between aggregators.

“There was very little agreement,” he explained. “One aggregator would change a name and the other wouldn’t, or would change it in a different way.”

Furthermore, dates, names and locality information were sometimes lost from records, mainly due to programming errors in the software used by aggregators to check data items. In some data fields the loss reached 100%, with no original data items surviving the processing.

“The lesson from this audit is that biodiversity data aggregation isn’t harmless,” said Dr Mesibov. “It can lose and confuse perfectly good data.”

“Users of aggregated data should always download both original and processed data items, and should check for data loss or modification, and for replacement of names,” he concluded.
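As a rough illustration of that advice, the Python sketch below compares original and processed values field by field and counts what was lost and what was changed. It is our own simplified example, not the method used in the audit. The record identifiers and values are made up, and the field names are only examples of the kind of fields a user might compare.

```python
from collections import Counter


def audit_changes(original: dict, processed: dict, fields: list[str]):
    """Compare records keyed by ID; count lost and changed values per field.

    A value is 'lost' if it was present originally but is empty or absent
    after processing, and 'changed' if both exist but differ.
    """
    lost: Counter = Counter()
    changed: Counter = Counter()
    for record_id, before in original.items():
        after = processed.get(record_id, {})
        for field in fields:
            old = (before.get(field) or "").strip()
            new = (after.get(field) or "").strip()
            if old and not new:
                lost[field] += 1
            elif old and new != old:
                changed[field] += 1
    return lost, changed


# Made-up records for illustration.
original = {
    "r1": {"scientificName": "Aus bus", "eventDate": "1990-01-02"},
    "r2": {"scientificName": "Cus dus", "eventDate": "1991"},
    "r3": {"scientificName": "Eus fus", "eventDate": ""},
}
processed = {
    "r1": {"scientificName": "Aus bus", "eventDate": ""},   # date lost
    "r2": {"scientificName": "Cus", "eventDate": "1991"},   # name changed
    # r3 is absent from the processed data altogether
}

lost, changed = audit_changes(original, processed,
                              ["scientificName", "eventDate"])
print("lost:", dict(lost))
print("changed:", dict(changed))
```

A changed name is not necessarily wrong, but each change is worth inspecting against the original record before the processed value is used in an analysis.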

###

Original source:

Mesibov R (2018) An audit of some filtering effects in aggregated occurrence records. ZooKeys 751: 129-146. https://doi.org/10.3897/zookeys.751.24791

Five new Pensoft journals integrated with Dryad to improve data discoverability

Academic publisher Pensoft strengthens its partnership with Dryad by adding its latest five journals to the list integrated with the digital repository. From now on, all authors who choose any of the journals published under Pensoft’s imprint will be able to opt to upload their datasets to Dryad. At the click of a button, the authors will make their data additionally discoverable, reusable, and citable.

Started in 2011 as one of the first ever integrated data deposition workflows between a repository (Dryad) and a publisher (Pensoft), the partnership has now been reinforced to cover publications submitted to any of Pensoft’s 21 journals, including recently launched Research Ideas and Outcomes (RIO) and One Ecosystem, as well as BioDiscovery, African Invertebrates and Zoologia, which all moved to Pensoft within the last year.

By agreeing to deposit their datasets to Dryad, authors take advantage of a specialised and highly acknowledged platform to easily showcase and, hence, take credit for their data. On the other hand, the science community, including educators and students, can readily access the data, facilitating verification, citability and even potential collaborations.

“Dedicated to open and reproducible science, at Pensoft we have always strived to encourage our authors to make their research as transparent and, hence, trustworthy as possible, by providing the right infrastructure and support,” says Pensoft’s founder and CEO Prof. Lyubomir Penev. “By strengthening our long-year partnership with Dryad, I envision more and more authors, who publish in our journals, adding open data to their list of best practices.”

“Dryad works to promote data that are openly available, integrated with the scholarly literature, and routinely re-used to create knowledge,” said Dryad’s Executive Director, Meredith Morovati. “We are encouraged by the growth of our partnership with Pensoft, one of our earliest supporters. We are honored to provide services to Pensoft authors to ensure their data is openly available, linked to the article, and preserved for future use and for the future of science.”

LifeWatchGreece launches a Special Paper Collection for Greek biodiversity research

Developed in the 1990s and early 2000s, LifeWatch is one of the large-scale European Research Infrastructures (ESFRI) created to support biodiversity science and its developments. Its ultimate goal is to model Earth’s biodiversity based on large-scale data, to build a vast network of partners, and to liaise with other high-quality and viable research infrastructures (RI).

Being one of the founding LifeWatch member states, Greece has not only implemented LifeWatchGreece, but it is all set and ready to “fulfill the vision of the Greek LifeWatch RI and establish it as the biodiversity Centre of Excellence for South-eastern Europe”, according to the authors of the latest Biodiversity Data Journal’s Editorial: Dr Christos Arvanitidis, Dr Eva Chatzinikolaou, Dr Vasilis Gerovasileiou, Emmanouela Panteri, Dr Nicolas Bailly, all affiliated with the Hellenic Centre for Marine Research (HCMR) and part of the LifeWatchGreece Core Team, together with Nikos Minadakis, Foundation for Research and Technology Hellas (FORTH), Alex Hardisty, Cardiff University, and Dr Wouter Los, University of Amsterdam.

Making use of the technologically advanced open access Biodiversity Data Journal and its Collections feature, the LifeWatchGreece team is publishing a vast collection of peer-reviewed scientific outputs, including software descriptions, data papers, taxonomic checklists and research articles, along with the accompanying datasets and supporting material. Their intention is to demonstrate the availability and applicability of the developed e-Services and Virtual Laboratories (vLabs) to both the scientific community and the broader domain of biodiversity management.

The LifeWatchGreece Special Collection is now available in Biodiversity Data Journal, with a series of articles highlighting key contributions to the large-scale European LifeWatch RI. The Software Description papers explain the LifeWatchGreece Portal, where all the e-Services and the vLabs provided by LifeWatchGreece RI are hosted; the Data Services based on semantic web technologies, which provide detailed and specialized search paths to facilitate data mining; the R vLab which can be used for a series of statistical analyses in ecology, based on an integrated and optimized online R environment; and the Micro-CT vLab, which allows the online exploration, dissemination and interactive manipulation of micro-tomography datasets.

The LifeWatchGreece Special Collection also includes a series of taxonomic checklists (preliminary, updated and/or annotated); a series of data papers presenting historical and original datasets; and a selection of research articles reporting on the outcomes, methodologies and citizen science initiatives developed by collaborating research projects, which have shared human, hardware and software resources with LifeWatchGreece RI.

LifeWatchGreece relies on a multidisciplinary approach, involving several subsidiary initiatives; collaborations with Greek, European and World scientific communities; specialised staff, responsible for continuous updates and developments; and, of course, innovative online tools and already established IT infrastructure.

###

Original source:

Arvanitidis C, Chatzinikolaou E, Gerovasileiou V, Panteri E, Bailly N, Minadakis N, Hardisty A, Los W (2016) LifeWatchGreece: Construction and operation of the National Research Infrastructure (ESFRI). Biodiversity Data Journal 4: e10791. https://doi.org/10.3897/BDJ.4.e10791

Additional information:

This work has been supported by the LifeWatchGreece infrastructure (MIS 384676), funded by the Greek Government under the General Secretariat of Research and Technology (GSRT), ESFRI Projects, National Strategic Reference Framework (NSRF).