Interoperable biodiversity data extracted from literature through open-ended queries

OpenBiodiv is a biodiversity database containing knowledge extracted from scientific literature, built as an Open Biodiversity Knowledge Management System. 

The OpenBiodiv contribution to BiCIKL

Apart from coordinating the Horizon 2020-funded project BiCIKL, scholarly publisher and technology provider Pensoft has been the engine behind what is likely to be the first production-stage semantic system to run on top of a reasonably sized biodiversity knowledge graph.


As of February 2023, OpenBiodiv contains 36,308 processed articles; 69,596 taxon treatments; 1,131 institutions; 460,475 taxon names; 87,876 sequences; 247,023 bibliographic references; 341,594 author names; and 2,770,357 article sections and subsections.

In fact, OpenBiodiv is a whole ecosystem of tools and services that extract biodiversity data from the text of articles published in data-minable XML format, such as those in the journals published by Pensoft (e.g. ZooKeys, PhytoKeys, MycoKeys, Biodiversity Data Journal), and from other taxonomic treatments available from Plazi and Plazi’s specialised extraction workflow, and convert these data into Linked Open Data.

“I believe that OpenBiodiv is a good real-life example of how the outputs and efforts of a research project may and should outlive the duration of the project itself. Something that is – of course – central to our mission at BiCIKL.”

explains Prof Lyubomir Penev, BiCIKL’s Project Coordinator and founder and CEO of Pensoft.

“The basics of what was to become the OpenBiodiv database began to come together back in 2015 within the EU-funded BIG4 PhD project of Victor Senderov, later succeeded by another PhD project by Mariya Dimitrova within IGNITE. It was during those two projects that the backend ontology OpenBiodiv-O, the first versions of the RDF converters and the basic website functionalities were created,”

he adds.

By the time OpenBiodiv became one of the nine research infrastructures within BiCIKL tasked with providing virtual access to open FAIR data, tools and services, it had already evolved into an RDF-based biodiversity knowledge graph, equipped with a fully automated extraction and indexing workflow and user apps.

Currently, Pensoft is working at full speed on new user apps in OpenBiodiv, as the team continuously brings into play invaluable feedback and recommendations from end-users and partners at BiCIKL.

As a result, OpenBiodiv is already capable of answering open-ended queries based on the available data. To do this, OpenBiodiv discovers ‘hidden’ links between data classes, i.e. taxon names, taxon treatments, specimens, sequences, persons/authors and collections/institutions. 

Thus, the system generates new knowledge about taxa, scientific articles and their subsections, the examined materials and their metadata, localities and sequences, amongst others. Additionally, it is able to return information with a relevant visual representation about any one or a combination of those major data classes within a certain scope and semantic context.

Users can explore the database in three ways: by typing any term (even if misspelt!) into the search engine on the OpenBiodiv homepage, by integrating its Application Programming Interface (API), or by running SPARQL queries.

On the OpenBiodiv website, there is also a list of predefined SPARQL queries, which is continuously being expanded.

Sample of predefined SPARQL queries at OpenBiodiv.
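For readers who have not used SPARQL before, the snippet below is a minimal sketch of what sending such a query over HTTP can look like, following the general W3C SPARQL 1.1 Protocol (the query travels in a `query` parameter, and JSON results are requested via the `Accept` header). It only builds the request and does not send it. The endpoint URL is a hypothetical placeholder, and the assumption that resources carry `rdfs:label` is ours: OpenBiodiv's actual endpoint and ontology predicates may differ, so the predefined queries on the OpenBiodiv website are the better starting point for real use.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder: substitute the SPARQL endpoint listed on the OpenBiodiv website.
SPARQL_ENDPOINT = "https://example.org/openbiodiv/sparql"


def build_label_query(term: str, limit: int = 10) -> str:
    """Return a SPARQL SELECT query for resources whose rdfs:label contains `term`.

    Assumption: resources are labelled with rdfs:label; the real OpenBiodiv
    predicates may differ.
    """
    # Lower-case the term (the label is lower-cased in the FILTER too) and escape
    # backslashes and double quotes so it is a valid SPARQL string literal.
    safe = term.lower().replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?resource ?label WHERE {\n"
        "  ?resource rdfs:label ?label .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?label)), "{safe}"))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request(query: str, endpoint: str = SPARQL_ENDPOINT) -> Request:
    """Wrap a query in an HTTP GET request in the style of the SPARQL 1.1 Protocol."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    req = build_request(build_label_query("Dipsas aparatiritos"))
    print(req.full_url)
    # Sending it requires network access and a real endpoint, e.g.:
    # import json, urllib.request
    # with urllib.request.urlopen(req) as resp:
    #     rows = json.load(resp)["results"]["bindings"]
```

Matching on lower-cased labels is a simple way to make the search case-insensitive; it does not reproduce the misspelling tolerance of the OpenBiodiv homepage search engine.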

“OpenBiodiv is an ambitious project of ours, and it’s surely one close to Pensoft’s heart, given our decades-long dedication to biodiversity science and knowledge sharing. Our previous fruitful partnerships with Plazi, BIG4 and IGNITE, as well as the current exciting and inspirational network of BiCIKL are wonderful examples of how far we can go with the right collaborators,”

concludes Prof Lyubomir Penev.

***

Follow BiCIKL on Twitter and Facebook. Join the conversation on Twitter at #BiCIKL_H2020.

You can also follow Pensoft on Twitter, Facebook and LinkedIn and use #OpenBiodiv on Twitter.

Hidden in plain sight: snake named 46 years after first discovery

Although it had been documented and studied for years, it took molecular analyses to confirm that the snake was in fact a species new to science.

A new species of snake has been described from western Panama. First documented in 1977 by Dr. Charles Myers, a scientist studying amphibians and reptiles throughout Panama, the snake has only now received a scientific description.

The new snake has been given the name Dipsas aparatiritos. The genus Dipsas includes the snailsuckers, a unique group of snakes that feed on soft-bodied prey including snails extracted from their shells, slugs, and earthworms. The species epithet “aparatiritos” is Greek for unnoticed: a reference to the fact that the snake had remained hidden in plain sight for over forty years at a very well-studied field site.

A snail-eating snake.
Live individual of Dipsas aparatiritos in Parque Nacional General de División Omar Torrijos Herrera photographed in the wild. Photo by Kevin Enge

Scientists Dr. Julie Ray, University of Nevada – Reno, Paola Sánchez-Martínez, Abel Batista, Daniel G. Mulcahy, Coleman M. Sheehy III, Eric N. Smith, R. Alexander Pyron and Alejandro Arteaga, have described the new species in a paper published in the open-access journal ZooKeys.

Dipsas aparatiritos has the characteristic bulbous head and brown-and-black patterning of many of the snakes in the genus. It looks very similar to its closest known relative, Dipsas temporalis, which is also found in Panama. It is now known that D. aparatiritos is endemic to, or known only from, the western and central parts of the country.

The Hidden Snail-eating Snake, Dipsas aparatiritos. Photo by Dr. Julie M. Ray

Panama has a rich diversity of snakes, with over 150 documented species in a country the size of Ireland or the U.S. state of South Carolina. Dr. Ray has documented over 55 species of snakes in Parque Nacional General de División Omar Torrijos Herrera where the newly described snake is best studied, and over 80 species in Coclé Province in Central Panama. She published a field guide, Snakes of Panama, in 2017.

Four individuals of Dipsas aparatiritos intertwined on one plant at Parque Nacional General de División Omar Torrijos Herrera. Photo by Noah Carl

Co-author of the species description Dr. Alex Pyron, The George Washington University, visited Parque Nacional General de División Omar Torrijos Herrera in June 2013 with Dr. Frank Burbrink, American Museum of Natural History. “That was my first trip to Central America,” he says. “We were able to see the after-effects of the amphibian declines. But I was struck by the diversity and abundance of snakes that were still present, including this species of snail-eater we have just described, the rare Geophis bellus [a small leaf litter snake known from just one specimen prior to this discovery] and an unusual Coralsnake.”

Despite being a new species, Dipsas aparatiritos is relatively common in Parque Nacional General de División Omar Torrijos Herrera and had been studied for years before it was described. Dr. Ray has published a paper on the diet of snail-eating snakes, which found that earthworms from bromeliads make up a large portion of the diet of Dipsas aparatiritos. She also co-authored a paper on trophic cascades following amphibian declines, which found that Dipsas aparatiritos was actually increasing in numbers, thanks to a diet that does not depend on amphibians.

The Hidden Snail-eating Snake, Dipsas aparatiritos. Photo by Dr. Julie M. Ray

Dipsas aparatiritos is already considered Near Threatened based on IUCN Red List standards. The snake is endemic to Panama and has a limited range in mid-elevation cloud forests, where at least 44% of its overall range has been deforested. In addition, as snakes are constantly persecuted by humans, almost all snake species are in danger of extinction in the near future. Efforts must be made to conserve these rare species, the researchers believe, especially as so many are only now being described.

 “This work was a true collaboration of scientists from different countries each contributing their expertise to thoroughly understand this new species, morphologically and molecularly,” said Dr. Ray.

“We are in an exciting time in science. Naturalists and scientists must continue to document the natural world; there are many species out there yet to be found and described. The usage of molecular techniques is exciting and facilitates the confirmation of so many new species.”

Research article:

Ray JM, Sánchez-Martínez P, Batista A, Mulcahy DG, Sheehy III CM, Smith EN, Pyron RA, Arteaga A (2023) A new species of Dipsas (Serpentes, Dipsadidae) from central Panama. ZooKeys 1145: 131-167. https://doi.org/10.3897/zookeys.1145.96616

Homo sapiens or insapiens? A new insect species from Kosovo cries for help

With its scientific name, a new insect species from Kosovo challenges the notion that humankind is more intelligent and clever than other organisms.

Type locality of the new species. Photo by Halil Ibrahimi

Lying at the center of the Balkan Peninsula, Kosovo harbors a diversity of ecosystems and conditions, which have favored processes leading to the existence of many endemic and rare species. In the past few years, several new species of aquatic insects have been discovered from the small Balkan country, making it unique in terms of biodiversity. Unfortunately, as elsewhere in the Balkans, many of these ecosystems have deteriorated heavily.

A team of scientists from Kosovo, led by Professor Halil Ibrahimi of the University of Prishtina, recently found a new species of aquatic insect, a caddisfly, from the Sharr Mountains in Kosovo, and named it Potamophylax humoinsapiens.

The species epithet humoinsapiens is a combination of two Latin words: “humo”, which in English means “to cover with soil, to bury”, and “insapiens”, meaning “unwise”. The researchers explain that the name refers to the unwise and careless treatment of the new species’ habitat: hydropower plants, illegal logging and pollution have greatly degraded the area in recent years. “In some segments, whole parts of the Lepenc River are ‘buried’ in large pipes,” they write in their study, which was published in the open-access Biodiversity Data Journal.

Potamophylax humoinsapiens. Photo by Halil Ibrahimi

“The species name ‘humoinsapiens’ ironically sounds like Homo insapiens, and this new species is right in calling us unwise,” says Prof. Ibrahimi. “With its actions, humankind has caused the extinction of many species of insects and other organisms during the past decades and has greatly degraded all known ecosystems on the planet. The debate questioning the wise nature of humans is already ongoing.”

In the past few years, Professor Halil Ibrahimi and his team have found several new species of aquatic insects from the Balkans, the Middle East and North Africa. In an attempt to raise awareness of this group of vulnerable creatures, greatly endangered by human activities, the team has given their species unique names. One of their previous discoveries was named Potamophylax coronavirus to draw attention to the silent and dangerous “pandemic” that humans have caused in freshwater ecosystems in the Balkans.

The research team behind the discovery. Photo by Halil Ibrahimi

“By combining classical taxonomy and modern molecular analysis techniques with the unique names, we are making insect species talk to our collective consciousness. It is in humankind’s capacity to earn the name Homo sapiens again,” the researchers conclude.

The study was financed by the Ministry of Education, Science, Technology and Information of the Republic of Kosovo and was conducted in the Laboratory of Zoology-Department of Biology of the University of Prishtina.

Original source:

Ibrahimi H, Bilalli A, Gashi A, Grapci Kotori L, Slavevska Stamenkovič V, Geci D (2023) Potamophylax humoinsapiens sp. n. (Trichoptera, Limnephilidae), a new species from the Sharr Mountains, Republic of Kosovo. Biodiversity Data Journal 11: e97969. https://doi.org/10.3897/BDJ.11.e97969

Follow Biodiversity Data Journal on Facebook and Twitter.

Experts in insect taxonomy “threatened by extinction”, reveals the first European Red List of Taxonomists

While insect populations continue to decline, taxonomic expertise in Europe is at serious risk, as confirmed by data obtained within the European Red List of Insect Taxonomists, a recent study commissioned by the European Union.

Expertise tends to be particularly poor in the countries with the richest biodiversity, while taxonomists are predominantly male and ageing


Scientists who specialise in the identification and discovery of insect species, also known as insect taxonomists, are declining across Europe, highlights the newly released report by CETAF, the International Union for Conservation of Nature (IUCN) and Pensoft. The authors of this report represent different perspectives within biodiversity science, including natural history and research institutions, nature conservation, academia and scientific publishing.

Despite the global significance of its taxonomic collections, Europe has been losing taxonomic expertise at such a rate that, at the moment, nearly half (41.4%) of the insect orders are not covered by a sufficient number of scientists. If only EU countries are counted, the figure looks only slightly more positive (34.5%). Even the four largest insect orders, namely beetles (Coleoptera), moths and butterflies (Lepidoptera), flies (Diptera) and wasps, bees, ants and sawflies (Hymenoptera), are adequately ‘covered’ in only a fraction of the countries.

To obtain details about the number, location and productivity of insect taxonomists, the team extracted information from thousands of peer-reviewed research articles published in the last decade, queried the most important scientific databases and reached out to over fifty natural science institutions and their networks. Furthermore, a dedicated campaign reached out to individual researchers through multiple communication channels. As a result, more than 1,500 taxonomists responded by filling in a self-declaration survey to provide information about their personal and academic profile, qualification and activities. 

Then, the collected information was assessed against numerical criteria to classify the scientists into categories similar to those used by the IUCN Red List of Threatened Species™. In the European Red List of Insect Taxonomists, these range from Eroded Capacity (equivalent to Extinct) to Adequate Capacity (equivalent to Least Concern). The assessment was applied to the 29 insect orders (e.g. beetles, moths and butterflies) to figure out which insect groups society, conservation practitioners and decision-makers need not be concerned about at this point.

Overview of the taxonomic capacity in European countries based upon the Red List Index (colour gradient from red, Eroded Capacity, to green, Adequate Capacity).
Image by the European Red List of Taxonomists consortium.

At country level, the results showed that Czechia, Germany and Russia have the most adequate coverage of insect groups. Meanwhile, Albania, Azerbaijan, Belarus, Luxembourg, Latvia, Ireland and Malta turned out to be the ones with an insufficient number of taxonomists.

In most cases, the availability of experts seems to correlate with GDP, as the wealthiest countries tend to invest more in their scientific institutions.

What is particularly worrying is that the lack of taxonomic expertise is more evident in the countries with the greatest species diversity. This trend may cause even more significant problems in the knowledge and conservation of these species, further aggravating the situation. Thus, the report provides further evidence about a global pattern where the countries richest in biodiversity are also the ones poorest in financial and human resources. 

The research team also points out that European natural history museums host the largest scientific collections, including insects, brought from all over the globe. As such, Europe bears a responsibility to the world for maintaining taxonomic knowledge and building adequate expert capacity.

Other concerning trends revealed in the new report are that the community of taxonomists is also ageing and – especially in the older groups – male-dominated (82%). 

“One reason for having fewer young taxonomists could be limited opportunities for professional training (…), and the fact that not all professional taxonomists provide it, as a significant number of taxonomists are employed by museums and their opportunities for interaction with university students are probably not optimal. Gender bias is very likely caused by multiple factors, including fewer opportunities for women to be exposed to taxonomic research and gain an interest, and unequal career opportunities and hiring decisions. A level playing field for all genders will be crucial to address these shortcomings and close the gap,”

comments Ana Casino, CETAF’s Executive Director.

***

Entomologist examining a small insect under a microscope.
Photo by anton_shoshin/stock.adobe.com.

The European Red List of Taxonomists concludes with practical recommendations concerning strategic, science and societal priorities, addressed to specific decision-makers.

The authors give practical examples and potential solutions in support of their call to action.

For instance, in order to develop targeted and sustainable funding mechanisms to support taxonomy, they propose the launch of regular targeted Horizon Europe calls to study important insect groups whose taxonomic capacity has been identified as being at particularly high risk of erosion.

To address specific gaps in expertise, such as those the publication reports for Romania, a country known for its rich insect diversity yet poor in taxonomic expertise, the consortium proposes establishing a natural history museum or entomological research institute well suited to serve as a taxonomic facility.

Amongst the scientific recommendations, the authors propose measures to ensure better recognition of taxonomic work at a multidisciplinary level. The scientific community, including disciplines that rely on taxonomic research such as molecular biology, medicine and agriculture, needs to embrace universal standards and rigorous conduct for the correct citation of scientific publications by insect taxonomists.

Societal engagement is another important call. “It is pivotal to widely raise awareness of the value and impact of taxonomy and the work of taxonomists. We must motivate young generations to join the scientific community,” points out Prof. Lyubomir Penev, Managing Director of Pensoft.

***

“Understanding taxonomy is key to understanding the extinction risk of species. If we strategically target the gaps in expert capacity that this European Red List identifies, we can better protect biodiversity and support the well-being and livelihoods of our societies. With the climate crisis at hand, there is no time left to waste,”

added David Allen from the IUCN Red List team.

“As a dedicated supporter of the IUCN Red List, I am inspired by this call to strengthen capacity, guided by evidence and proven scientific methods. However, Europe has much more scientific capacity than most biodiversity-rich regions of the world. So, what this report particularly highlights is the need to massively increase investment in scientific discovery, and in building taxonomic expertise, around the world,”

said Jon Paul Rodríguez, Chair of the IUCN Species Survival Commission.

***

Follow and join the conversation on Twitter using the #RedListTaxonomists hashtag. 

🥳 Here goes THE title in our New Species Showdown!

From the kingdom of plants, welcome the all-time crowd favourite among the species described in Pensoft journals!

Which one is the species that springs to mind when you think about the most awesome discoveries in recent times?

In an age when we more than ever need to appreciate and preserve the magnificent biodiversity inhabiting the Earth, we decided to take a lighter, fun approach to the work of taxonomists, which often goes unnoticed by the public.

From the ocean depths surrounding Indonesia to the foliage of the native forests of Príncipe Island and into the soils of Borneo, we started with 16 species described as new to science in journals published by Pensoft over the years. 

Out of these most amazing creatures, over the past several weeks we sought to find who’s got the greatest fandom by holding a poll on Twitter (you can follow it further down here or via #NewSpeciesShowdown).

Grand Finale – here comes the champion!

Truly, we couldn’t have a more epic final!

The two competitors come from two kingdoms, two opposite sides of the globe, and the “pages” of two journals, namely PhytoKeys and Evolutionary Systematics.

While we need to admit that we ourselves expected to crown an animal as the crowd-favourite, we take the opportunity to congratulate the botanists amongst our fans for the well-deserved win of Nepenthes pudica (see the species description)!

Find out more about the curious, one-of-a-kind pitcher plant in this blog post, where we announced its discovery following the new species description in PhytoKeys in June 2022:

Back then, N. pudica gave an early sign of its appeal on the World Wide Web when it broke the all-time record for online popularity among all plant species described in PhytoKeys over the journal’s 12-year history, spanning more than 200 issues.

What’s perhaps even more curious is that only one species EVER described in a Pensoft-published journal has so far triggered more tweets than the pitcher plant, and that species is the animal that ended up in second place in the New Species Showdown: a tiny amphibian living in Peru, commonly known as the Amazon Tapir Frog (Synapturanus danta). Which brings us once again to the influence of botanists in taxonomic research.

Read more about its discovery in the blog post from February 2022:

Another thing that struck us during the tournament was that only one species described in our flagship systematics journal ZooKeys, the supergiant isopod Bathynomus raksasa, managed to fight its way to the semi-finals, where it lost against S. danta.

This makes us especially proud of our diverse and competitive journal portfolio full of titles dedicated to biodiversity and taxonomic research!

The rules

Twice a week, @Pensoft would announce a match between two competing species on Twitter using the hashtag #NewSpeciesShowdown, where everyone could vote in the poll for their favourite.

Disclaimer

This competition is for entertainment purposes only. As it was tremendously tough to narrow the list down to only sixteen species, we admit that we left out a lot of spectacular creatures.

To ensure fairness and transparency, we made the selection based on the yearly Altmetric data, which covers articles in our journals published from 2010 onwards and ranks the publications according to their online mentions from across the Web, including news media, blogs and social networks. 

We did our best to diversify the list as much as possible in terms of taxonomic groups. However, due to the visual-centric nature of social media, we gave preference to immediately attractive species.

All battles:

(in chronological order)

Round 1
The first tie of the New Species Showdown was between the olinguito: Bassaricyon neblina (see species description) and the “snow-coated” tussock moth Ivela yini (see species description).
In the second battle, we faced two marine species discovered in the Indian Ocean and described in ZooKeys. The supergiant isopod B. raksasa (see species description) won against the Rose Fairy Wrasse C. finifenmaa (see species description) with a strong 75% of the votes.
In the third battle, we faced two frog species: the tapir ‘chocolate’ frog described in Evolutionary Systematics (see species description) won against the ‘glass frog’ described in ZooKeys (see species description) with 73% of the votes.
With 62% of the votes, the fourth battle saw the Harryplax severus crab win against another species named after a great wizard from the Harry Potter universe: Salazar’s pit viper, which was described in the journal Zoosystematics and Evolution in 2020. The “unusual” crustacean was described back in 2017 in ZooKeys. As its characters matched no genus known to date, the species also established the genus Harryplax.
With the fifth battle in the New Species Showdown taking us to the Kingdom of Plants, we enjoyed a great battle between the first pitcher plant found to grow its pitchers underground to dine (see the full study) and the Demon’s orchid, described in 2016 from a single population spread across a dwarf montane forest in southern Colombia (read the study). Both species made the headlines across the news media around the world following their descriptions in our flagship botany journal PhytoKeys.
Next, we saw the primitive dipluran Haplocampa wagnelli (read its species description in Subterranean Biology), a likely survivor of the Ice Age thanks to the caves of Canada, win over the public in a duel against Xuedytes bellus (described in ZooKeys in 2017), also known as the most cave-adapted trechine beetle in the world!
We had a close battle between the Príncipe Scops-owl Otus bikegila (see species description published in our ZooKeys earlier in 2022) and the blue-tailed monitor lizard Varanus semotus (also first ‘known’ from the pages of ZooKeys, 2016). Both adorable species and both ‘castaways’ on isolated islands (in the Atlantic and the Pacific, respectively), they made great sensations upon their discovery. In the end, the reptile won by a single vote!
In the last battle of Round 1, the ‘horned’ tarantula C. attonitifer claimed victory with a strong 80% of the votes against its competitor with a rebel name: the freshwater crayfish C. snowden (species description in ZooKeys from 2015). Described in African Invertebrates in 2019, the arachnid might be one amongst many ‘horned’ baboon spiders, yet there was something quite extraordinary about its odd protuberance. Furthermore, it demonstrated how little we know about the fauna of Angola: a largely underexplored country located at the intersection of several ecoregions.
Round 2 – Quarter-finals
In the first quarter-final, a close battle, the isopod B. raksasa, which ‘emerged’ from the ocean depths of Indonesia (species description in ZooKeys from 2020), claimed victory by just a few votes (58%!) over its competitor: the lovely olinguito B. neblina, also described in ZooKeys but back in 2013.
In the second quarter-final, the tapir ‘chocolate’ frog S. danta (described in Evolutionary Systematics this year) claimed victory with a significant advantage (69%) over its competitor, the crab H. severus, described in ZooKeys in 2017.
The third battle in Round 2 secured a place at the semi-finals for the only plant to get this far in the New Species Showdown. If you are dedicated to the mission of proving the plant kingdom superior: keep supporting Nepenthes pudica in the semi-finals and beyond!
In the meantime, read the full description of the species, published in our PhytoKeys in June.
The last quarter-final sent the Angolan ‘horned’ tarantula to the next round. Described in African Invertebrates in 2019, its discovery would have likely remained a secret had it not been for the local tribes who provided the research team with crucial information about the curious arachnid.
Round 3 – Semi-finals
Curiously enough, by winning against the ‘supergiant’ isopod B. raksasa – also known around the Internet as the ‘Darth Vader of the seas’ – the Amazonian anuran S. danta outcompetes the last species in the New Species Showdown representing our flagship taxonomy journal: ZooKeys.

The charming anuran was described in February 2022 in Evolutionary Systematics, a journal dedicated to whole-organism biology that we publish on behalf of the Leibniz Institute for the Analysis of Biodiversity Change (LIB).
In a dramatic turn of events, the tight match between the Angolan tarantula C. attonitifer, whose ‘horn’ protruding from its back surprised the scientists because of its unique structure and soft texture, and the first pitcher plant whose ‘traps’ can be found underground in Borneo, ended with the news that the New Species Showdown would conclude with a battle between the kingdoms Animalia and Plantae! What a denouement!

The record-breaking plant was described in June 2022 in PhytoKeys: a journal launched by Pensoft in 2010 with the mission to introduce fast, linked and open publishing to plant taxonomy.
THE FINAL
And here we were at the finish line.
But why did we hold the tournament right now?

If you have gone to the Pensoft website at any point in 2022, visited our booth at a conference, or received a newsletter from any of our journals, by this time, you must be well aware that in 2022 – more precisely, on 25 December – we turned 30. And we weren’t afraid to show it!

Pensoft’s team happy to showcase the 30-year story of the company at various events this year.
Left: Maria Kolesnikova at the annual Biodiversity Information Standards (TDWG 2022) conference, hosted by Pensoft in Sofia, Bulgaria. Right: Iva Boyadzhieva at the XXVI International Congress of Entomology (ICE 2022) in Helsinki, Finland.

Indeed, 30 is not that big of a number, as many of us adult humans can confirm. Yet, we take pride in reminiscing about what we’ve done over the last three decades. 

The truth is, 30 years ago, we wouldn’t have been able to picture this day, let alone think that we’d be sharing it with all of you: our journal readers, authors, editors and reviewers, collaborators in innovation, project partners, and advisors. 

Long story short, we wanted to do something special and fun to wrap up our anniversary year. While we have been active in various areas, including the development of publishing technology for open and FAIR access to, and linkage of, research outcomes and underlying data, as well as multiple EU-supported scientific projects, we have always been associated with our biodiversity journal portfolio.

Besides, who doesn’t like to learn about the latest curious creature that has evaded scientific discovery throughout human history up until our days? 😉

Now, follow the #NewSpeciesShowdown to join the contest!

Digitising beans to feed the world

In 2018, NHM London’s digitisation team started a project to digitise non-type herbarium material from the legume family. A recent data paper in the Biodiversity Data Journal reports on the outcomes.

This post was originally published on the blog of the Natural History Museum of London and is reposted here with minor edits.

Legumes are a group of plants that include soybeans, peas, chickpeas, peanuts and lentils. They are a significant source of protein, fibre, carbohydrates, and minerals in our diet, and some, like the cowpea, are drought-resistant.

In 2018, the Natural History Museum of London’s (NHM London) digitisation team started a project in collaboration with project leader Royal Botanic Gardens Kew and the Royal Botanic Garden Edinburgh.

The project’s outcomes were published in a data paper in the Biodiversity Data Journal. Within the project, the digitisation team aimed to collectively digitise non-type herbarium material from the legume family. This includes rosewood trees (Dalbergia), padauk trees (Pterocarpus) and the Phaseolinae subtribe that contains many of the beans cultivated for human and animal food.

This project was made possible through Official Development Assistance (ODA) funding allocated by the Department for Environment, Food & Rural Affairs (DEFRA) and distributed by the UK government in its “global efforts to defeat poverty, tackle instability and create prosperity in developing countries”.

ODA-listed countries
African: Guinea, Ethiopia, Sudan, Kenya, Uganda, Tanzania, Mozambique, Malawi and Madagascar
Asian: Bangladesh, Myanmar, Nepal, New Guinea and India
Southern and Central American: Guatemala, Honduras, El Salvador, Nicaragua, Bolivia, Argentina and Brazil

The legume groups Dalbergia, Pterocarpus and Phaseolinae were chosen for digitisation to support the development of dry beans as a sustainable and resilient crop, and to aid the conservation and sustainable use of rosewood and padauk trees. Some of these beans, especially cowpea and pigeon pea, are sustainable and resilient crops, as they can be grown in poor-quality soils and resist drought stress. This makes them particularly suitable for agricultural production where growing other crops would be difficult.

Digitally discoverable herbarium specimens can provide important information about the distribution of individual species, as well as highlighting which species occur naturally together.

While there have been collaborative efforts between herbaria in the past, these have tended to prioritise digitisation of type specimens: the example specimens for which a species is named.

Types are important to identification, but being individual specimens, they don’t offer insights into species distribution over time. By focusing on the non-types across the world and over the last 200 years, we have released a brand-new resource to the global scientific community.

Searching for beans

This collection was digitised by creating an inventory record for each specimen, attaching images of each herbarium sheet, and then transcribing more data and georeferencing the specimens, providing an accurate locality in space and time for their collection. 

We originally had four months and three members of staff to digitise over 11,000 specimens. The Covid-19 lockdown was ironically rather lucky for this project as it enabled us to have more time to transcribe and georeference all of the records. 

say the researchers behind the digitisation project.
Map showing breakdown of records by country.

“We were able to assign country-level data to 10,857 out of the total number of 11,222 records. We were also able to transcribe the collectors’ names from the majority of our specimen labels (10,879 out of 11,222). Only 770 out of the 2,226 individuals identified during this project collected their specimens in ODA-listed countries. The highest contributors were: Richard Beddome (130 specimens), Charles Clarke (110), Hans Schlieben (98) and Nathaniel Wallich (79). The breakdown of records by ODA-listed country can be seen in the chart below.”

Map showing breakdown of records by country and pie chart showing distribution by ODA listed countries.

From our data, we can see the peak decade of collection was the 1930s, with almost half (4,583 specimens or 49.43%) collected between 1900 and 1950 (Fig. 10).

This peak can be attributed to three of our most prolific collectors: Arthur Kerr, John Gossweiler and Georges Le Testu, all of whom were most active in the 1930s. The oldest specimen (BM013713473) was collected by Mark Catesby (1683-1749) in the Bahamas in 1726.

they explain.

An interesting, but perhaps unsurprising, finding is that our collection is strongly male-dominated.

There are only two women (Caroline Whitefoord and Ynes Mexia) in the list of our top 50 plant collectors, and neither comes close to the most prolific collectors.

We identified more women in the rest of our records, but on average each contributed fewer than 25 specimens to a dataset of more than 10,000 specimens. In contrast, the top five male collectors contributed 10% of our collection.

they continued.

Releasing Rosewoods

Both the Pterocarpus and Dalbergia genera include species that yield expensive, high-quality timber and are prone to illegal logging. Many species, such as Pterocarpus tinctorius, are also listed on the International Union for Conservation of Nature (IUCN) Red List of Threatened Species. By releasing this new resource of information on all these plants from three of the biggest herbaria in the world, we can share these data with the people who are taking care of biodiversity in these countries. The data can be used to identify hotspots where the trees grow naturally, so that these areas can be protected. They would also allow much closer attention to be paid to areas that could be targets for illegal logging.

Pterocarpus tinctorius is a species of padauk tree that is listed as endangered on the IUCN Red List.
Cowpea (Vigna unguiculata) is a food and animal feed crop grown in the semi-arid tropics.

The ODA-listed countries are economically impoverished and disproportionately vulnerable to the effects of climate change, whether floods, droughts or rising temperatures.

Using data to identify good, nutritious plant species that can be grown in such conditions can therefore benefit local communities, potentially reducing dependence on imports, aid and on less resilient crops. 

the team adds in conclusion.

***

This dataset is now openly available on the Museum’s Data Portal and a data paper about this work has been released in the Biodiversity Data Journal.

***

Stay in touch with the Digitisation team by following us on Instagram and Twitter

Don’t forget to also follow the Biodiversity Data Journal on Twitter and Facebook.

New species of owl discovered in the rainforests of Príncipe Island, Central Africa 

The Principe Scops-Owl, the eighth known bird species endemic to the island, has a unique call and lives in a restricted range in the Príncipe Obô Natural Park.

A new species of owl has just been described from Príncipe Island, part of the Democratic Republic of São Tomé and Príncipe in Central Africa. Scientists were first able to confirm its presence in 2016, although suspicions of its occurrence gained traction from 1998, and testimonies from local people suggesting its existence could be traced back as far as 1928. 

Otus bikegila. Photo by Martim Melo

The new owl species was described in the open-access journal ZooKeys based on multiple lines of evidence such as morphology, plumage colour and pattern, vocalisations, and genetics. Data was gathered and processed by an international team led by Martim Melo (CIBIO and Natural History and Science Museum of the University of Porto), Bárbara Freitas (CIBIO and the Spanish National Museum of Natural Sciences) and Angelica Crottini (CIBIO).

Bárbara Freitas, Bikegila and Martim Melo pose with an owl. Photo by Martim Melo

The bird is now officially known as the Principe Scops-Owl, or Otus bikegila.

“Otus” is the generic name given to a group of small owls sharing a common evolutionary history, commonly called scops-owls. They are found across Eurasia and Africa and include such widespread species as the Eurasian Scops-Owl (Otus scops) and the African Scops-Owl (Otus senegalensis).

Bikegila. Photo by Martim Melo

The scientists behind the discovery further explain that the species epithet “bikegila” was chosen in homage to Ceciliano do Bom Jesus, nicknamed Bikegila, a former parrot harvester from Príncipe Island and now a ranger of its natural park.

“The discovery of the Principe Scops-Owl was only possible thanks to the local knowledge shared by Bikegila and by his unflinching efforts to solve this long-time mystery,” the researchers say. “As such, the name is also meant as an acknowledgment to all locally-based field assistants who are crucial in advancing the knowledge on the biodiversity of the world.”

Martim Melo and Bikegila. Photo by Alexandre Vaz

In the wild, the easiest way to recognise one would be its unique call – in fact, it was one of the main clues leading to its discovery. 

“Otus bikegila’s unique call is a short ‘tuu’ note repeated at a fast rate of about one note per second, reminiscent of insect calls. It is often emitted in duets, almost as soon as night has fallen,” Martim Melo explains.

Otus bikegila’s call. Recording by Martim Melo

The whole of Príncipe Island was extensively surveyed to determine the distribution and population size of the new species. Results, published in the journal Bird Conservation International, show that the Principe Scops-Owl is found only in the remaining old-growth native forest of Príncipe, in the uninhabited southern part of the island. There, it occupies an area of about 15 km², apparently due to a preference for lower elevations. In this small area (about four times the size of Central Park), the densities of the owl are relatively high, with the population estimated at around 1,000–1,500 individuals.

The difficult terrain of the uninhabited southern forests of Príncipe Island, home to the Príncipe Scops-Owl, was somewhat immortalised by José Correia, Portuguese collector for the American Museum of Natural History, when collecting there in 1928. He wrote in his diary: “I have been in very bad fields ready, but this is bad among the bad or worse among the worse”. Photo by Alexandre Vaz

Nevertheless, because all individuals of the species occur in this single and very small location (of which a part will be affected in the near future by the construction of a small hydro-electric dam), researchers have proposed that the species should be classified as ‘Critically Endangered’, the highest threat level on the IUCN Red List. This recommendation must still be evaluated by the International Union for Conservation of Nature.

Otus bikegila. Photo by Martim Melo

Monitoring the population will be essential to get more precise estimates of its size and to follow its trends. For this purpose, a survey protocol that deploys automatic recording units and uses AI to extract the data from their recordings has been designed and successfully tested.

“The discovery of a new species that is immediately evaluated as highly threatened illustrates well the current biodiversity predicament”, the researchers say. “On a positive note, the area of occurrence of the Principe Scops-Owl is fully included within the Príncipe Obô Natural Park, which will hopefully help secure its protection.”

A view of the owl’s habitat. Photo by Martim Melo

This is the eighth known species of bird endemic to Príncipe, further highlighting the unusually high level of bird endemism for this island of only 139 km².

Otus bikegila. Photo by Paul van Giersbergen

Even though a new species of scops-owl was just described from Príncipe, genetic data indicated that the island was, surprisingly, likely the first in the Gulf of Guinea to be colonised by a species of scops-owl.

“Although it may seem odd for a bird species to remain undiscovered for science for so long on such a small island, this is by no means an isolated case when it comes to owls,” the researchers state. “For example, the Anjouan Scops-Owl was rediscovered in 1992, 106 years after its last observation, on Anjouan Island (also known as Ndzuani) in the Comoro Archipelago, and the Flores Scops-Owl was rediscovered in 1994, 98 years after the previous report.”

“The discovery of a new bird species is always an occasion to celebrate and an opportunity to reach out to the general public on the subject of biodiversity,” says Martim Melo. “In this age of human-driven extinction, a major global effort should be undertaken to document what may soon not be anymore,” he and his team state in their paper.

Otus bikegila. Photo by Philippe Verbelen

“Birds are likely the best studied animal group. As such, the discovery of a new bird species in the 21st century underscores both the actuality of field-based explorations aiming at describing biodiversity, and how such curiosity-driven endeavour is more likely to succeed when coupled with local ecological knowledge, the participation of keen amateur naturalists, and persistence,” they add.

They believe that this “new wave of exploration, carried out by professionals and amateurs alike”, will help rekindle our link to the natural world, which will be essential to help reverse the global biodiversity crisis.

Research article:

Melo M, Freitas B, Verbelen P, da Costa SR, Pereira H, Fuchs J, Sangster G, Correia MN, de Lima RF, Crottini A (2022) A new species of scops-owl (Aves, Strigiformes, Strigidae, Otus) from Príncipe Island (Gulf of Guinea, Africa) and novel insights into the systematic affinities within Otus. ZooKeys 1126: 1-54. https://doi.org/10.3897/zookeys.1126.87635

#TDWG2022 recap: TDWG and Pensoft welcomed 400 biodiversity information experts from 41 countries in Sofia

For the 37th time, experts from across the world gathered to share and discuss the latest developments surrounding biodiversity data and how they are being gathered, used, shared and integrated across time, space and disciplines.

Between 17th and 21st October, about 400 scientists and experts took part in a hybrid meeting dedicated to the development, use and maintenance of biodiversity data, technologies, and standards across the world.

This year, the conference was hosted by Pensoft in collaboration with the National Museum of Natural History (Bulgaria) and the Institute of Biodiversity and Ecosystem Research at the Bulgarian Academy of Sciences. It ran under the theme “Stronger Together: Standards for linking biodiversity data”.

For the 37th time, the global scientific and educational association Biodiversity Information Standards (TDWG) brought together experts from all over the globe to share and discuss the latest developments surrounding biodiversity data and how they are being gathered, used, shared and integrated across time, space and disciplines.

This was the first time the event happened in a hybrid format. It was attended by 160 people on-site, while another 235 people joined online. 

The TDWG 2022 conference saw plenty of networking and engaging discussions, with as many as 160 on-site attendees and another 235 people who joined the event remotely.

The conference abstracts, submitted by the event’s speakers ahead of the meeting, provide a sneak peek into their presentations and are all publicly available in the TDWG journal Biodiversity Information Science and Standards (BISS).

“It’s wonderful to be in the Balkans and Bulgaria for our Biodiversity Information and Standards (TDWG) 2022 conference! Everyone’s been so welcoming and thoughtfully engaged in conversations about biodiversity information and how we can all collaborate, contribute and benefit,”

said Deborah Paul, Chair of TDWG, a biodiversity informatics specialist and community liaison at the University of Illinois, Prairie Research Institute’s Illinois Natural History Survey, and also an active participant in the Society for the Preservation of Natural History Collections (SPNHC), the Entomological Collections Network (ECN), ICEDIG, the Research Data Alliance (RDA), and The Carpentries.

“Our TDWG mission is to create, maintain and promote the use of open, community-driven standards to enable sharing and use of biodiversity data for all,”

she added.
Prof Lyubomir Penev (Pensoft) and Deborah Paul (TDWG) at TDWG 2022.

“We are proud to have been selected to be the hosts of this year’s TDWG annual conference and are definitely happy to have joined and observed so many active experts network and share their know-how and future plans with each other, so that they can collaborate and make further progress in the way scientists and informaticians work with biodiversity information,”  

said Pensoft’s founder and CEO Prof. Lyubomir Penev.

“As a publisher of multiple globally renowned scientific journals and books in the field of biodiversity and ecology, at Pensoft we assume it to be our responsibility to be amongst the first to implement those standards and good practices, and serve as an example in the scholarly publishing world. Let me remind you that it is the scientific publications that present the most reliable knowledge the world and science has, due to the scrutiny and rigour in the review process they undergo before seeing the light of day,”

he added.

***

In a nutshell, the main task and dedication of the TDWG association is to develop and maintain standards and data-sharing protocols that support infrastructures, such as the Global Biodiversity Information Facility (GBIF), which aggregate biodiversity data and facilitate their use, in order to inform and expand humanity’s knowledge about life on Earth.

It is the goal of everyone volunteering their time and expertise to TDWG to enable the scientists interested in the world’s biodiversity to do their work efficiently and in a manner that can be understood, shared and reused by others. After all, biodiversity data underlie everything we know about the natural world.

If there are optimised and universal standards in the way researchers store and disseminate biodiversity data, all those biodiversity scientists will be able to find, access and use the knowledge in their own work much more easily. As a result, they will be much better positioned to contribute new knowledge that will later be used in nature and ecosystem conservation by key decision-makers.

On Monday, the event opened with welcoming speeches by Deborah Paul and Prof. Lyubomir Penev in their roles as Chair of TDWG and main host of this year’s conference, respectively.

The opening ceremony continued with a keynote speech by Prof. Pavel Stoev, Director of the Natural History Museum of Sofia and co-host of TDWG 2022. 

Prof. Pavel Stoev (Natural History Museum of Sofia) with a presentation about the known and unknown biodiversity of Bulgaria during the opening plenary session of TDWG 2022.

He walked the participants through the fascinating biodiversity of Bulgaria, but also the worrying trends in the country associated with declining taxonomic expertise. 

He ended his talk on a hopeful note, describing the recently established national unit of DiSSCo, whose aim (even if a tad too optimistic) is to digitise one million natural history items in four years, 250,000 of them with photographs. So far, one year into the project, the Bulgarian team has digitised more than 32,000 specimens and provided images for 10,000 specimens.

The plenary session concluded with a keynote presentation by renowned ichthyologist and biodiversity data manager Dr. Richard L. Pyle, who is also a manager of ZooBank – the key international database for newly described species.

Keynote presentation by Dr Richard L. Pyle (Bishop Museum, USA) at the opening plenary session of TDWG 2022.

In his talk, he highlighted gaps in the ways taxonomy is being used that impede biodiversity research and cut off many opportunities for timely scientific progress.

“There are simple things we can do to change how we use taxonomy as a tool that would dramatically improve our ability to conduct science and understand biodiversity. There is enormous value and utility within existing databases around the world to understand biodiversity, how threatened it is, what impacts human activity has (especially climate change), and how to optimise the protection and preservation of biodiversity,”

he said in a joint interview for the Bulgarian News Agency and Pensoft.

“But we do not have easy access to much of this information because the different databases are not well integrated. Taxonomy offers us the best opportunity to connect this information together, to answer important questions about biodiversity that we have never been able to answer before. The reason meetings like this are so important is that they bring people together to discuss ways of using modern informatics to greatly increase the power of the data we already have, and prioritise how we fill the gaps in data that exist. Taxonomy, and especially taxonomic data integration, is a very important part of the solution.”

Pyle also commented on the work in progress at ZooBank, ten years into the platform’s existence, and its role in the next (fifth) edition of the International Code of Zoological Nomenclature, which is currently being developed by the International Commission on Zoological Nomenclature (ICZN).

“We already know that ZooBank will play a more important role in the next edition of the Code than it has for these past ten years, so this is exactly the right time to be planning new services for ZooBank. Improvements at ZooBank will include things like better user-interfaces on the web to make it easier and faster to use ZooBank, better data services to make it easier for publishers to add content to ZooBank as part of their publication workflow, additional information about nomenclature and taxonomy that will both support the next edition of the Code, and also help taxonomists get their jobs done more efficiently and effectively. Conferences like the TDWG one are critical for helping to define what the next version of ZooBank will look like, and what it will do.”

***

During the week, the conference participants had the opportunity to enjoy a total of 140 presentations, as well as multiple social activities, including a field trip to Rila Monastery and a traditional Bulgarian dinner.

TDWG 2022 conference participants document their species observations on their way to Rila Monastery.

While going about the conference venue and field trip localities, the attendees were also actively uploading their species observations made during their stay in Bulgaria on iNaturalist in a TDWG2022-dedicated BioBlitz. The challenge concluded with a total of 635 observations and 228 successfully identified species.

Amongst the social activities at TDWG 2022 was a BioBlitz, where the conference participants could upload their observations made in Bulgaria to iNaturalist and help each other identify the species.

***

In his interview for the Bulgarian News Agency and Pensoft, Dr Vincent Smith, Head of the Informatics Division at the Natural History Museum, London (United Kingdom), co-founder of DiSSCo, the Distributed System of Scientific Collections, and the Editor-in-Chief of Biodiversity Data Journal, commented: 

“Biodiversity provides the support systems for all life on Earth. Yet the natural world is in peril, and we face biodiversity and climate emergencies. The consequences of these include accelerating extinction, increased risk from zoonotic disease, degradation of natural capital, loss of sustainable livelihoods in many of the poorest yet most biodiverse countries of the world, challenges with food security, water scarcity and natural disasters, and the associated challenges of mass migration and social conflicts.

Solutions to these problems can be found in the data associated with natural science collections. DiSSCo is a partnership of the institutions that digitise their collections to harness their potential. By bringing them together in a distributed, interoperable research infrastructure, we are making them physically and digitally open, accessible, and usable for all forms of research and innovation. 

At present rates, digitising all of the UK collection – which holds more than 130 million specimens collected from across the globe and is being taken care of by over 90 institutions – is likely to take many decades, but new technologies like machine learning and computer vision are dramatically reducing the time it will take, and we are presently exploring how robotics can be applied to accelerate our work.”

Dr Vincent Smith, Head of the Informatics Division at the Natural History Museum, London, co-founder of DiSSCo, and Editor-in-Chief of Biodiversity Data Journal at the TDWG 2022 conference.

In his turn, Dr Donat Agosti, CEO and Managing director at Plazi – a not-for-profit organisation supporting and promoting the development of persistent and openly accessible digital taxonomic literature – said:

“All the data about biodiversity is in our libraries, that include over 500 million pages, and everyday new publications are being added. No person can read all this, but machines allow us to mine this huge, very rich source of data. We do not know how many species we know, because we cannot analyse with all the scientists in this library, nor can we follow new publications. Thus, we do not have the best possible information to explore and protect our biological environment.”

Dr Donat Agosti demonstrating the importance of publishing biodiversity data in a structured and semantically enhanced format in one of his presentations at TDWG 2022.

***

At the closing plenary session, Gail Kampmeier, TDWG Executive member and one of the first zoologists to join TDWG in 1996, joined via Zoom to walk the conference attendees through the 37-year history of the association, originally named the Taxonomic Databases Working Group and later renamed Biodiversity Information Standards as it expanded its activities to the whole range of biodiversity data.

“While this presentation is about TDWG’s history as an organisation, its focus will be on the heart of TDWG: its people. We would like to show how the organisation has evolved in terms of gender balance, inclusivity actions, and our engagement to promote and enhance diversity at all levels. But more importantly, where do we—as a community—want to go in the future?”,

reads the conference abstract she co-authored with her TDWG colleague Dr Visotheary Ung (CNRS-MNHN).

Then, in the final talk of the session, Deborah Paul took to the stage to present the progress and key achievements of the association in 2022.

She gave a special shout-out to the TDWG journal Biodiversity Information Science and Standards (BISS), where, for the 6th consecutive year, the participants in the annual conference submitted and published their conference abstracts ahead of the event.

Deborah Paul reminds the audience that, apart from conference abstracts, the TDWG journal Biodiversity Information Science and Standards (BISS) also welcomes full-length articles that demonstrate the development or application of new methods and approaches in biodiversity informatics.

Launched in 2017 on Pensoft’s publishing platform ARPHA, the journal offers the rather unique opportunity to have both abstracts and full-length research papers published in a modern, technologically advanced scholarly journal. In her speech, Deborah Paul reminded the audience that BISS welcomes research articles, in the form of case studies, that demonstrate the development or application of new methods and approaches in biodiversity informatics.

Amongst the achievements of TDWG and its community, a special place was reserved for the Horizon 2020-funded BiCIKL project (abbreviation for Biodiversity Community Integrated Knowledge Library), involving many of the association’s members. 

Having started in 2021, the three-year project, coordinated by Pensoft, brings together 14 partner institutions from 10 countries under the common goal of creating a centralised place to connect all key biodiversity data by interlinking a total of 15 research infrastructures and their databases.

Deborah Paul also reported on the progress of the Horizon 2020-funded project BiCIKL, which involves many of the TDWG members. BiCIKL’s goal is to create a centralised place to connect all key biodiversity data by interlinking 15 key research infrastructures and their databases.

In fact, following the week-long TDWG 2022 conference in Sofia, a good many of the participants set off straight for another Bulgarian city and another event hosted by Pensoft. The Second General Assembly of BiCIKL took place between 22nd and 24th October in Plovdiv.

***

You can also explore highlights and live tweets from TDWG 2022 on Twitter via #TDWG2022.
The Pensoft team at TDWG 2022 were happy to become the hosts of the 37th TDWG conference.

‘Who is in your database and why does it matter?’

The uncertainty about a person’s identity hampers research, hinders the discovery of expertise, and obstructs the ability to give attribution or credit for work performed. 

Collection discovery through disambiguation

Guest blog post by Sabine von Mering, Heather Rogers, Siobhan Leachman, David P. Shorthouse, Deborah Paul & Quentin Groom

Worldwide, natural history institutions house billions of physical objects in their collections. They create and maintain data about these items and share those data with aggregators such as the Global Biodiversity Information Facility (GBIF), the Integrated Digitized Biocollections (iDigBio), the Atlas of Living Australia (ALA), GenBank and the European Nucleotide Archive (ENA).

Even though these data often include the names of the people who collected or identified each object, such statements may be ambiguous, as the names frequently lack any globally unique, machine-readable identifier that establishes who they refer to.

Despite the data being available online, there are barriers to effectively using the information about who collected the objects or provided the expertise to identify them. People have similar names, change their names over the course of their lifetime (e.g. through marriage), and variability can be introduced by the label transcription process itself (e.g. through local look-up lists).
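To make the problem concrete, here is a minimal Python sketch of the kind of crude string normalisation that is often tried first. It folds accents, handles both “Surname, Given” and “Given Surname” orders, reduces each label string to a “surname + first initial” key, and groups strings that share a key. The functions `name_key` and `group_candidates` and the sample strings are hypothetical illustrations, not part of any workflow described here. The result is only a set of candidates: two different people can share a key, and one person can yield different keys after a name change, which is why evidence-based disambiguation is still needed.

```python
import re
import unicodedata
from collections import defaultdict


def name_key(raw: str) -> str:
    """Reduce a collector string to a crude 'surname + first initial' key."""
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).strip()
    if not text:
        return ""
    if "," in text:
        # "Surname, Given names" order.
        surname, given = text.split(",", 1)
    else:
        # "Given names Surname" order: treat the last token as the surname.
        parts = text.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    initial = re.sub(r"[^A-Za-z]", "", given)[:1]
    return f"{surname.strip().lower()} {initial.lower()}".strip()


def group_candidates(strings: list[str]) -> dict[str, list[str]]:
    """Group raw strings sharing a key; each group is only a candidate set."""
    groups: dict[str, list[str]] = defaultdict(list)
    for s in strings:
        groups[name_key(s)].append(s)
    return dict(groups)


# Invented label strings, for illustration only.
labels = ["Wallich, N.", "N. Wallich", "Nathaniel Wallich", "C. Clarke", "Clarke, Charles"]
for key, variants in group_candidates(labels).items():
    print(key, "->", variants)
```

Note that a key like "clarke c" cannot tell apart two different collectors called C. Clarke; only external evidence (dates, localities, handwriting, biographies) can do that.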

As a result, researchers and collections staff often spend a lot of time deducing who is the person or people behind unknown collector strings while collating or tidying natural history data. The uncertainty about a person’s identity hampers research, hinders the discovery of expertise, and obstructs the ability to give attribution or credit for work performed. 

Disambiguation (the act of churning strings into verifiable things using all available evidence) need not be done in isolation. In addition to presenting a workflow on how to disambiguate people in collections, we also make the case that working in collaboration with colleagues and the general public presents new opportunities and introduces new efficiencies. There is tacit knowledge everywhere.

More often than not, data about people involved in biodiversity research are scattered across different digital platforms. However, by linking these information sources to each other through person identifiers, we can better trace the connections in these networks and weave a more interoperable narrative about every actor.

That said, inconsistent naming conventions or lack of adequate accreditation often frustrate the realization of this vision. This sliver of natural history could be churned to gold with modest improvements in long-term funding for human resources, adjustments to digital infrastructure, space for the physical objects themselves alongside their associated documents, and sufficient training on how to disambiguate people’s names.

“He aha te mea nui o te ao. He tāngata, he tāngata, he tāngata.”

“What is the most important thing in the world? It is people, it is people, it is people.”

(Māori proverb)

The process of properly disambiguating those who have contributed to natural history collections takes time. 

Deducing “who is who” is an extra challenge for legacy data compared with people alive today. Retrospective disambiguation can require considerable detective work, especially for scarcely known people or where the community uses a different naming convention. Mercifully, provided the results of this effort are well communicated and openly shared, it need only be done once.

At the core of our research is the question of how to assign proper credit.

In our recent Methods paper, we discuss several methods for this, as well as the available routes for publishing records online that include not only people’s names as text, but also their unique, resolvable identifiers.

Disambiguation is a cycle: enriching the data feeds further disambiguation. As more names are disambiguated and more biographical data accumulate, it becomes easier to disambiguate the names that remain.

First and foremost, we should maintain our own public biographical data by making full use of ORCID. In addition to preserving our own scientific legacy and that of the institutions that employ us, we have a responsibility to avoid generating unnecessary disambiguation work for others. 

For legacy data, where the people connected to the collections are deceased, Wikidata can be used to openly document rich bibliographic and demographic data, each statement with one or more verifiable references. Wikidata can also act as a bridge to link other sources of authority such as VIAF or ORCID identifiers. It has many tools and services to bulk import, export, and to query information, making it well-suited as a universal democratiser of information about people often walled-off in collection management systems (CMS). 

A network of the top twenty most used identifiers for biologists on Wikidata.

Once unique identifiers for people are integrated into collection management systems, they can be shared with the global collections and research community using the new Darwin Core terms recordedByID and identifiedByID, alongside the well-known but text-based terms recordedBy and identifiedBy.
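To make the pairing of text-based and identifier-based terms concrete, here is a minimal sketch of how a single occurrence row might carry both. It assumes the " | " separator that Darwin Core recommends for multi-value terms, and that names and identifiers are listed in the same order. The helper functions, people and identifiers are hypothetical placeholders for illustration, not part of any collection management system or of the GBIF or Bionomia tooling.

```python
SEPARATOR = " | "  # separator Darwin Core recommends for multi-value terms


def person_terms(people, text_term, id_term):
    """Map (name, identifier URI) pairs onto a text term and its *ID twin.

    Order is preserved, so the n-th name lines up with the n-th identifier.
    """
    for name, uri in people:
        # Identifiers should be resolvable URIs, not bare strings.
        if not uri.startswith(("https://", "http://")):
            raise ValueError(f"identifier for {name!r} is not a resolvable URI: {uri!r}")
    return {
        text_term: SEPARATOR.join(name for name, _ in people),
        id_term: SEPARATOR.join(uri for _, uri in people),
    }


def occurrence_record(occurrence_id, collectors, determiners):
    """Build one hypothetical Darwin Core occurrence row as a plain dict."""
    record = {"occurrenceID": occurrence_id}
    record.update(person_terms(collectors, "recordedBy", "recordedByID"))
    record.update(person_terms(determiners, "identifiedBy", "identifiedByID"))
    return record


if __name__ == "__main__":
    # Hypothetical people and placeholder identifiers, for illustration only.
    row = occurrence_record(
        "urn:example:specimen:1",
        collectors=[
            ("A. Collector", "https://orcid.org/0000-0002-1825-0097"),
            ("B. Collector", "http://www.wikidata.org/entity/Q000000"),
        ],
        determiners=[("A. Collector", "https://orcid.org/0000-0002-1825-0097")],
    )
    for term, value in row.items():
        print(f"{term}: {value}")
```

Keeping the text term alongside its identifier twin means that consumers who do not yet resolve identifiers still see a readable name, while tools such as Bionomia can work from the identifiers.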

Approximately 120 datasets published through GBIF now make use of these identifier-based terms, which are additionally resolved in Bionomia every few weeks alongside co-curated attributions newly made there. This roundtrip of data – emerging as ambiguous strings of text from the source, affixed with resolvable identifiers elsewhere, absorbed into the source as new digital annotations, and then re-emerging with these fresh, identifier-based enhancements – is an exciting approach to co-manage collections data.

Round tripping. In Bionomia, people identifiers from Wikidata and ORCID are used to enrich data published via GBIF, thus linking natural history specimens to the world’s collectors.

Disambiguation work is particularly important in recognising contributors who have been historically marginalized. For example, gender bias in specimen data can be seen in the case of Wilmatte Porter Cockerell, a prolific collector of botanical, entomological and fossil specimens. Cockerell’s collections are often attributed to her husband as he was also a prolific collector and the two frequently collected together. 

On some labels, her identity is further obscured because she is recorded simply as “& wife” (see example on GBIF). Because Wilmatte Cockerell was her husband’s second wife, it can take some effort to confirm whether a specimen can be attributed to her rather than to her husband’s first wife, who also collected specimens. By ensuring that Cockerell is disambiguated and her contributions are appropriately attributed, the impact of her work becomes more visible and she can be properly and fairly credited.

Thus, disambiguation work not only helps give credit where credit is due, making data about people and their biodiversity collections more findable, but also creates an inclusive and representative narrative of the people involved in scientific knowledge creation, identification, and preservation.

A future – once thought to be a dream – where the complete scientific output of a person is connected as Linked Open Data (LOD) is now

Both the tools and the infrastructure are at our disposal, and the demand is palpable. All institutions can contribute to this movement by sharing data that include unique identifiers for the people in their collections. We recommend that institutions develop a strategy, perhaps starting with employees and curatorial staff, people of local significance, or those who have been marginalized, and that they capitalize on existing disambiguation activities elsewhere. This will have local utility and make a significant, long-term impact.

The more we participate in these activities, the greater the chance that we will uncover positive feedback loops that lighten the workload for all involved, including our future selves!

The disambiguation of people in collections is an ongoing process, but it becomes easier with practice. We also encourage collections staff to consider modifying their existing workflows and policies to include identifiers for people at the outset, when new data are generated or when new specimens are acquired. 
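One small way to capture identifiers correctly at the outset is to check them on entry. ORCID documents that the final character of an iD is a check digit computed with the ISO 7064 MOD 11-2 algorithm, so a typo in an entered ORCID can usually be caught before it is stored. The sketch below is a hypothetical helper, not a feature of any particular collection management system. A passing check only shows the iD is well formed; it does not prove that the iD exists or belongs to the person in question.

```python
import re

# Optional https://orcid.org/ prefix, then four hyphenated blocks of four
# characters; the very last character is a digit or 'X'.
ORCID_PATTERN = re.compile(
    r"^(?:https://orcid\.org/)?([0-9]{4})-([0-9]{4})-([0-9]{4})-([0-9]{3}[0-9X])$"
)


def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character over the first 15 digits of an iD."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """True if value is shaped like an ORCID iD and its check character matches."""
    match = ORCID_PATTERN.match(value.strip())
    if not match:
        return False
    characters = "".join(match.groups())  # 16 characters, hyphens removed
    return orcid_check_char(characters[:-1]) == characters[-1]


if __name__ == "__main__":
    for candidate in ("https://orcid.org/0000-0002-1825-0097", "0000-0002-1825-0098"):
        print(candidate, is_valid_orcid(candidate))
```

A check like this fits naturally at the point where new specimens or determinations are entered, so that malformed identifiers never reach published data.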

There is more work required at the global level to define, update, and ratify standards and best practices to help accelerate data exchange or roundtrips of this information; there is room for all contributions. Thankfully, there is a diverse, welcoming, energetic, and international community involved in these activities. 

We see a bright future for you, our collections, and our research products – well within reach – when the identities of people play a pivotal role in the construction of a knowledge graph of life.

Would you like to participate, and do you need support getting the disambiguation of your collection started? Please contact our TDWG People in Biodiversity Data Task Group.

A good start is also to check Bionomia to find out what metrics exist now for your institution or collection and affiliated people.

The next steps for collections: seven objectives that can help you disambiguate your institution’s collection:

1. Promote the use of person identifiers in local, national or international outreach, publishing and research activities

2. Increase the number of collection management systems that use person identifiers

3. Increase the number of living collectors registered and using an ORCID identifier when contributing to collections

4. Undertake disambiguation in the national languages of many countries

5. Increase the number of identified people on Wikidata linked to collections

6. Increase the number of people in collections with expertise in person disambiguation

7. Collaborate towards an exchange standard for attribution data

A real example of how a name string is disambiguated and the steps taken in documenting it: the Wikidata item of Jean-André Soulié.

***

Methods publication:

Groom Q, Bräuchler C, Cubey RWN, Dillen M, Huybrechts P, Kearney N, Klazenga N, Leachman S, Paul DL, Rogers H, Santos J, Shorthouse DP, Vaughan A, von Mering S, Haston EM (2022) The disambiguation of people names in biological collections. Biodiversity Data Journal 10: e86089. https://doi.org/10.3897/BDJ.10.e86089

***

Follow Biodiversity Data Journal on Twitter and Facebook.

High-schoolers join scholars to lift the lid on Hong Kong’s soil biodiversity

Most often, the students would find millipedes. They even helped identify two species that are new to Hong Kong’s fauna.

Soil and its macrofauna are an integral part of many ecosystems, playing an important role in decomposition and nutrient recycling. However, soil biodiversity remains understudied globally.

To help fill this gap and reveal the diversity of soil fauna in Hong Kong, a team of scientists from The Chinese University of Hong Kong initiated a citizen science project involving universities, non-governmental organisations and secondary school students and teachers.

“Involving citizens as part of the new knowledge generation process is important in promoting the understanding of biodiversity. Training younger-generation citizens to learn about biodiversity is of utmost importance and crucial to conservation engagement”

– say the researchers in their study, which was published in the open-access Biodiversity Data Journal.

The soil sampling methodology that the students employed in this study.
Video by Sheung Yee Lai, Ka Wai Ting, Tze Kiu Chong and Wai Lok So.

Working side by side with university academics, taxonomists and members of non-governmental organisations, students from 21 schools and institutes were recruited to collect soil animals near their campuses for a year and record their observations.

Between October 2019 and October 2020, they monitored and sampled species across 21 sites of urban and semi-natural habitats in Hong Kong, collecting a total of 3,588 individual samples. Their efforts yielded 150 soil macrofaunal species, identified as arthropods (including insects, spiders, centipedes and millipedes), worms, and snails.

Most often, the students found millipedes (23 of the 150 species). They even helped identify two millipede species that are new to Hong Kong’s fauna: Monographis queenslandica and Alloproctoides remyi. The former is usually found in Australia (the researchers suggest it might have been introduced to Hong Kong from Queensland many decades ago, or vice versa), and the latter has been observed in Réunion and Mauritius.

Two polyxenid millipede species collected in this study turned out never to have been recorded from Hong Kong before.
Monographis queenslandica (left) and Alloproctoides remyi (right).
Image by Sheung Yee Lai, Ka Wai Ting and Wai Lok So.

Millipedes like these two species can accelerate litter decomposition and regulate the soil carbon and phosphorus cycling, while earthworms can modify the soil structure and regulate water and organic matter cycling.

“Before the beginning of this project, the understanding of soil biodiversity in Hong Kong, including the understanding of its contained millipede species, was inadequate”

the researchers write in their paper.

Now, they believe that the identified macrofauna species and their 646 DNA barcodes have established a solid foundation for further research in soil biodiversity in the area.

Their project also served an additional purpose. Unlike most conventional scientific studies, which are usually carried out by government bodies, non-governmental organisations or university academics alone, this study used a citizen science approach, building a large community engaged with biodiversity. In doing so, it helped educate the public and raise awareness of how basic scientific techniques can be used to understand local biodiversity.

It may also have inspired a new generation of scientists: some students started millipede cultures in their own schools, and one school used its millipede breeding model to take part in a science and technology competition.

The study’s authors believe it is proof that local institutes and high schools can join forces with university research teams to carry out scientific work.

It “has raised public awareness and potentially opens up opportunities for the general public to engage in scientific research in the future.” 

The team hopes that their approach could inspire future biodiversity sampling and monitoring studies to engage more citizen scientists.

***

Research article:

So WL, Ting KW, Lai SY, Huang EYY, Ma Y, Chong TK, Yip HY, Lee HT, Cheung BCT, Chan MK, Consortium HKSB, Nong W, Law MMS, Lai DYF, Hui JHL (2022) Revealing the millipede and other soil-macrofaunal biodiversity in Hong Kong using a citizen science approach. Biodiversity Data Journal 10: e82518. https://doi.org/10.3897/BDJ.10.e82518

***
