BiCIKL keeps adding project outcomes to its own collection in RIO Journal

The publications so far include the grant proposal, conference abstracts, a workshop report, guidelines papers and deliverables submitted to the Commission.

The dynamic open-science project collection of BiCIKL, titled “Towards interlinked FAIR biodiversity knowledge: The BiCIKL perspective” (doi: 10.3897/rio.coll.105), continues to grow as the project progresses into its third year and its results accumulate ever faster.

Following the publication of three important BiCIKL deliverables – the project’s Data Management Plan, its Visual identity package and a report describing the newly built workflow and tools for data extraction, conversion and indexing and the user applications from OpenBiodiv – there are currently 30 research outcomes in the BiCIKL collection that have been shared publicly with the world, rather than merely submitted to the European Commission.

Shortly after the BiCIKL project started in 2021, a project-branded collection was launched in the open-science scholarly journal Research Ideas and Outcomes (RIO). There, the partners have been publishing – and thus preserving – conclusive research papers, as well as early and interim scientific outputs.

The publications so far also include the BiCIKL grant proposal, which earned the support of the European Commission in 2021; conference abstracts, submitted by the partners to two consecutive TDWG conferences; a project report that summarises recommendations on interoperability among infrastructures, as concluded from a hackathon organised by BiCIKL; and two Guidelines papers, aiming to trigger a culture change in the way data is shared, used and reused in the biodiversity field. 

In fact, one of the Guidelines papers, where representatives of the Consortium of European Taxonomic Facilities (CETAF), the Society for the Preservation of Natural History Collections (SPNHC) and the Biodiversity Heritage Library (BHL) came together to publish their joint statement on best practices for the citation of authorities of scientific names, has so far generated about 4,000 views by nearly 3,000 unique readers.

At the time of writing, the top three of the most read papers in the BiCIKL collection is completed by the grant proposal and the second Guidelines paper, where the partners – based on their extensive and versatile experience – present recommendations about the use of annotations and persistent identifiers in taxonomy and biodiversity publishing. 

Access to data and services along the entire data and research life cycle in biodiversity science.
The figure was featured in the BiCIKL grant proposal, now made available from the BiCIKL project collection in RIO Journal.

What one might find quite odd when browsing the BiCIKL collection is that each publication is marked with its own publication source, even though all contributions are clearly already accessible from RIO Journal.

So, we can see many project outputs marked as RIO publications, but also others that have been published in the likes of F1000Research, the official journal of TDWG: Biodiversity Information Science and Standards, and even preprint servers, such as BiohackrXiv.

This is because one of the unique features of RIO allows for consortia to use their project collection as a one-stop access point for all scientific results, regardless of their publication venue, by means of linking to the original source via metadata. Additionally, projects may also upload their documents in their original format and layout, thanks to the integration between RIO and ARPHA Preprints. This is in fact how BiCIKL chose to share their latest deliverables using the very same files they submitted to the Commission.

“In line with the mission of BiCIKL and our consortium’s dedication to FAIRness in science, we wanted to keep our project’s progress and results fully transparent and easily accessible and reusable to anyone, anywhere,” 

explains Prof Lyubomir Penev, BiCIKL’s Project Coordinator and founder and CEO of Pensoft. 

“This is why we opted to collate the outcomes of BiCIKL in one place – starting from the grant proposal itself, and then progressively adding workshop reports, recommendations, research papers and what not. By the time BiCIKL concludes, not only will we be ready to refer back to any step along the way that we have just walked together, but also rest assured that what we have achieved and learnt remains at the fingertips of those we have done it for and those who come after them,” he adds.

***

You can keep tabs on the BiCIKL project collection in RIO Journal by subscribing to the journal newsletter or following @RIOJournal on Twitter and Facebook.

Interoperable biodiversity data extracted from literature through open-ended queries

OpenBiodiv is a biodiversity database containing knowledge extracted from scientific literature, built as an Open Biodiversity Knowledge Management System. 

The OpenBiodiv contribution to BiCIKL

Apart from coordinating the Horizon 2020-funded project BiCIKL, scholarly publisher and technology provider Pensoft has been the engine behind what is likely to be the first production-stage semantic system to run on top of a reasonably-sized biodiversity knowledge graph.

As of February 2023, OpenBiodiv contains 36,308 processed articles; 69,596 taxon treatments; 1,131 institutions; 460,475 taxon names; 87,876 sequences; 247,023 bibliographic references; 341,594 author names; and 2,770,357 article sections and subsections.

In fact, OpenBiodiv is a whole ecosystem comprising tools and services that enable biodiversity data to be extracted from the text of biodiversity articles published in data-minable XML format, as in the journals published by Pensoft (e.g. ZooKeys, PhytoKeys, MycoKeys, Biodiversity Data Journal), and other taxonomic treatments – available from Plazi and Plazi’s specialised extraction workflow – into Linked Open Data.

“I believe that OpenBiodiv is a good real-life example of how the outputs and efforts of a research project may and should outlive the duration of the project itself. Something that is – of course – central to our mission at BiCIKL.”

explains Prof Lyubomir Penev, BiCIKL’s Project Coordinator and founder and CEO of Pensoft.

“The basics of what was to become the OpenBiodiv database began to come together back in 2015 within the EU-funded BIG4 PhD project of Victor Senderov, later succeeded by another PhD project by Mariya Dimitrova within IGNITE. It was during those two projects that the backend Ontology-O, the first versions of RDF converters and the basic website functionalities were created,”

he adds.

At the time OpenBiodiv became one of the nine research infrastructures within BiCIKL tasked with the provision of virtual access to open FAIR data, tools and services, it had already evolved into an RDF-based biodiversity knowledge graph, equipped with a fully automated extraction and indexing workflow and user apps.

Currently, Pensoft is working at full speed on new user apps in OpenBiodiv, as the team is continuously bringing into play invaluable feedback and recommendations from end-users and partners at BiCIKL.

As a result, OpenBiodiv is already capable of answering open-ended queries based on the available data. To do this, OpenBiodiv discovers ‘hidden’ links between data classes, i.e. taxon names, taxon treatments, specimens, sequences, persons/authors and collections/institutions. 

Thus, the system generates new knowledge about taxa, scientific articles and their subsections, the examined materials and their metadata, localities and sequences, amongst others. Additionally, it is able to return information with a relevant visual representation about any one or a combination of those major data classes within a certain scope and semantic context.

Users can explore the database by typing any term (even if misspelt!) into the search engine available from the OpenBiodiv homepage, by integrating an Application Programming Interface (API), or by using SPARQL queries.

On the OpenBiodiv website, there is also a list of predefined SPARQL queries, which is continuously being expanded.

Sample of predefined SPARQL queries at OpenBiodiv.
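
For readers who have not used SPARQL before, here is a minimal sketch of how a query is typically sent to a SPARQL endpoint over HTTP. It is an illustration only: the endpoint address is a placeholder rather than OpenBiodiv's real one, the query is a generic one that lists five arbitrary triples and uses no OpenBiodiv-specific vocabulary, and the function name build_sparql_request is ours. The request is prepared but deliberately not sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Placeholder endpoint (assumption): replace with the address given on the
# OpenBiodiv website. This is NOT the real OpenBiodiv endpoint.
ENDPOINT = "https://example.org/sparql"

# Generic query that is valid on any RDF store: list five arbitrary triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Prepare (but do not send) a SPARQL 1.1 Protocol GET request."""
    # The protocol passes the query text in the URL-encoded `query` parameter.
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for results in the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing the prepared request to urllib.request.urlopen would execute it against a real endpoint. The predefined queries listed on the OpenBiodiv website could be substituted for QUERY in the same way.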

“OpenBiodiv is an ambitious project of ours, and it’s surely one close to Pensoft’s heart, given our decades-long dedication to biodiversity science and knowledge sharing. Our previous fruitful partnerships with Plazi, BIG4 and IGNITE, as well as the current exciting and inspirational network of BiCIKL are wonderful examples of how far we can go with the right collaborators,”

concludes Prof Lyubomir Penev.

***

Follow BiCIKL on Twitter and Facebook. Join the conversation on Twitter at #BiCIKL_H2020.

You can also follow Pensoft on Twitter, Facebook and LinkedIn and use #OpenBiodiv on Twitter.

Experts in insect taxonomy “threatened by extinction” reveals the first European Red List of Taxonomists

While insect populations continue to decline, taxonomic expertise in Europe is at serious risk, confirms data obtained within the European Red List of Insect Taxonomists, a recent study commissioned by the European Union. 

Expertise tends to be particularly poor in the countries with the richest biodiversity, while taxonomists are predominantly male and ageing

Scientists who specialise in the identification and discovery of insect species – also known as insect taxonomists – are declining across Europe, highlights the newly released report by CETAF, the International Union for Conservation of Nature (IUCN) and Pensoft. The authors of this report represent different perspectives within biodiversity science, including natural history and research institutions, nature conservation, academia and scientific publishing.

Despite the global significance of its taxonomic collections, Europe has been losing taxonomic expertise at such a rate that, at the moment, nearly half (41.4%) of the insect orders are not covered by a sufficient number of scientists. If only EU countries are counted, the figure looks only slightly more positive (34.5%). Even the four largest insect orders – beetles (Coleoptera), moths and butterflies (Lepidoptera), flies (Diptera) and wasps, bees, ants and sawflies (Hymenoptera) – are adequately ‘covered’ in only a fraction of the countries.

To obtain details about the number, location and productivity of insect taxonomists, the team extracted information from thousands of peer-reviewed research articles published in the last decade, queried the most important scientific databases and reached out to over fifty natural science institutions and their networks. Furthermore, a dedicated campaign reached out to individual researchers through multiple communication channels. As a result, more than 1,500 taxonomists responded by filling in a self-declaration survey to provide information about their personal and academic profile, qualification and activities. 

Then, the collected information was assessed against numerical criteria to classify the scientists into categories similar to those used by the IUCN Red List of Threatened Species™. In the European Red List of Insect Taxonomists, these range from Eroded Capacity (equivalent to Extinct) to Adequate Capacity (equivalent to Least Concern). The assessment was applied to the 29 insect orders (e.g. beetles, moths and butterflies) to figure out which insect groups society, conservation practitioners and decision-makers need not be concerned about at this point.

Overview of the taxonomic capacity in European countries based upon the Red List Index (colour gradient goes from red (Eroded Capacity) to green (Adequate Capacity)).
Image by the European Red List of Taxonomists consortium.

On a country level, the results showed that Czechia, Germany and Russia demonstrate the most adequate coverage of insect groups. Meanwhile, Albania, Azerbaijan, Belarus, Luxembourg, Latvia, Ireland and Malta turned out to be the ones with an insufficient number of taxonomists.

In most cases, the availability of experts seems to correlate with GDP, as the wealthiest countries tend to invest more in their scientific institutions.

What is particularly worrying is that the lack of taxonomic expertise is more evident in the countries with the greatest species diversity. This trend may cause even more significant problems in the knowledge and conservation of these species, further aggravating the situation. Thus, the report provides further evidence about a global pattern where the countries richest in biodiversity are also the ones poorest in financial and human resources. 

The research team also points out that it is European natural history museums that host the largest scientific collections – including insects – brought from all over the globe. As such, Europe is responsible to the world for maintaining taxonomic knowledge and building adequate expert capacity.

Other concerning trends revealed in the new report are that the community of taxonomists is also ageing and – especially in the older groups – male-dominated (82%). 

“One reason to have fewer young taxonomists could be due to limited opportunities for professional training (…), and the fact that not all professional taxonomists provide it, as a significant number of taxonomists are employed by museums and their opportunities for interaction with university students is probably not optimal. Gender bias is very likely caused by multiple factors, including fewer opportunities for women to be exposed to taxonomic research and gain an interest, unequal offer of career opportunities and hiring decisions. A fair-playing field for all genders will be crucial to address these shortcomings and close the gap.”

comments Ana Casino, CETAF’s Executive Director.

***

Entomologist examining a small insect under a microscope.
Photo by anton_shoshin/stock.adobe.com.

The European Red List of Taxonomists concludes with practical recommendations concerning strategic, science and societal priorities, addressed to specific decision-makers.

The authors give practical examples and potential solutions in support of their call to action.

For instance, in order to develop targeted and sustainable funding mechanisms to support taxonomy, they propose the launch of regular targeted Horizon Europe calls to study important insect groups for which taxonomic capacity has been identified to be at a particularly high risk of erosion.

To address specific gaps in expertise – such as the ones reported in the publication from Romania, a country known for its rich insect diversity yet poor in taxonomic expertise – the consortium proposes the establishment of a natural history museum or entomological research institute that is well-fitted to serve as a taxonomic facility.

Amongst the scientific recommendations, the authors propose measures to ensure better recognition of taxonomic work at a multidisciplinary level. The scientific community – including disciplines that use taxonomic research, such as molecular biology, medicine and agriculture – needs to embrace universal standards and rigorous conduct for the correct citation of scientific publications by insect taxonomists.

Societal engagement is another important call. “It is pivotal to widely raise awareness of the value and impact of taxonomy and the work of taxonomists. We must motivate young generations to join the scientific community,” points out Prof. Lyubomir Penev, Managing Director of Pensoft.

***

“Understanding taxonomy is a key to understanding the extinction risk of species. If we strategically target the gaps in expert capacity that this European Red List identifies, we can better protect biodiversity and support the well-being and livelihoods of our societies. With the climate crisis at hand, there is no time left to waste,”

added David Allen from the IUCN Red List team.

“As a dedicated supporter of the IUCN Red List, I am inspired by this call to strengthen the capacity, guided by evidence and proven scientific methods. However, Europe has much more scientific capacity than most biodiversity-rich regions of the world. So, what this report particularly highlights is the need for massively increasing investment in scientific discovery, and building taxonomic expertise, around the world,”

said Jon Paul Rodríguez, Chair of the IUCN Species Survival Commission.

***

Follow and join the conversation on Twitter using the #RedListTaxonomists hashtag. 

‘Nature’s Envelope’ – a simple device that reveals the scope and scale of all biological processes

All processes fit into a broad S-shaped envelope extending from the briefest to the most enduring biological events. For the first time, we have a simple model that depicts the scope and scale of biology.

Arctic tern by Mark Stock, Schleswig-Holstein Wadden Sea National Park. License: CC BY-SA.

As biology is progressing into a digital age, it is creating new opportunities for discovery. 

Increasingly, information from investigations into aspects of biology from ecology to molecular biology is available in a digital form. Older ‘legacy’ information is being digitized. Together, the digital information is accumulated in databases from which it can be harvested and examined with an increasing array of algorithmic and visualization tools.

From this trend has emerged a vision that, one day, we should be able to analyze any and all aspects of biology in this digital world. 

However, before this can happen, there will need to be an infrastructure that gathers information from ALL sources, reshapes it as standardized data using universal metadata and ontologies, and makes it freely available for analysis.

That information must also make its way to trustworthy repositories that guarantee permanent access to the data in a polished state, fully suited for re-use.

The first layer in the infrastructure is the one that gathers all old and new information, whether it be about the migrations of ocean mammals, the sequence of bases in ribosomal RNA, or the known locations of particular species of ciliated protozoa.

How many of these subdomains will there be?

To answer this, we need to have a sense of the scope and scale of biology.

With the Nature’s Envelope we have, for the first time, a simple model that depicts the scope and scale of biology. Presented as a rhetorical device by its author Dr David J. Patterson (University of Sydney, Australia), the Nature’s Envelope is described in a Forum Paper, published in the open-science journal Research Ideas and Outcomes (RIO).

This is achieved by compiling information about the processes conducted by all living organisms. The processes occur at all levels of organization, from sub-molecular transactions, such as those that underpin nervous impulses, to those within and among plants, animals, fungi, protists and prokaryotes. Further, they are also the actions and reactions of individuals and communities; but also the sum of the interactions that make up an ecosystem; and finally, the consequences of the biosphere as a whole system. 

Nature’s Envelope, in green, includes all processes carried out by, involving, or the result of the activities of any and all organisms. The axes depict the duration of events and the sizes of participants using a log10 scale. Image by David J. Patterson. License: CC BY.

In the Nature’s Envelope, information on sizes of participants and durations of processes from all levels of organization are plotted on a grid. The grid uses a logarithmic (base 10) scale, which has about 21 orders of magnitude of size and 35 orders of magnitude of time. Information on processes ranging from the subatomic, through molecular, cellular, tissue, organismic, species, communities to ecosystems is assigned to the appropriate decadal blocks. 

Examples include movements from the stepping motion of molecules like kinesin that move forward 8 nanometres in about 10 milliseconds; or the migrations of Arctic terns which follow routes of 30,000 km or more from Europe to Antarctica over 3 to 4 months.
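
As a rough illustration of how an event could be assigned to a decadal block on such a grid, the sketch below takes the floor of the base-10 logarithm of the size and of the duration. It is not taken from the Forum Paper: the choice of metres and seconds as units, the function name decadal_block, the 30-day month and the mid-range value of 3.5 months for the tern migration are all assumptions made for this example.

```python
import math


def decadal_block(size_m: float, duration_s: float) -> tuple[int, int]:
    """Return the (size, time) block as floor(log10) exponents.

    Assumes sizes in metres and durations in seconds (our choice of units).
    """
    return math.floor(math.log10(size_m)), math.floor(math.log10(duration_s))


SECONDS_PER_DAY = 86_400

# Kinesin step from the text: 8 nanometres in about 10 milliseconds.
kinesin_step = decadal_block(8e-9, 10e-3)

# Arctic tern migration from the text: 30,000 km over 3 to 4 months.
# Assumption: 3.5 months of 30 days each.
tern_migration = decadal_block(3.0e7, 3.5 * 30 * SECONDS_PER_DAY)

print("kinesin step block (size, time):", kinesin_step)
print("tern migration block (size, time):", tern_migration)
```

Under these assumptions, the two examples land many blocks apart on both axes, which is what lets a single logarithmic grid hold processes as different as a molecular step and an intercontinental migration.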

The extremes of life processes are determined by the smallest and largest entities to participate, and the briefest and most enduring processes.

The briefest event to be included is the transfer of energy from a photon to a photosynthetic pigment as the photon passes through a chlorophyll molecule several nanometres in width at a speed of 300,000 km per second. That transaction is conducted in about 10⁻¹⁷ seconds. As it involves the smallest subatomic particles, it defines the lower left corner of the grid.

The most enduring is the process of evolution that has been progressing for almost 4 billion years. The influence of the latter has created the biosphere (the largest living object) and affects the gas content of the atmosphere. This process established the upper right extreme of the grid.

All biological processes fit into a broad S-shaped envelope that includes about half of the decadal blocks in the grid. The envelope drawn round the initial examples is Nature’s Envelope.

Nature’s Envelope will be a useful addition to many discussions, whether they deal with the infrastructure that will manage the digital age of biology, or provide the context for education on the diversity and range of processes that living systems engage in.

“The version of Nature’s Envelope published in the RIO journal is seen as a first version, to be refined and enhanced through community participation,”

comments Patterson.

***

Original source:

Patterson DJ (2022) The scope and scale of the life sciences (‘Nature’s envelope’). Research Ideas and Outcomes 8: e96132. https://doi.org/10.3897/rio.8.e96132

***

Follow Research Ideas and Outcomes (RIO Journal) on Twitter, Facebook and LinkedIn.

Digitising beans to feed the world

In 2018, NHM London’s digitisation team started a project to digitise non-type herbarium material from the legume family. A recent data paper in the Biodiversity Data Journal reports on the outcomes.

You can find the original blog post by the Natural History Museum of London, reposted here with minor edits.

Legumes are a group of plants that include soybeans, peas, chickpeas, peanuts and lentils. They are a significant source of protein, fibre, carbohydrates, and minerals in our diet and some, like the cowpea, are resistant to droughts.

In 2018, the Natural History Museum of London’s (NHM London) digitisation team started a project in collaboration with project leader Royal Botanic Gardens Kew and the Royal Botanic Garden Edinburgh.

The project’s outcomes were published in a data paper in the Biodiversity Data Journal. Within the project, the digitisation team aimed to collectively digitise non-type herbarium material from the legume family. This includes rosewood trees (Dalbergia), padauk trees (Pterocarpus) and the Phaseolinae subtribe that contains many of the beans cultivated for human and animal food.

This project was made possible through the Department for Environment Food & Rural Affairs (DEFRA)-allocated Official Development Assistance (ODA) funding, distributed by the UK government in its “global efforts to defeat poverty, tackle instability and create prosperity in developing countries”.

African: Guinea, Ethiopia, Sudan, Kenya, Uganda, Tanzania, Mozambique, Malawi and Madagascar
Asian: Bangladesh, Myanmar, Nepal, New Guinea and India
Southern and Central American: Guatemala, Honduras, El Salvador, Nicaragua, Bolivia, Argentina and Brazil
ODA-listed countries, grouped by region.

The legume groups Dalbergia, Pterocarpus and Phaseolinae were chosen for digitisation to support the development of dry beans as a sustainable and resilient crop, and to aid conservation and sustainable use of rosewood and padauk trees. Some of these beans, especially cowpea and pigeon pea, are sustainable and resilient crops, as they can be grown in poor-quality soils and are resistant to drought stress. This makes them particularly suitable for agricultural production where the growing of other crops would be difficult.

Digitally discoverable herbarium specimens can provide important information about the distribution of individual species, as well as highlighting which species occur naturally together.

While there have been collaborative efforts between herbaria in the past, these have tended to prioritise digitisation of type specimens: the example specimens for which a species is named.

Types are important to identification, but being individual specimens, they don’t offer insights into species distribution over time. By focusing on the non-types across the world and over the last 200 years, we have released a brand-new resource to the global scientific community.

Searching for beans

This collection was digitised by creating an inventory record for each specimen, attaching images of each herbarium sheet, and then transcribing more data and georeferencing the specimens, providing an accurate locality in space and time for their collection. 

We originally had four months and three members of staff to digitise over 11,000 specimens. The Covid-19 lockdown was ironically rather lucky for this project as it enabled us to have more time to transcribe and georeference all of the records. 

say the researchers behind the digitisation project.

Map showing breakdown of records by country.

“We were able to assign country-level data to 10,857 out of the total number of 11,222 records. We were also able to transcribe the collectors’ names from the majority of our specimen labels (10,879 out of 11,222). Only 770 out of the 2,226 individuals identified during this project collected their specimens in ODA-listed countries. The highest contributors were: Richard Beddome (130 specimens), Charles Clarke (110), Hans Schlieben (98) and Nathaniel Wallich (79). The breakdown of records by ODA country can be seen in the chart below.”

Map showing breakdown of records by country and pie chart showing distribution by ODA listed countries.
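
To put the counts quoted above into proportion, the short sketch below converts them into percentages. It only restates figures from the text. The helper name share and the rounding to one decimal place are our own choices, and the percentages are not values reported by the researchers.

```python
# Counts quoted in the text above.
TOTAL_RECORDS = 11_222
WITH_COUNTRY = 10_857
WITH_COLLECTOR = 10_879
COLLECTORS_TOTAL = 2_226
COLLECTORS_IN_ODA = 770


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


country_share = share(WITH_COUNTRY, TOTAL_RECORDS)
collector_share = share(WITH_COLLECTOR, TOTAL_RECORDS)
oda_collector_share = share(COLLECTORS_IN_ODA, COLLECTORS_TOTAL)

print(f"Records with country-level data: {country_share}%")
print(f"Records with a transcribed collector name: {collector_share}%")
print(f"Collectors active in ODA-listed countries: {oda_collector_share}%")
```

In other words, roughly 97% of the records carry a country and a collector name, while only about a third of the identified collectors worked in ODA-listed countries.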

From our data, we can see the peak decade of collection was the 1930s, with almost half (4,583 specimens or 49.43%) collected between 1900 and 1950 (Fig. 10).

This peak can be attributed to three of our most prolific collectors: Arthur Kerr, John Gossweiler and Georges Le Testu, all of whom were most active in the 1930s. The oldest specimen (BM013713473) was collected by Mark Catesby (1683-1749) in the Bahamas in 1726.

they explain.

An interesting, but perhaps unsurprising, finding is that our collection is strongly male-dominated.

There are only two women (Caroline Whitefoord and Ynes Mexia) in the list of our top 50 plant collectors and they are not close to the most prolific collectors.

We identified more women in the rest of our records, but their contribution is on average less than 25 specimens per person in the dataset consisting of more than 10,000 specimens. In contrast, the top five male collectors contributed 10% of our collection. 

they continue.

Releasing Rosewoods

Both the Pterocarpus and Dalbergia genera include species that are used as expensive, good-quality timber that is prone to illegal logging. Many species, such as Pterocarpus tinctorius, are also listed on the International Union for Conservation of Nature (IUCN) Red List of Threatened Species. By releasing this new resource of information on all these plants from three of the biggest herbaria in the world, we can share this data with the people who are taking care of biodiversity in these countries. The data can be used to identify hotspots where the trees grow naturally and to protect these areas. These data would also allow much closer attention to be paid to areas that could be targets for illegal logging activity.

Pterocarpus tinctorius is a species of padauk tree that is listed as endangered on the IUCN Red List.
Cowpea (Vigna unguiculata) is a food and animal feed crop grown in the semi-arid tropics.

The ODA-listed countries are economically impoverished and disproportionately prone to be disadvantaged with the changing climate whether from flood or drought or increase in temperature.

Using data to identify good, nutritious plant species that can be grown in such conditions can therefore benefit local communities, potentially reducing dependence on imports, aid and on less resilient crops. 

the team adds in conclusion.

***

This dataset is now openly available on the Museum’s Data Portal and a data paper about this work has been released in the Biodiversity Data Journal.

***

Stay in touch with the Digitisation team by following us on Instagram and Twitter.

Don’t forget to also follow the Biodiversity Data Journal on Twitter and Facebook.

One Biodiversity Knowledge Hub to link them all: BiCIKL 2nd General Assembly

The FAIR Data Place – the key and final product of the partnership – is meant to provide scientists with all types of biodiversity data “at their fingertips”

The Horizon 2020-funded project BiCIKL has reached its halfway stage and the partners gathered in Plovdiv (Bulgaria) from the 22nd to the 25th of October for the Second General Assembly, organised by Pensoft.

The BiCIKL project will launch a new European community of key research infrastructures, researchers, citizen scientists and other stakeholders in the biodiversity and life sciences based on open science practices through access to data, tools and services.

BiCIKL’s goal is to create a centralised place to connect all key biodiversity data by interlinking 15 research infrastructures and their databases. The 3-year European Commission-supported initiative kicked off in 2021 and involves 14 key natural history institutions from 10 European countries.

BiCIKL is keeping pace as expected with 16 out of the 48 final deliverables already submitted, another 9 currently in progress/under review and due in a few days. Meanwhile, 21 out of the 48 milestones have been successfully achieved.

Prof. Lyubomir Penev (BiCIKL’s project coordinator and CEO and founder of Pensoft) opens the 2nd General Assembly of BiCIKL in Plovdiv, Bulgaria.

The hybrid format of the meeting enabled a wider range of participants, which resulted in robust discussions on the next steps of the project, such as the implementation of additional technical features of the FAIR Data Place (FAIR being an abbreviation for Findable, Accessible, Interoperable and Reusable).

This FAIR Data Place online platform – the key and final product of the partnership and the BiCIKL initiative – is meant to provide scientists with all types of biodiversity data “at their fingertips”.

This data includes biodiversity information, such as detailed images, DNA, physiology and past studies concerning a specific species and its ‘relatives’, to name a few. Currently, the issue is that all those types of biodiversity data have so far been scattered across various databases, which in turn have been missing meaningful and efficient interconnectedness.

Additionally, the FAIR Data Place, developed within the BiCIKL project, is to give researchers access to plenty of training modules to guide them through the different services.

Halfway through the duration of BiCIKL, the project is at a turning point, where crucial discussions between the partners are playing a central role in the refinement of the FAIR Data Place design. Most importantly, they are tasked with ensuring that their technologies work efficiently with each other, in order to seamlessly exchange, update and share the biodiversity data every one of them is collecting and taking care of.

By Year 3 of the BiCIKL project, the partners agree, when those infrastructures and databases become efficiently interconnected to each other, scientists studying the Earth’s biodiversity across the world will be in a much better position to build on existing research and improve the way and the pace at which nature is being explored and understood. At the end of the day, knowledge is the stepping stone for the preservation of biodiversity and humankind itself.


“Needless to say, it’s an honour and a pleasure to be the coordinator of such an amazing team spanning as many as 14 partnering natural history and biodiversity research institutions from across Europe, but also involving many global long-year collaborators and their infrastructures, such as Wikidata, GBIF, TDWG, Catalogue of Life to name a few,”

said BiCIKL’s project coordinator Prof. Lyubomir Penev, CEO and founder of Pensoft.

“I see our meeting in Plovdiv as a practical demonstration of our eagerness and commitment to tackle the long-standing and technically complex challenge of breaking down the silos in the biodiversity data domain. It is time to start building freeways between all biodiversity data, across (digital) space, time and data types. After the last three days that we spent together in inspirational and productive discussions, I am as confident as ever that we are close to providing scientists with much more straightforward routes to not only generate more biodiversity data, but also build on the already existing knowledge to form new hypotheses and information ready to use by decision- and policy-makers. One cannot stress enough how important the role of biodiversity data is in preserving life on Earth. These data are indeed the groundwork for all that we know about the natural world,”

Prof. Lyubomir Penev added.

Christos Arvanitidis (CEO of LifeWatch ERIC) at the 2nd General Assembly of the BiCIKL project.

Christos Arvanitidis, CEO of LifeWatch ERIC, added:

“The point is: do we want an integrated structure or do we prefer federated structures? What are the pros and cons of the two options? It’s essential to keep the community united and allied because we can’t afford any information loss and the stakeholders should feel at home with the Project and the Biodiversity Knowledge Hub.”


Joe Miller, Executive Secretary and Director at GBIF, commented:

“We are a brand new community, and we are in the middle of the growth process. We would like to already have answers, but it’s good to have this kind of robust discussion to build on a good basis. We must find the best solution to have linkages between infrastructures and be able to maintain them in the future because the Biodiversity Knowledge Hub is the location to gather the community around best practices, data and guidelines on how to use the BiCIKL services… In order to engage even more partners to fill the eventual gaps in our knowledge.”


Joana Pauperio (biodiversity curator at EMBL-EBI) at the 2nd General Assembly of the BiCIKL project.

“BiCIKL is leading data infrastructure communities through some exciting and important developments,”

said Dr Guy Cochrane, Team Leader for Data Coordination and Archiving and Head of the European Nucleotide Archive at EMBL’s European Bioinformatics Institute (EMBL-EBI).

“In an era of biodiversity change and loss, leveraging scientific data fully will allow the world to catalogue what we have now, to track and understand how things are changing and to build the tools that we will use to conserve or remediate. The challenge is that the data come from many streams – molecular biology, taxonomy, natural history collections, biodiversity observation – that need to be connected and intersected to allow scientists and others to ask real questions about the data. In its first year, BiCIKL has made some key advances to rise to this challenge,”

he added.

Deborah Paul, Chair of the Biodiversity Information Standards – TDWG, said:

“As a partner, we, at the Biodiversity Information Standards – TDWG, are very enthusiastic that our standards are implemented in BiCIKL and serve to link biodiversity data. We know that joining forces and working together is crucial to building efficient infrastructures and sharing knowledge.”


The project will continue with the first Round Table of experts in December and the publication of the projects that participated in the Open Call and will be funded at the beginning of next year.

***

Learn more about BiCIKL on the project’s website at: bicikl-project.eu

Follow BiCIKL Project on Twitter and Facebook. Join the conversation on Twitter at #BiCIKL_H2020.

***


‘Who is in your database and why does it matter?’

The uncertainty about a person’s identity hampers research, hinders the discovery of expertise, and obstructs the ability to give attribution or credit for work performed. 

Collection discovery through disambiguation

Guest blog post by Sabine von Mering, Heather Rogers, Siobhan Leachman, David P. Shorthouse, Deborah Paul & Quentin Groom

Worldwide, natural history institutions house billions of physical objects in their collections. They create and maintain data about these items, and they share their data with aggregators such as the Global Biodiversity Information Facility (GBIF), the Integrated Digitized Biocollections (iDigBio), the Atlas of Living Australia (ALA), GenBank and the European Nucleotide Archive (ENA).

Even though these data often include the names of the people who collected or identified each object, such statements may be ambiguous, as the names frequently lack any globally unique, machine-readable concept of their shared identity.

Despite the data being available online, barriers exist to the effective use of information about who collected the objects or provided the expertise to identify them. People have similar names, change their names over the course of their lifetimes (e.g. through marriage), or there may be variability introduced through the label transcription process itself (e.g. local look-up lists).

As a result, researchers and collections staff often spend a lot of time deducing who is the person or people behind unknown collector strings while collating or tidying natural history data. The uncertainty about a person’s identity hampers research, hinders the discovery of expertise, and obstructs the ability to give attribution or credit for work performed. 

Disambiguation activities – the act of churning strings into verifiable things using all available evidence – need not be done in isolation. In addition to presenting a workflow on how to disambiguate people in collections, we also make the case that working in collaboration with colleagues and the general public presents new opportunities and introduces new efficiencies. There is tacit knowledge everywhere.

More often than not, data about people involved in biodiversity research are scattered across different digital platforms. However, by linking information sources to each other using person identifiers, we can better trace the connections in these networks, so that we can weave a more interoperable narrative about every actor.

That said, inconsistent naming conventions or lack of adequate accreditation often frustrate the realization of this vision. This sliver of natural history could be churned to gold with modest improvements in long-term funding for human resources, adjustments to digital infrastructure, space for the physical objects themselves alongside their associated documents, and sufficient training on how to disambiguate people’s names.

“He aha te mea nui o te ao. He tāngata, he tāngata, he tāngata.

“What is the most important thing in the world? It is people, it is people, it is people.”

(Māori proverb)

The process of properly disambiguating those who have contributed to natural history collections takes time. 

The disambiguation process involves the extra challenge of trying to deduce “who is who” for legacy data, compared to undertaking this activity for people alive today. Retrospective disambiguation can require considerable detective work, especially for scarcely known people or if the community has a different naming convention. Provided the results of this effort are well-communicated and openly shared, mercifully, it need only be done once.

At the core of our research is the question of how to solve the issue of assigning proper credit

In our recent Methods paper, we discuss several methods for this, as well as available routes for making records available online in which the names of people are not only expressed as text, but also twinned with their unique, resolvable identifiers.

Disambiguation is a cycle. Enrichment of the data feeds off itself leading to further disambiguation. As more names are disambiguated and more biographical data are accumulated, it becomes easier to disambiguate more names. 

First and foremost, we should maintain our own public biographical data by making full use of ORCID. In addition to preserving our own scientific legacy and that of the institutions that employ us, we have a responsibility to avoid generating unnecessary disambiguation work for others. 

For legacy data, where the people connected to the collections are deceased, Wikidata can be used to openly document rich bibliographic and demographic data, each statement with one or more verifiable references. Wikidata can also act as a bridge to link other sources of authority such as VIAF or ORCID identifiers. It has many tools and services to bulk import, export, and to query information, making it well-suited as a universal democratiser of information about people often walled-off in collection management systems (CMS). 
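To illustrate the bridging role described above, here is a minimal sketch that builds a SPARQL query for the public Wikidata Query Service (https://query.wikidata.org/sparql). It looks up the Wikidata item carrying a given ORCID iD and, where recorded, the VIAF ID on the same item. The function name and the validation pattern are our own illustrative choices; the property IDs (P496 for ORCID iD, P214 for VIAF ID, P31/Q5 for "instance of human") are Wikidata's. The ORCID iD in the usage lines is ORCID's fictitious example researcher, not a real collector.

```python
import re

# Wikidata property IDs: P496 = ORCID iD, P214 = VIAF ID; P31 wd:Q5 = instance of human.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def build_person_query(orcid: str) -> str:
    """Return a SPARQL query that finds the Wikidata item for an ORCID iD
    and, if present, the VIAF ID recorded on the same item.

    Illustrative helper only; it builds the query text and sends nothing.
    """
    if not ORCID_PATTERN.match(orcid):
        raise ValueError(f"not a well-formed ORCID iD: {orcid!r}")
    return f"""
SELECT ?person ?personLabel ?viaf WHERE {{
  ?person wdt:P31 wd:Q5 ;
          wdt:P496 "{orcid}" .
  OPTIONAL {{ ?person wdt:P214 ?viaf . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
""".strip()


# Placeholder identifier: ORCID's fictitious example researcher.
query = build_person_query("0000-0002-1825-0097")
print(query)
```

The printed query can be pasted into the Wikidata Query Service to see which identifiers are already linked for a person. The same pattern works in the other direction by matching on P214 and making P496 optional.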

A network of the top twenty most used identifiers for biologists on Wikidata.

Once unique identifiers for people are integrated in collection management systems, these may be shared with the global collections and research community using the new Darwin Core terms, recordedByID or identifiedByID along with the well-known, yet text-based terms, recordedBy or identifiedBy. 
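As a rough sketch of what this looks like in a shared record, the snippet below fills in both the text-based recordedBy and the identifier-based recordedByID for a single occurrence. The helper names, the specimen record and the Wikidata Q-number are hypothetical placeholders, and the ORCID iD is ORCID's fictitious example researcher. The " | " separator follows the Darwin Core recommendation for lists of values.

```python
import csv
import io


def add_person_ids(record: dict, collectors: list[tuple[str, str]]) -> dict:
    """Return a copy of a Darwin Core occurrence record with recordedBy
    and recordedByID filled in from (name, identifier URI) pairs.

    The pairs are kept in the same order in both fields, so the n-th name
    lines up with the n-th identifier.
    """
    enriched = dict(record)
    # Darwin Core recommends " | " to separate the values in a list.
    enriched["recordedBy"] = " | ".join(name for name, _ in collectors)
    enriched["recordedByID"] = " | ".join(uri for _, uri in collectors)
    return enriched


def to_csv(records: list[dict]) -> str:
    """Serialise records to CSV with one column per Darwin Core term."""
    fields = sorted({key for rec in records for key in rec})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical specimen record and placeholder identifiers.
record = {"occurrenceID": "urn:example:specimen:1", "scientificName": "Example species"}
collectors = [
    ("Josiah Carberry", "https://orcid.org/0000-0002-1825-0097"),
    ("A. Collector", "http://www.wikidata.org/entity/Q0000000"),  # not a real item
]
enriched = add_person_ids(record, collectors)
print(to_csv([enriched]))
```

The same approach would apply to identifiedBy and identifiedByID. Keeping the text string alongside the identifier preserves what is actually written on the label.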

Approximately 120 datasets published through GBIF now make use of these identifier-based terms, which are additionally resolved in Bionomia every few weeks alongside co-curated attributions newly made there. This roundtrip of data – emerging as ambiguous strings of text from the source, affixed with resolvable identifiers elsewhere, absorbed into the source as new digital annotations, and then re-emerging with these fresh, identifier-based enhancements – is an exciting approach to co-manage collections data.

Round tripping. In Bionomia, people identifiers from Wikidata and ORCID are used to enrich data published via GBIF, thus linking natural history specimens to the world’s collectors.

Disambiguation work is particularly important in recognising contributors who have been historically marginalized. For example, gender bias in specimen data can be seen in the case of Wilmatte Porter Cockerell, a prolific collector of botanical, entomological and fossil specimens. Cockerell’s collections are often attributed to her husband as he was also a prolific collector and the two frequently collected together. 

On some labels, her identity is further obscured as she is simply recorded as “& wife” (see example on GBIF). Since Wilmatte Cockerell was her husband’s second wife, it can take some effort to confirm if a specimen can be attributed to her and not her husband’s first wife, who was also involved in collecting specimens. By ensuring that Cockerell is disambiguated and her contributions are appropriately attributed, the impact of her work becomes more visible enabling her work to be properly and fairly credited.

Thus, disambiguation work helps to not only give credit where credit is due, thereby making data about people and their biodiversity collections more findable, but it also creates an inclusive and representative narrative of the landscape of people involved with scientific knowledge creation, identification, and preservation. 

A future – once thought to be a dream – where the complete scientific output of a person is connected as Linked Open Data (LOD) is now

Both the tools and infrastructure are at our disposal and the demand is palpable. All institutions can contribute to this movement by sharing data that include unique identifiers for the people in their collections. We recommend that institutions develop a strategy, perhaps starting with employees and curatorial staff, people of local significance, or those who have been marginalized, and additionally capitalize on existing disambiguation activities elsewhere. This will have local utility and will make a significant, long-term impact.

The more we participate in these activities, the greater the chance that we will uncover positive feedback loops, which will act to lighten the workload for all involved, including our future selves!

The disambiguation of people in collections is an ongoing process, but it becomes easier with practice. We also encourage collections staff to consider modifying their existing workflows and policies to include identifiers for people at the outset, when new data are generated or when new specimens are acquired. 

There is more work required at the global level to define, update, and ratify standards and best practices to help accelerate data exchange or roundtrips of this information; there is room for all contributions. Thankfully, there is a diverse, welcoming, energetic, and international community involved in these activities. 

We see a bright future for you, our collections, and our research products – well within reach – when the identities of people play a pivotal role in the construction of a knowledge graph of life.

Would you like to participate, and do you need support getting the disambiguation of your collection started? Please contact our TDWG People in Biodiversity Data Task Group.

A good start is also to check Bionomia to find out what metrics exist now for your institution or collection and affiliated people.

The next steps for collections: 7 objectives that can help to disambiguate your institution’s collection:

1. Promote the use of person identifiers in local, national or international outreach, publishing and research activities

2. Increase the number of collection management systems that use person identifiers

3. Increase the number of living collectors registered and using an ORCID identifier when contributing to collections

4. Undertake disambiguation in the national languages of many countries

5. Increase the number of identified people on Wikidata linked to collections

6. Increase the number of people in collections with expertise in person disambiguation

7. Collaborate towards an exchange standard for attribution data

A real example of how a name string is disambiguated and the steps taken in documenting it. Wikidata item of Jean-André Soulié

***

Methods publication:

Groom Q, Bräuchler C, Cubey RWN, Dillen M, Huybrechts P, Kearney N, Klazenga N, Leachman S, Paul DL, Rogers H, Santos J, Shorthouse DP, Vaughan A, von Mering S, Haston EM (2022) The disambiguation of people names in biological collections. Biodiversity Data Journal 10: e86089. https://doi.org/10.3897/BDJ.10.e86089

***

Follow Biodiversity Data Journal on Twitter and Facebook.

Volunteer “community scientists” do a pretty darn good job generating usable data

When museum-goers did a community science activity in an exhibit at the Field Museum (USA), the data they produced were largely accurate.

Left: Cuong Pham, Jimmy Crigler, and Joshua Torres working on a community science platform in an exhibit at the Field Museum (photo by Melanie Pivarski, Roosevelt University).
Right: The microscopic leaves of a liverwort, a primitive plant that helps scientists track climate change (photo by Lauren Johnson, Field Museum).

Original publication by the Field Museum

Ask any scientist — for every “Eureka!” moment, there’s a lot of less-than-glamorous work behind the scenes. Making discoveries about everything from a new species of dinosaur to insights about climate change entails some slogging through seemingly endless data and measurements that can be mind-numbing in large doses.

Community science shares the burden with volunteers who help out, for even just a few minutes, on collecting data and putting it into a format that scientists can use. But the question remains how useful these data actually are for scientists. 

A new study, authored by a combination of high school students, undergrads and grad students, and professional scientists, showed that when museum-goers did a community science activity in an exhibit, the data they produced were largely accurate, supporting the argument that community science is a viable way to tackle big research projects.

“It was surprising how all age groups from young children, families, youth, and adults were able to generate high-quality taxonomic data sets, making observations and preparing measurements, and at the same time empowering community scientists through authentic contributions to science,”

says Matt von Konrat (Field Museum, USA), an author of the paper in the journal Research Ideas and Outcomes (RIO Journal) and the head of plant collections at Chicago’s Field Museum.

“This study demonstrates the wonderful scientific outcomes that occur when an entire community comes together,”

says Melanie Pivarski, an associate professor of mathematics at Roosevelt University (USA) and the study’s lead author.

“We were able to combine a small piece of the Field Museum’s vast collections, their scientific knowledge and exhibit creation expertise, the observational skills of biology interns at Northeastern Illinois University (USA), led by our collaborator Tom Campbell, and our Roosevelt University students’ data science expertise. The creation of this set of high-quality data was a true community effort!”

The study focuses on an activity in an exhibition at the Field Museum, in which visitors could take part in a community science project: museumgoers used a large digital touchscreen to measure the microscopic leaves in photographs of plants called liverworts.

These tiny plants, the size of an eyelash, are sensitive to climate change, and they can act like a canary in a coal mine to let scientists know about how climate change is affecting a region. It’s helpful for scientists to know what kinds of liverworts are present in an area, but since the plants are so tiny, it’s hard to tell them apart. The sizes of their leaves (or rather, lobes — these are some of the most ancient land plants on Earth, and they evolved before true leaves had formed) can hint at their species. But it would take ages for any one scientist to measure all the leaves of the specimens in the Field’s collection. Enter the community scientists.

“Drawing a fine line to measure the lobe of a liverwort for a few hours can be mentally strenuous, so it’s great to have community scientists take a few minutes out of their day using fresh eyes to help measure a plant leaf. A few community scientists who’ve helped with classifying acknowledged how exciting it is knowing they are playing a helping hand in scientific discovery,”  

says Heaven Wade, a research assistant at the Field Museum who began working on the MicroPlants project as an undergraduate intern.

Community scientists using the digital platform measured thousands of microscopic liverwort leaves over the course of two years.

“At the beginning, we needed to find a way to sort the high quality measurements out from the rest. We didn’t know if there would be kids drawing pictures on the touchscreen instead of measuring leaves or if they’d be able to follow the tutorial as well as the adults did. We also needed to be able to automate a method to determine the accuracy of these higher quality measurements,”

says Pivarski.

To answer these questions, Pivarski worked with her students at Roosevelt University to analyze the data. They compared measurements taken by the community scientists with measurements done by experts on a couple of “test” lobes; based on that proof of concept, they went on to analyze the thousands of other leaf measurements. The results were surprising.

“We were amazed at how wonderfully children did at this task; it was counter to our initial expectations. The majority of measurements were high quality. This allowed my students to create an automated process that produced an accurate set of MicroPlant measurements from the larger dataset,”

says Pivarski.
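This post does not spell out the automated process itself, so the following is only a minimal sketch of one way such a check could work, not the study's actual method. It assumes a visitor measurement counts as high quality when it falls within a chosen relative tolerance of an expert reference value. For lobes without an expert value, it assumes the median of all visitor measurements can stand in as the reference. The function names, the 10% tolerance and the sample numbers are all hypothetical.

```python
from statistics import median


def relative_error(measured: float, reference: float) -> float:
    """Absolute difference from the reference, as a fraction of the reference."""
    return abs(measured - reference) / reference


def keep_accurate(measurements: list[float], reference: float,
                  tolerance: float = 0.10) -> list[float]:
    """Keep measurements within `tolerance` (a fraction) of the reference."""
    return [m for m in measurements if relative_error(m, reference) <= tolerance]


def filter_by_consensus(measurements: list[float],
                        tolerance: float = 0.10) -> list[float]:
    """For lobes with no expert value, use the median visitor measurement
    as a stand-in reference (an assumption made for this sketch)."""
    return keep_accurate(measurements, median(measurements), tolerance)


# Hypothetical lobe lengths in arbitrary units; 12.0 and 250.0 mimic a
# stray tap and a doodle on the touchscreen.
expert_value = 100.0
visitor_values = [98.0, 103.5, 12.0, 101.0, 250.0]

print(keep_accurate(visitor_values, expert_value))  # checked against the expert
print(filter_by_consensus(visitor_values))          # checked against the median
```

Using the median rather than the mean as the stand-in reference keeps a few wild values from shifting the reference itself.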

The researchers say that the study supports the argument that community science is valuable not just as a teaching tool to get people interested in science, but as a valid means of data collection.

“Biological collections are uniquely poised to inform the stewardship of life on Earth in a time of cataclysmic biodiversity loss, yet efforts to fully leverage collections are impeded by a lack of trained taxonomists. Crowd-sourced data collection projects like these have the potential to greatly accelerate biodiversity discovery and documentation from digital images of scientific specimens,”

says von Konrat.

Research article:

Pivarski M, von Konrat M, Campbell T, Qazi-Lampert AT, Trouille L, Wade H, Davis A, Aburahmeh S, Aguilar J, Alb C, Alferes K, Barker E, Bitikofer K, Boulware KJ, Bruton C, Cao S, Corona Jr. A, Christian C, Demiri K, Evans D, Evans NM, Flavin C, Gillis J, Gogol V, Heublein E, Huang E, Hutchinson J, Jackson C, Jackson OR, Johnson L, Kirihara M, Kivarkis H, Kowalczyk A, Labontu A, Levi B, Lyu I, Martin-Eberhardt S, Mata G, Martinec JL, McDonald B, Mira M, Nguyen M, Nguyen P, Nolimal S, Reese V, Ritchie W, Rodriguez J, Rodriguez Y, Shuler J, Silvestre J, Simpson G, Somarriba G, Ssozi R, Suwa T, Syring C, Thirthamattur N, Thompson K, Vaughn C, Viramontes MR, Wong CS, Wszolek L (2022) People-Powered Research and Experiential Learning: Unravelling Hidden Biodiversity. Research Ideas and Outcomes 8: e83853. https://doi.org/10.3897/rio.8.e83853

Follow RIO Journal on Twitter and Facebook.

Scientists conceptualize a species ‘stock market’ to put a price tag on actions posing risks to biodiversity

“…the most realistic and tangible way out of the looming biodiversity crisis is to put a price tag on species and thereby a cost to actions that compromise them.”

So far, science has described more than 2 million species, and millions more await discovery. While species have value in themselves, many also deliver important ecosystem services to humanity, such as insects that pollinate our crops. 

Meanwhile, as we lack a standardized system to quantify the value of different species, it is too easy to jump to the conclusion that they are practically worthless. As a result, humanity has been quick to justify actions that diminish populations and even imperil biodiversity at large.

In a study, published in the scholarly open-science journal Research Ideas and Outcomes, a team of Estonian and Swedish scientists propose to formalize the value of all species through a conceptual species ‘stock market’ (SSM). Much like the regular stock market, the SSM is to act as a unified basis for instantaneous valuation of all items in its holdings.

However, other aspects of the SSM would be starkly different from the regular stock market. Ownership, transactions, and trading will take new forms. Indeed, species have no owners, and ‘trade’ would not be about transfer of ownership rights among shareholders. Instead, the concept of ‘selling’ would comprise processes that erase species from some specific area – such as war, deforestation, or pollution.

“The SSM would be able to put a price tag on such transactions, and the price could be thought of as an invoice that the seller needs to settle in some way that benefits global biodiversity,”

explains the study’s lead author Prof. Urmas Kõljalg (University of Tartu, Estonia).

Conversely, taking some action that benefits biodiversity – as estimated through individuals of species – would be akin to buying on the species stock market. Buying, too, has a price tag on it, but this price should probably be thought of in goodwill terms. Here, ‘money’ represents an investment towards increased biodiversity. 

“By rooting such actions in a unified valuation system it is hoped that goodwill actions will become increasingly difficult to dodge and dismiss,”

adds Kõljalg.

Interestingly, the SSM revolves around the notion of digital species. These are representations of described and undescribed species concluded to exist based on DNA sequences and elaborated by including all we know about their habitat, ecology, distribution, interactions with other species, and functional traits. 

For the SSM to function as described, those DNA sequences and metadata need to be sourced from global scientific and societal resources, including natural history collections, sequence databases, and life science data portals. Digital species might be managed further by incorporating data records of non-sequenced individuals, notably observations, older material in collections, and data from publications.

The study proposes that the SSM is orchestrated by the international associations of taxonomists and economists. 

“Non-trivial complications are foreseen when implementing the SSM in practice, but we argue that the most realistic and tangible way out of the looming biodiversity crisis is to put a price tag on species and thereby a cost to actions that compromise them,”

says Kõljalg.

“No human being will make direct monetary profit out of the SSM, and yet it’s all Earth’s inhabitants – including humans – that could benefit from its pointers.”

Original source

Kõljalg U, Nilsson RH, Jansson AT, Zirk A, Abarenkov K (2022) A price tag on species. Research Ideas and Outcomes 8: e86741. https://doi.org/10.3897/rio.8.e86741

***

Follow RIO Journal on Twitter and Facebook.

Call for Expression of Interest for biodiversity data-related scientific projects from BiCIKL

The purpose of this call is to solicit, select and implement four to six biodiversity data-related scientific projects that will make use of the added value services developed by the leading Research Infrastructures that make up the BiCIKL project.

The BiCIKL project invites submissions of Expressions of Interest (EoI) to the First BiCIKL Open Call for projects. The purpose of this call is to solicit, select and implement four to six biodiversity data-related scientific projects that will make use of the added value services developed by the leading Research Infrastructures that make up the BiCIKL project.

By opening this call, BiCIKL aims to better understand how it could support scientific questions that arise from across the biodiversity world in the future, while addressing specific scientific or technical biodiversity data challenges presented by the applicants.

“We need and want to assess real-world problems and make the best possible use of our data and technical capabilities. This will greatly assist in defining the long-term development goals of the participating Research Infrastructures and improve the way they can technically and operationally work together to deliver greater scientific value,”

explain the project partners.

The BiCIKL project – a Horizon 2020-funded project involving 14 European institutions, representing major global players in biodiversity research and natural history, and coordinated by Pensoft – establishes a European starting community of key research infrastructures, researchers, citizen scientists and other biodiversity and life sciences stakeholders based on open science practices through access to data, tools and services.

Find out more about the Call and submit your Expression of Interest.

***

Follow BiCIKL on Twitter and Facebook.

Join the conversation on Twitter via #BiCIKL_H2020.