One Biodiversity Knowledge Hub to link them all: BiCIKL 2nd General Assembly

The FAIR Data Place – the key and final product of the partnership – is meant to provide scientists with all types of biodiversity data “at their fingertips”

The Horizon 2020-funded project BiCIKL has reached its halfway stage, and the partners gathered in Plovdiv (Bulgaria) from the 22nd to the 25th of October for the Second General Assembly, organised by Pensoft.

The BiCIKL project will launch a new European community of key research infrastructures, researchers, citizen scientists and other stakeholders in the biodiversity and life sciences based on open science practices through access to data, tools and services.

BiCIKL’s goal is to create a centralised place to connect all key biodiversity data by interlinking 15 research infrastructures and their databases. The 3-year European Commission-supported initiative kicked off in 2021 and involves 14 key natural history institutions from 10 European countries.

BiCIKL is on track, with 16 of its 48 final deliverables already submitted and another 9 currently in progress or under review and due within a few days. Meanwhile, 21 of the 48 milestones have been successfully achieved.

Prof. Lyubomir Penev (BiCIKL’s project coordinator and CEO and founder of Pensoft) opens the 2nd General Assembly of BiCIKL in Plovdiv, Bulgaria.

The hybrid format of the meeting enabled a wider range of participants, which resulted in robust discussions on the next steps of the project, such as the implementation of additional technical features of the FAIR Data Place (FAIR being an abbreviation for Findable, Accessible, Interoperable and Reusable).

This FAIR Data Place online platform – the key and final product of the partnership and the BiCIKL initiative – is meant to provide scientists with all types of biodiversity data “at their fingertips”.

These data include biodiversity information such as detailed images, DNA, physiology and past studies concerning a specific species and its ‘relatives’, to name a few. The problem is that all these types of biodiversity data have so far been scattered across various databases, which lack meaningful and efficient links to one another.

Additionally, the FAIR Data Place, developed within the BiCIKL project, will give researchers access to a range of training modules that guide them through the different services.

Halfway through the duration of BiCIKL, the project is at a turning point, where crucial discussions between the partners play a central role in refining the design of the FAIR Data Place. Most importantly, the partners are tasked with ensuring that their technologies work efficiently with each other, so that they can seamlessly exchange, update and share the biodiversity data each of them collects and looks after.

The partners agree that, once these infrastructures and databases are efficiently interconnected by Year 3 of the project, scientists studying the Earth’s biodiversity across the world will be in a much better position to build on existing research and to improve the way, and the pace at which, nature is explored and understood. At the end of the day, knowledge is the stepping stone for the preservation of biodiversity and of humankind itself.


“Needless to say, it’s an honour and a pleasure to be the coordinator of such an amazing team spanning as many as 14 partnering natural history and biodiversity research institutions from across Europe, but also involving many global long-year collaborators and their infrastructures, such as Wikidata, GBIF, TDWG, Catalogue of Life to name a few,”

said BiCIKL’s project coordinator Prof. Lyubomir Penev, CEO and founder of Pensoft.

“I see our meeting in Plovdiv as a practical demonstration of our eagerness and commitment to tackle the long-standing and technically complex challenge of breaking down the silos in the biodiversity data domain. It is time to start building freeways between all biodiversity data, across (digital) space, time and data types. After the last three days that we spent together in inspirational and productive discussions, I am as confident as ever that we are close to providing scientists with much more straightforward routes to not only generate more biodiversity data, but also build on the already existing knowledge to form new hypotheses and information ready to use by decision- and policy-makers. One cannot stress enough how important the role of biodiversity data is in preserving life on Earth. These data are indeed the groundwork for all that we know about the natural world,”

Prof. Lyubomir Penev added.
Christos Arvanitidis (CEO of LifeWatch ERIC) at the 2nd General Assembly of the BiCIKL project.

Christos Arvanitidis, CEO of LifeWatch ERIC, added:

“The point is: do we want an integrated structure or do we prefer federated structures? What are the pros and cons of the two options? It’s essential to keep the community united and allied because we can’t afford any information loss and the stakeholders should feel at home with the Project and the Biodiversity Knowledge Hub.”


Joe Miller, Executive Secretary and Director at GBIF, commented:

“We are a brand new community, and we are in the middle of the growth process. We would like to already have answers, but it’s good to have this kind of robust discussion to build on a good basis. We must find the best solution to have linkages between infrastructures and be able to maintain them in the future because the Biodiversity Knowledge Hub is the location to gather the community around best practices, data and guidelines on how to use the BiCIKL services… In order to engage even more partners to fill the eventual gaps in our knowledge.”


Joana Pauperio (biodiversity curator at EMBL-EBI) at the 2nd General Assembly of the BiCIKL project.

“BiCIKL is leading data infrastructure communities through some exciting and important developments,”

said Dr Guy Cochrane, Team Leader for Data Coordination and Archiving and Head of the European Nucleotide Archive at EMBL’s European Bioinformatics Institute (EMBL-EBI).

“In an era of biodiversity change and loss, leveraging scientific data fully will allow the world to catalogue what we have now, to track and understand how things are changing and to build the tools that we will use to conserve or remediate. The challenge is that the data come from many streams – molecular biology, taxonomy, natural history collections, biodiversity observation – that need to be connected and intersected to allow scientists and others to ask real questions about the data. In its first year, BiCIKL has made some key advances to rise to this challenge,”

he added.

Deborah Paul, Chair of Biodiversity Information Standards (TDWG), said:

“As a partner, we, at the Biodiversity Information Standards – TDWG, are very enthusiastic that our standards are implemented in BiCIKL and serve to link biodiversity data. We know that joining forces and working together is crucial to building efficient infrastructures and sharing knowledge.”


The project will continue with the first Round Table of experts in December and with the publication of the projects that took part in the Open Call, which will be funded from the beginning of next year.

***

Learn more about BiCIKL on the project’s website at: bicikl-project.eu

Follow BiCIKL Project on Twitter and Facebook. Join the conversation on Twitter at #BiCIKL_H2020.

***


#TDWG2022 recap: TDWG and Pensoft welcomed 400 biodiversity information experts from 41 countries in Sofia

For the 37th time, experts from across the world came together to share and discuss the latest developments surrounding biodiversity data and how they are being gathered, used, shared and integrated across time, space and disciplines.

Between 17th and 21st October, about 400 scientists and experts took part in a hybrid meeting dedicated to the development, use and maintenance of biodiversity data, technologies, and standards across the world.

This year, the conference was hosted by Pensoft in collaboration with the National Museum of Natural History (Bulgaria) and the Institute of Biodiversity and Ecosystem Research at the Bulgarian Academy of Sciences. It ran under the theme “Stronger Together: Standards for linking biodiversity data”.

For the 37th time, the global scientific and educational association Biodiversity Information Standards (TDWG) brought together experts from all over the globe to share and discuss the latest developments surrounding biodiversity data and how they are being gathered, used, shared and integrated across time, space and disciplines.

This was the first time the event happened in a hybrid format. It was attended by 160 people on-site, while another 235 people joined online. 

The TDWG 2022 conference saw plenty of networking and engaging discussions among its 160 on-site attendees and another 235 people who joined the event remotely.

The conference abstracts, submitted by the event’s speakers ahead of the meeting, provide a sneak peek into their presentations and are all publicly available in the TDWG journal Biodiversity Information Science and Standards (BISS).

“It’s wonderful to be in the Balkans and Bulgaria for our Biodiversity Information and Standards (TDWG) 2022 conference! Everyone’s been so welcoming and thoughtfully engaged in conversations about biodiversity information and how we can all collaborate, contribute and benefit,”

said Deborah Paul, Chair of TDWG, a biodiversity informatics specialist and community liaison at the University of Illinois, Prairie Research Institute’s Illinois Natural History Survey and also an active participant in the Society for the Preservation of Natural History Collections (SPNHC), the Entomological Collections Network (ECN), ICEDIG, the Research Data Alliance (RDA), and The Carpentries.

“Our TDWG mission is to create, maintain and promote the use of open, community-driven standards to enable sharing and use of biodiversity data for all,”

she added.
Prof Lyubomir Penev (Pensoft) and Deborah Paul (TDWG) at TDWG 2022.

“We are proud to have been selected to be the hosts of this year’s TDWG annual conference and are definitely happy to have joined and observed so many active experts network and share their know-how and future plans with each other, so that they can collaborate and make further progress in the way scientists and informaticians work with biodiversity information,”  

said Pensoft’s founder and CEO Prof. Lyubomir Penev.

“As a publisher of multiple globally renowned scientific journals and books in the field of biodiversity and ecology, at Pensoft we assume it to be our responsibility to be amongst the first to implement those standards and good practices, and serve as an example in the scholarly publishing world. Let me remind you that it is the scientific publications that present the most reliable knowledge the world and science has, due to the scrutiny and rigour in the review process they undergo before seeing the light of day,”

he added.

***

In a nutshell, the main task of the TDWG association is to develop and maintain standards and data-sharing protocols that support the infrastructures (e.g. the Global Biodiversity Information Facility, GBIF) which aggregate these data and facilitate their use, in order to inform and expand humanity’s knowledge about life on Earth.


It is the goal of everyone volunteering their time and expertise to TDWG to enable scientists interested in the world’s biodiversity to do their work efficiently and in a manner that can be understood, shared and reused by others. After all, biodiversity data underlie everything we know about the natural world.

If there are optimised and universal standards in the way researchers store and disseminate biodiversity data, all those biodiversity scientists will be able to find, access and use the knowledge in their own work much more easily. As a result, they will be much better positioned to contribute new knowledge that will later be used in nature and ecosystem conservation by key decision-makers.

On Monday, the event opened with welcoming speeches by Deborah Paul and Prof. Lyubomir Penev in their roles as Chair of TDWG and main host of this year’s conference, respectively.

The opening ceremony continued with a keynote speech by Prof. Pavel Stoev, Director of the Natural History Museum of Sofia and co-host of TDWG 2022. 

Prof. Pavel Stoev (Natural History Museum of Sofia) with a presentation about the known and unknown biodiversity of Bulgaria during the opening plenary session of TDWG 2022.

He walked the participants through the fascinating biodiversity of Bulgaria, but also the worrying trends in the country associated with declining taxonomic expertise. 

He finished his talk on a hopeful note by presenting the recently established national unit of DiSSCo, whose aim – even if a tad too optimistic – is to digitise one million natural history items in four years, 250,000 of them with photographs. So far, one year into the project, the Bulgarian team has digitised more than 32,000 specimens and provided images for 10,000 specimens.

The plenary session concluded with a keynote presentation by renowned ichthyologist and biodiversity data manager Dr. Richard L. Pyle, who is also a manager of ZooBank – the key international database for newly described species.

Keynote presentation by Dr Richard L. Pyle (Bishop Museum, USA) at the opening plenary session of TDWG 2022.

In his talk, he highlighted the gaps in the ways taxonomy is being used, thereby impeding biodiversity research and cutting off a lot of opportunities for timely scientific progress.

“There are simple things we can do to change how we use taxonomy as a tool that would dramatically improve our ability to conduct science and understand biodiversity. There is enormous value and utility within existing databases around the world to understand biodiversity, how threatened it is, what impacts human activity has (especially climate change), and how to optimise the protection and preservation of biodiversity,”

he said in a joint interview for the Bulgarian News Agency and Pensoft.

“But we do not have easy access to much of this information because the different databases are not well integrated. Taxonomy offers us the best opportunity to connect this information together, to answer important questions about biodiversity that we have never been able to answer before. The reason meetings like this are so important is that they bring people together to discuss ways of using modern informatics to greatly increase the power of the data we already have, and prioritise how we fill the gaps in data that exist. Taxonomy, and especially taxonomic data integration, is a very important part of the solution.”

Pyle also commented on the work in progress at ZooBank, ten years into the platform’s existence, and on its role in the next (fifth) edition of the International Code of Zoological Nomenclature, which is currently being developed by the International Commission on Zoological Nomenclature (ICZN).

“We already know that ZooBank will play a more important role in the next edition of the Code than it has for these past ten years, so this is exactly the right time to be planning new services for ZooBank. Improvements at ZooBank will include things like better user-interfaces on the web to make it easier and faster to use ZooBank, better data services to make it easier for publishers to add content to ZooBank as part of their publication workflow, additional information about nomenclature and taxonomy that will both support the next edition of the Code, and also help taxonomists get their jobs done more efficiently and effectively. Conferences like the TDWG one are critical for helping to define what the next version of ZooBank will look like, and what it will do.”

***

During the week, the conference participants had the opportunity to enjoy a total of 140 presentations, as well as multiple social activities, including a field trip to Rila Monastery and a traditional Bulgarian dinner.

TDWG 2022 conference participants document their species observations on their way to Rila Monastery.

While going about the conference venue and field trip localities, the attendees were also actively uploading their species observations made during their stay in Bulgaria on iNaturalist in a TDWG2022-dedicated BioBlitz. The challenge concluded with a total of 635 observations and 228 successfully identified species.

Amongst the social activities going on during TDWG 2022 was a BioBlitz, where the conference participants could upload their observations made in Bulgaria to iNaturalist and help each other successfully identify them.

***

In his interview for the Bulgarian News Agency and Pensoft, Dr Vincent Smith, Head of the Informatics Division at the Natural History Museum, London (United Kingdom), co-founder of DiSSCo, the Distributed System of Scientific Collections, and the Editor-in-Chief of Biodiversity Data Journal, commented: 

“Biodiversity provides the support systems for all life on Earth. Yet the natural world is in peril, and we face biodiversity and climate emergencies. The consequences of these include accelerating extinction, increased risk from zoonotic disease, degradation of natural capital, loss of sustainable livelihoods in many of the poorest yet most biodiverse countries of the world, challenges with food security, water scarcity and natural disasters, and the associated challenges of mass migration and social conflicts.

Solutions to these problems can be found in the data associated with natural science collections. DiSSCo is a partnership of the institutions that digitise their collections to harness their potential. By bringing them together in a distributed, interoperable research infrastructure, we are making them physically and digitally open, accessible, and usable for all forms of research and innovation. 

At present rates, digitising all of the UK collection – which holds more than 130 million specimens collected from across the globe and is being taken care of by over 90 institutions – is likely to take many decades, but new technologies like machine learning and computer vision are dramatically reducing the time it will take, and we are presently exploring how robotics can be applied to accelerate our work.”

Dr Vincent Smith, Head of the Informatics Division at the Natural History Museum, London, co-founder of DiSSCo, and Editor-in-Chief of Biodiversity Data Journal at the TDWG 2022 conference.

In his turn, Dr Donat Agosti, CEO and Managing director at Plazi – a not-for-profit organisation supporting and promoting the development of persistent and openly accessible digital taxonomic literature – said:

“All the data about biodiversity is in our libraries, that include over 500 million pages, and every day new publications are being added. No person can read all this, but machines allow us to mine this huge, very rich source of data. We do not know how many species we know, because we cannot analyse with all the scientists in this library, nor can we follow new publications. Thus, we do not have the best possible information to explore and protect our biological environment.”

Dr Donat Agosti demonstrating the importance of publishing biodiversity data in a structured and semantically enhanced format in one of his presentations at TDWG 2022.

***

At the closing plenary session, Gail Kampmeier – TDWG Executive member and one of the first zoologists to join TDWG in 1996 – joined via Zoom to walk the conference attendees through the 37-year history of the association, originally named the Taxonomic Databases Working Group and later renamed Biodiversity Information Standards as it expanded its activities to the whole range of biodiversity data.

“While this presentation is about TDWG’s history as an organisation, its focus will be on the heart of TDWG: its people. We would like to show how the organisation has evolved in terms of gender balance, inclusivity actions, and our engagement to promote and enhance diversity at all levels. But more importantly, where do we—as a community—want to go in the future?”,

reads the conference abstract she co-authored with her TDWG colleague Dr Visotheary Ung (CNRS-MNHN).

Then, in the final talk of the session, Deborah Paul took to the stage to present the association’s progress and key achievements in 2022.

She gave a special shout-out to the TDWG journal Biodiversity Information Science and Standards (BISS), where, for the 6th consecutive year, participants in the annual conference submitted and published their conference abstracts ahead of the event.

Deborah Paul reminded the audience that, apart from conference abstracts, the TDWG journal Biodiversity Information Science and Standards (BISS) also welcomes full-length articles that demonstrate the development or application of new methods and approaches in biodiversity informatics.

Launched in 2017 on Pensoft’s publishing platform ARPHA, the journal offers the rather unique and innovative opportunity to publish both abstracts and full-length research papers in a modern, technologically advanced scholarly journal. In her speech, Deborah Paul reminded the audience that BISS welcomes research articles, in the form of case studies, that demonstrate the development or application of new methods and approaches in biodiversity informatics.

Amongst the achievements of TDWG and its community, a special place was reserved for the Horizon 2020-funded BiCIKL project (abbreviation for Biodiversity Community Integrated Knowledge Library), involving many of the association’s members. 

Having started in 2021, the 3-year project, coordinated by Pensoft, brings together 14 partnering institutions from 10 countries under the common goal of creating a centralised place to connect all key biodiversity data by interlinking a total of 15 research infrastructures and their databases.

Deborah Paul also reported on the progress of the Horizon 2020-funded project BiCIKL, which involves many of the TDWG members. BiCIKL’s goal is to create a centralised place to connect all key biodiversity data by interlinking 15 key research infrastructures and their databases.

In fact, following the week-long TDWG 2022 conference in Sofia, a good many of the participants set off straight for another Bulgarian city and another event hosted by Pensoft. The Second General Assembly of BiCIKL took place between 22nd and 24th October in Plovdiv.

***

You can also explore highlights and live tweets from TDWG 2022 on Twitter via #TDWG2022.
The Pensoft team at TDWG 2022 were happy to become the hosts of the 37th TDWG conference.

‘Who is in your database and why does it matter?’

The uncertainty about a person’s identity hampers research, hinders the discovery of expertise, and obstructs the ability to give attribution or credit for work performed. 

Collection discovery through disambiguation

Guest blog post by Sabine von Mering, Heather Rogers, Siobhan Leachman, David P. Shorthouse, Deborah Paul & Quentin Groom

Worldwide, natural history institutions house billions of physical objects in their collections. They create and maintain data about these items and share these data with aggregators such as the Global Biodiversity Information Facility (GBIF), the Integrated Digitized Biocollections (iDigBio), the Atlas of Living Australia (ALA), GenBank and the European Nucleotide Archive (ENA).

Even though these data often include the names of the people who collected or identified each object, such statements may be ambiguous, as the names frequently lack any globally unique, machine-readable identifier for the person they refer to.

Despite the data being available online, barriers remain to making effective use of the information about who collected the objects or provided the expertise to identify them. People have similar names or change their names over the course of their lifetime (e.g. through marriage), and variability can be introduced through the label transcription process itself (e.g. local look-up lists).

As a result, researchers and collections staff often spend a lot of time deducing who is the person or people behind unknown collector strings while collating or tidying natural history data. The uncertainty about a person’s identity hampers research, hinders the discovery of expertise, and obstructs the ability to give attribution or credit for work performed. 

Disambiguation – the act of churning strings into verifiable things using all available evidence – need not be done in isolation. In addition to presenting a workflow for disambiguating people in collections, we also make the case that working in collaboration with colleagues and the general public presents new opportunities and introduces new efficiencies. There is tacit knowledge everywhere.
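To make the first, mechanical part of this concrete, the Python sketch below groups spelling variants of a collector’s name under a crude (family name, initials) key. It is a hypothetical simplification written for illustration. It is not the workflow from the Methods paper or the parser used by Bionomia, and the function names, separators and honorific list are illustrative assumptions. A shared key only flags strings that might refer to the same person. Two different “J. Smith”s would collapse together, so every grouping still has to be confirmed against evidence such as dates, localities and handwriting.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

# Illustrative, far from exhaustive.
HONORIFICS = {"mr", "mrs", "ms", "miss", "dr", "prof", "rev"}

# Separators that usually join several people in one collector string (assumed).
AGENT_SEPARATORS = re.compile(r"\s*(?:&|;|\||\band\b)\s*")


def strip_accents(text: str) -> str:
    """Drop diacritics so that 'Soulié' and 'Soulie' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def split_agents(field: str) -> List[str]:
    """Split a collector field such as 'A. Smith & B. Jones' into single names."""
    return [part.strip() for part in AGENT_SEPARATORS.split(field) if part.strip()]


def candidate_key(name: str) -> Optional[Tuple[str, str]]:
    """Reduce one name string to a crude (family name, initials) key."""
    words = strip_accents(name).lower().split()
    words = [w for w in words if w.strip(".,") not in HONORIFICS]
    text = " ".join(words)
    if "," in text:  # 'Family, Given' order
        family, given = (part.strip() for part in text.split(",", 1))
    elif words:  # 'Given Family' order
        family, given = words[-1], " ".join(words[:-1])
    else:
        return None
    family = family.strip(" .")
    # Initials from given-name tokens separated by spaces, dots or hyphens.
    initials = "".join(tok[0] for tok in re.split(r"[\s.\-]+", given) if tok)
    return (family, initials) if family else None


def group_variants(fields: List[str]) -> Dict[Tuple[str, str], Set[str]]:
    """Group raw collector strings that share a candidate key, for human review."""
    groups: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for field in fields:
        for agent in split_agents(field):
            key = candidate_key(agent)
            if key is not None:
                groups[key].add(agent)
    return dict(groups)
```

For example, `group_variants(["Soulié, J.-A.", "J.A. Soulié", "Jean-André Soulié", "J. A. Soulie"])` places all four strings under the single key `("soulie", "ja")`. That result is a candidate for a curator to check, not a finished disambiguation.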

More often than not, data about people involved in biodiversity research are scattered across different digital platforms. However, by linking these information sources to each other through person identifiers, we can better trace the connections in these networks and weave a more interoperable narrative about every actor.

That said, inconsistent naming conventions or lack of adequate accreditation often frustrate the realization of this vision. This sliver of natural history could be churned to gold with modest improvements in long-term funding for human resources, adjustments to digital infrastructure, space for the physical objects themselves alongside their associated documents, and sufficient training on how to disambiguate people’s names.

“He aha te mea nui o te ao. He tāngata, he tāngata, he tāngata.”

“What is the most important thing in the world? It is people, it is people, it is people.”

(Māori proverb)

The process of properly disambiguating those who have contributed to natural history collections takes time. 

The disambiguation process involves the extra challenge of trying to deduce “who is who” for legacy data, compared to undertaking this activity for people alive today. Retrospective disambiguation can require considerable detective work, especially for little-known people or where the community follows a different naming convention. Mercifully, provided the results of this effort are well communicated and openly shared, it need only be done once.

At the core of our research is the question of how to solve the issue of assigning proper credit

In our recent Methods paper, we discuss several methods for this, as well as available routes for publishing records online that include not only people’s names expressed as text but also, twinned with them, their unique, resolvable identifiers.

Disambiguation is a cycle. Enrichment of the data feeds off itself, leading to further disambiguation. As more names are disambiguated and more biographical data are accumulated, it becomes easier to disambiguate more names.

First and foremost, we should maintain our own public biographical data by making full use of ORCID. In addition to preserving our own scientific legacy and that of the institutions that employ us, we have a responsibility to avoid generating unnecessary disambiguation work for others. 

For legacy data, where the people connected to the collections are deceased, Wikidata can be used to openly document rich bibliographic and demographic data, each statement with one or more verifiable references. Wikidata can also act as a bridge to link other sources of authority such as VIAF or ORCID identifiers. It has many tools and services to bulk import, export, and to query information, making it well-suited as a universal democratiser of information about people often walled-off in collection management systems (CMS). 
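For readers who want to try Wikidata’s bridging role, the sketch below asks the public Wikidata Query Service which ORCID iD (property P496) and VIAF ID (property P214) are recorded for a given Wikidata item. The item ID is a placeholder that you would replace with the item for the person you are documenting. The helper names are hypothetical. `fetch_identifiers` needs network access, and the service asks clients to send a descriptive User-Agent.

```python
import json
import re
import urllib.parse
import urllib.request
from typing import Dict, List

# Public Wikidata Query Service endpoint; the wd: and wdt: prefixes are predefined there.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Wikidata properties: P496 = ORCID iD, P214 = VIAF ID.
PROPERTIES = {"orcid": "P496", "viaf": "P214"}


def build_identifier_query(qid: str) -> str:
    """Return a SPARQL query for the ORCID and VIAF identifiers of one Wikidata item."""
    if not re.fullmatch(r"Q[1-9][0-9]*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    optionals = "\n".join(
        f"  OPTIONAL {{ wd:{qid} wdt:{pid} ?{var} . }}" for var, pid in PROPERTIES.items()
    )
    variables = " ".join(f"?{var}" for var in PROPERTIES)
    return f"SELECT {variables} WHERE {{\n{optionals}\n}}"


def parse_bindings(results: dict) -> Dict[str, List[str]]:
    """Collect identifier values from a SPARQL JSON results document."""
    found = {var: set() for var in PROPERTIES}
    for row in results["results"]["bindings"]:
        for var in found:
            if var in row:
                found[var].add(row[var]["value"])
    return {var: sorted(values) for var, values in found.items()}


def fetch_identifiers(qid: str, user_agent: str) -> Dict[str, List[str]]:
    """Query the Wikidata Query Service over HTTP (network access required)."""
    query = urllib.parse.urlencode({"query": build_identifier_query(qid)})
    request = urllib.request.Request(
        f"{WDQS_ENDPOINT}?{query}",
        headers={"Accept": "application/sparql-results+json", "User-Agent": user_agent},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_bindings(json.load(response))
```

Keeping the query builder separate from the result parser means the parsing can be checked offline against a saved response before anything is sent to the live service.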

A network of the top twenty most used identifiers for biologists on Wikidata.

Once unique identifiers for people are integrated in collection management systems, they may be shared with the global collections and research community using the new Darwin Core terms recordedByID and identifiedByID, alongside the well-known but text-based terms recordedBy and identifiedBy.
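The sketch below shows how a data publisher might emit these twinned terms in a Darwin Core CSV export. It assumes Darwin Core’s recommended “ | ” (space, vertical bar, space) separator for multi-valued terms, with identifiers listed in the same order as the names. The ORCID shown is ORCID’s fictitious test record for Josiah Carberry. The occurrenceID and the second collector are hypothetical placeholders, as are the helper names.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple

# Darwin Core's recommended separator between values in list terms.
DWC_SEP = " | "


def agent_terms(
    agents: List[Tuple[str, Optional[str]]], text_term: str, id_term: str
) -> Dict[str, str]:
    """Build a text term (e.g. recordedBy) and its ID twin (e.g. recordedByID).

    `agents` is a list of (name as written, identifier URI or None),
    in the order the people should be listed.
    """
    names = DWC_SEP.join(name for name, _ in agents)
    uris = [uri for _, uri in agents]
    # Fill the ID term only when every name has an identifier, so that
    # the n-th URI always belongs to the n-th name.
    ids = DWC_SEP.join(uris) if all(uris) else ""
    return {text_term: names, id_term: ids}


def occurrence_csv(rows: List[Dict[str, str]]) -> str:
    """Serialise occurrence rows with the four agent terms as CSV text."""
    fields = ["occurrenceID", "recordedBy", "recordedByID", "identifiedBy", "identifiedByID"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    row = {"occurrenceID": "urn:example:specimen:1"}  # hypothetical ID
    row.update(agent_terms(
        [("Carberry, Josiah", "https://orcid.org/0000-0002-1825-0097")],
        "recordedBy", "recordedByID",
    ))
    row.update(agent_terms(
        [("Carberry, Josiah", "https://orcid.org/0000-0002-1825-0097")],
        "identifiedBy", "identifiedByID",
    ))
    print(occurrence_csv([row]))
```

If any name lacks an identifier, the sketch leaves the ID term empty rather than filling it partially, because a partial list could attach a URI to the wrong person. A real pipeline might instead flag such records for follow-up.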

Approximately 120 datasets published through GBIF now make use of these identifier-based terms, which are additionally resolved in Bionomia every few weeks alongside co-curated attributions newly made there. This roundtrip of data – emerging as ambiguous strings of text from the source, affixed with resolvable identifiers elsewhere, absorbed into the source as new digital annotations, and then re-emerging with these fresh, identifier-based enhancements – is an exciting approach to co-manage collections data.
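On the publisher’s side, the round trip can be sketched as a merge step. The attribution format used here (occurrenceID, action, personURI) is a hypothetical simplification and not Bionomia’s actual export schema. The example URIs are placeholders. The sketch only writes an identifier automatically when a record names a single person and has no identifier yet. Anything else, such as a label listing two collectors, is returned for a curator to review.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Text term -> its identifier twin, per kind of attribution (hypothetical keys).
TERMS = {
    "recorded": ("recordedBy", "recordedByID"),
    "identified": ("identifiedBy", "identifiedByID"),
}

# Separators treated as "more than one person" in a name string (assumed).
MULTI_AGENT = re.compile(r"\s*(?:\||&|;)\s*")


def apply_attributions(
    records: List[Dict[str, str]], attributions: List[Dict[str, str]]
) -> Tuple[List[Dict[str, str]], List[Tuple[str, str, List[str]]]]:
    """Merge external person attributions back into local occurrence records.

    Returns (updated copies of the records, items needing curator review).
    """
    by_key = defaultdict(list)
    for a in attributions:
        by_key[(a["occurrenceID"], a["action"])].append(a["personURI"])

    updated, review = [], []
    for record in records:
        record = dict(record)  # work on a copy; the source data stay untouched
        for action, (text_term, id_term) in TERMS.items():
            uris = by_key.get((record["occurrenceID"], action))
            if not uris or record.get(id_term):
                continue  # nothing new, or the source already supplies identifiers
            names = [n for n in MULTI_AGENT.split(record.get(text_term, "")) if n]
            if len(names) == 1 and len(uris) == 1:
                record[id_term] = uris[0]  # unambiguous: one name, one person
            else:
                review.append((record["occurrenceID"], action, uris))
        updated.append(record)
    return updated, review
```

Existing identifiers in the source are never overwritten. Disagreements between the source and the external annotations therefore stay visible to the people who know the collection best.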

Round tripping. In Bionomia, people identifiers from Wikidata and ORCID are used to enrich data published via GBIF, thus linking natural history specimens to the world’s collectors.

Disambiguation work is particularly important in recognising contributors who have been historically marginalized. For example, gender bias in specimen data can be seen in the case of Wilmatte Porter Cockerell, a prolific collector of botanical, entomological and fossil specimens. Cockerell’s collections are often attributed to her husband as he was also a prolific collector and the two frequently collected together. 

On some labels, her identity is further obscured, as she is simply recorded as “& wife” (see example on GBIF). Since Wilmatte Cockerell was her husband’s second wife, it can take some effort to confirm whether a specimen can be attributed to her rather than to her husband’s first wife, who was also involved in collecting specimens. By ensuring that Cockerell is disambiguated and her contributions are appropriately attributed, the impact of her work becomes more visible, and her work can be properly and fairly credited.

Thus, disambiguation work not only helps to give credit where credit is due, making data about people and their biodiversity collections more findable, but also creates an inclusive and representative narrative of the landscape of people involved in scientific knowledge creation, identification, and preservation.

A future – once thought to be a dream – where the complete scientific output of a person is connected as Linked Open Data (LOD) is now

Both the tools and infrastructure are at our disposal and the demand is palpable. All institutions can contribute to this movement by sharing data that include unique identifiers for the people in their collections. We recommend that institutions develop a strategy, perhaps starting with employees and curatorial staff, people of local significance, or those who have been marginalized, and additionally capitalize on existing disambiguation activities elsewhere. This will have local utility and will make a significant, long-term impact.

The more we participate in these activities, the greater chance we will uncover positive feedback loops, which will act to lighten the workload for all involved, including our future selves!

The disambiguation of people in collections is an ongoing process, but it becomes easier with practice. We also encourage collections staff to consider modifying their existing workflows and policies to include identifiers for people at the outset, when new data are generated or when new specimens are acquired. 

There is more work required at the global level to define, update, and ratify standards and best practices to help accelerate data exchange or roundtrips of this information; there is room for all contributions. Thankfully, there is a diverse, welcoming, energetic, and international community involved in these activities. 

We see a bright future for you, our collections, and our research products – well within reach – when the identities of people play a pivotal role in the construction of a knowledge graph of life.

Would you like to participate, but need support getting the disambiguation of your collection started? Please contact our TDWG People in Biodiversity Data Task Group.

A good start is also to check Bionomia to find out what metrics exist now for your institution or collection and affiliated people.

The next steps for collections: 7 objectives that can help you disambiguate your institution’s collection:

1. Promote the use of person identifiers in local, national or international outreach, publishing and research activities

2. Increase the number of collection management systems that use person identifiers

3. Increase the number of living collectors registered and using an ORCID identifier when contributing to collections

4. Undertake disambiguation in the national languages of many countries

5. Increase the number of identified people on Wikidata linked to collections

6. Increase the number of people in collections with expertise in person disambiguation

7. Collaborate towards an exchange standard for attribution data

A real example of how a name string is disambiguated and the steps taken to document it: the Wikidata item of Jean-André Soulié.

***

Methods publication:

Groom Q, Bräuchler C, Cubey RWN, Dillen M, Huybrechts P, Kearney N, Klazenga N, Leachman S, Paul DL, Rogers H, Santos J, Shorthouse DP, Vaughan A, von Mering S, Haston EM (2022) The disambiguation of people names in biological collections. Biodiversity Data Journal 10: e86089. https://doi.org/10.3897/BDJ.10.e86089

***

Follow Biodiversity Data Journal on Twitter and Facebook.

Scientists conceptualize a species ‘stock market’ to put a price tag on actions posing risks to biodiversity

“…the most realistic and tangible way out of the looming biodiversity crisis is to put a price tag on species and thereby a cost to actions that compromise them.”

So far, science has described more than 2 million species, and millions more await discovery. While species have value in themselves, many also deliver important ecosystem services to humanity, such as insects that pollinate our crops. 

Meanwhile, as we lack a standardized system to quantify the value of different species, it is too easy to jump to the conclusion that they are practically worthless. As a result, humanity has been quick to justify actions that diminish populations and even imperil biodiversity at large.

In a study published in the scholarly open-science journal Research Ideas and Outcomes, a team of Estonian and Swedish scientists propose to formalize the value of all species through a conceptual species ‘stock market’ (SSM). Much like the regular stock market, the SSM would act as a unified basis for the instantaneous valuation of all items in its holdings.

However, other aspects of the SSM would be starkly different from the regular stock market. Ownership, transactions, and trading would take new forms. Indeed, species have no owners, and ‘trade’ would not be about the transfer of ownership rights among shareholders. Instead, ‘selling’ would comprise processes that erase species from a specific area, such as war, deforestation, or pollution.

“The SSM would be able to put a price tag on such transactions, and the price could be thought of as an invoice that the seller needs to settle in some way that benefits global biodiversity,”

explains the study’s lead author Prof. Urmas Kõljalg (University of Tartu, Estonia).

Conversely, taking some action that benefits biodiversity – as estimated through individuals of species – would be akin to buying on the species stock market. Buying, too, has a price tag on it, but this price should probably be thought of in goodwill terms. Here, ‘money’ represents an investment towards increased biodiversity. 

“By rooting such actions in a unified valuation system it is hoped that goodwill actions will become increasingly difficult to dodge and dismiss,”

adds Kõljalg.

Interestingly, the SSM revolves around the notion of digital species. These are representations of described and undescribed species whose existence is inferred from DNA sequences, elaborated by including all we know about their habitat, ecology, distribution, interactions with other species, and functional traits.

For the SSM to function as described, those DNA sequences and metadata need to be sourced from global scientific and societal resources, including natural history collections, sequence databases, and life science data portals. Digital species might be managed further by incorporating data records of non-sequenced individuals, notably observations, older material in collections, and data from publications.

The study proposes that the SSM be orchestrated by the international associations of taxonomists and economists.

“Non-trivial complications are foreseen when implementing the SSM in practice, but we argue that the most realistic and tangible way out of the looming biodiversity crisis is to put a price tag on species and thereby a cost to actions that compromise them,”

says Kõljalg.

“No human being will make direct monetary profit out of the SSM, and yet it’s all Earth’s inhabitants – including humans – that could benefit from its pointers.”

Original source

Kõljalg U, Nilsson RH, Jansson AT, Zirk A, Abarenkov K (2022) A price tag on species. Research Ideas and Outcomes 8: e86741. https://doi.org/10.3897/rio.8.e86741

***

Follow RIO Journal on Twitter and Facebook.

Call for Expression of Interest for biodiversity data-related scientific projects from BiCIKL


The BiCIKL project invites Expressions of Interest (EoI) for the First BiCIKL Open Call for projects. The purpose of this call is to solicit, select and implement four to six biodiversity data-related scientific projects that will make use of the added-value services developed by the leading Research Infrastructures that make up the BiCIKL project.

By opening this call, BiCIKL aims to better understand how it could support scientific questions that arise from across the biodiversity world in the future, while addressing specific scientific or technical biodiversity data challenges presented by the applicants.

“We need and want to assess real-world problems and make the best possible use of our data and technical capabilities. This will greatly assist in defining the long-term development goals of the participating Research Infrastructures and improve the way they can technically and operationally work together to deliver greater scientific value,”

explain the project partners.

The BiCIKL project – a Horizon 2020-funded project involving 14 European institutions, representing major global players in biodiversity research and natural history, and coordinated by Pensoft – establishes a European starting community of key research infrastructures, researchers, citizen scientists and other biodiversity and life sciences stakeholders based on open science practices through access to data, tools and services.

Find more about the Call and submit your Expression of Interest

***

Follow BiCIKL on Twitter and Facebook.

Join the conversation on Twitter via #BiCIKL_H2020.

This October, the hybrid TDWG 2022 conference will address standards for linking biodiversity data


Between 17th and 21st October 2022, the Biodiversity Information Standards (TDWG) conference – to be held in Sofia, Bulgaria – will run under the theme “Stronger Together: Standards for linking biodiversity data”.

The event will be hosted by scholarly publisher and technology provider Pensoft, in collaboration with the National Museum of Natural History and the Institute of Biodiversity and Ecosystem Research at the Bulgarian Academy of Sciences. This year, the event will welcome participants both in person and virtually.

In addition to opening and closing plenaries, the conference will feature 14 symposia and a mix of other formats, including lightning talks, a workshop, a panel discussion, contributed oral presentations and virtual posters.

For the seventh year in a row, all abstracts submitted to the annual conference are made publicly available in the dedicated TDWG journal: Biodiversity Information Science and Standards (BISS Journal).

Thus, the abstracts, published ahead of the event itself, are not only permanently and freely available in a ‘mini-paper’ format, but also give conference participants a sneak peek into what’s coming at the much-anticipated conference.

Learn more about the unique features of BISS.

***

Register and find more about the TDWG 2022 conference on Pensoft Event Manager.

See the Call for Abstracts and learn how to submit your abstract today.

Visit the TDWG conference website.

***

Before, during and after the conference, join the conversation on Twitter via #tdwg2022.

Don’t forget to also follow TDWG (Twitter and Facebook), BISS Journal (Twitter and Facebook) and Pensoft (Twitter and Facebook) on social media.

Call for data papers describing datasets from Northern Eurasia in Biodiversity Data Journal


GBIF partners with FinBIF and Pensoft’s Biodiversity Data Journal to streamline publication of new datasets about biodiversity from Northern Eurasia

Original post via GBIF

In collaboration with the Finnish Biodiversity Information Facility (FinBIF) and Pensoft Publishers, GBIF has announced a new call for authors to submit and publish data papers on Northern Eurasia in a special collection of Biodiversity Data Journal (BDJ). The call expands upon successful efforts to mobilize data from European Russia in 2020 and from the rest of Russia in 2021.

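Until 30 June 2022, Pensoft will waive the article processing fee (normally €650) for the first 50 accepted data paper manuscripts that describe datasets meeting the following criteria:

1. Datasets with more than 7,000 presence records new to GBIF.org

2. Datasets with high-quality data and metadata

3. Datasets with geographic coverage in Northern Eurasia

See the complete definition of these terms below.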

Detailed instructions

Authors must prepare the manuscript in English and submit it in accordance with BDJ’s instructions to authors by 30 June 2022. Late submissions will not be eligible for APC waivers.

Sponsorship is limited to the first 50 accepted submissions meeting these criteria on a first-come, first-served basis, so the call for submissions may close before the 30 June 2022 deadline. Authors may contribute to more than one manuscript, but artificial division of logically uniform data and data stories, or “salami publishing”, is not allowed.

BDJ will publish a special issue including the selected papers by the end of 2022. The journal is indexed by Web of Science (Impact Factor 1.225) and Scopus (CiteScore: 2.0), and listed in РИНЦ / eLibrary.ru.

Non-native speakers should ensure that their English is checked either by native speakers or by professional English-language editors prior to submission. These individuals may be credited as a “Contributor” through the ARPHA Writing Tool (AWT) interface. Contributors are not listed as co-authors but can help you improve your manuscripts. BDJ will introduce stricter language checks for the 2022 call; poorly written submissions will be rejected prior to the peer-review process.

In addition to following the BDJ instructions to authors, data papers must reference the dataset by:
a) citing the dataset’s DOI
b) listing the dataset in the paper’s list of references
c) including “Northern Eurasia 2022” in Project Data: Title and “N-Eurasia-2022” in Project Data: Identifier in the dataset’s metadata.

Authors should explore the GBIF.org section on data papers and Strategies and guidelines for scholarly publishing of biodiversity data. Manuscripts and datasets will go through a standard peer-review process. When submitting a manuscript to BDJ, authors are requested to assign their manuscript to the Topical Collection: Biota of Northern Eurasia at step 3 of the submission process. To initiate the manuscript submission, remember to press the Submit to the journal button.

To see an example, view this dataset on GBIF.org and the corresponding data paper published by BDJ.

Questions may be directed either to Dmitry Schigel, GBIF scientific officer, or Yasen Mutafchiev, managing editor of Biodiversity Data Journal.

This project is a continuation of successful calls for data papers from European Russia in 2020 and from the rest of Russia in 2021. The funded papers are available in the Biota of Russia special collection and the datasets are shown on the project page.

Definition of terms

Datasets with more than 7,000 presence records new to GBIF.org

Datasets should contain a minimum of 7,000 presence records new to GBIF.org. While the focus is on additional records for the region, records already published in GBIF may meet the criteria of ‘new’ if they are substantially improved, particularly through the addition of georeferenced locations. Artificial reduction of records from otherwise uniform datasets to the necessary minimum (“salami publishing”) is discouraged and may result in rejection of the manuscript. New submissions describing updates of datasets already presented in earlier published data papers will not be sponsored.

Justification for publishing datasets with fewer records (e.g. sampling-event datasets, sequence-based data, checklists with endemics etc.) will be considered on a case-by-case basis.

Datasets with high-quality data and metadata

Authors should start by publishing a dataset consisting of data and metadata that meet GBIF’s stated data quality requirements. This effort will involve work on an installation of the GBIF Integrated Publishing Toolkit (IPT). BDJ will conduct its standard data audit and technical review. All datasets must pass the data audit before a manuscript is forwarded for peer review.

Only once the dataset is prepared should authors turn to working on the manuscript text. The extended metadata entered in the IPT while describing the dataset can be converted into a manuscript with a single click in the ARPHA Writing Tool (see also Creation and Publication of Data Papers from Ecological Metadata Language (EML) Metadata). Authors can then complete, edit and submit manuscripts to BDJ for review.

Datasets with geographic coverage in Northern Eurasia

In line with the funding priorities of this programme, at least 80% of the records in a dataset should have coordinates that fall within the priority areas of Russia, Ukraine, Belarus, Kazakhstan, Kyrgyzstan, Uzbekistan, Tajikistan, Turkmenistan, Moldova, Georgia, Armenia and Azerbaijan. However, the authors of the paper may be affiliated with institutions anywhere in the world.
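The numeric criteria above can be checked before submission. The sketch below is a minimal, hypothetical illustration and not a GBIF or BDJ tool. It makes several assumptions: each record has already been flagged as new to GBIF.org, each record carries a Darwin Core-style occurrenceStatus, and each record's coordinates have already been resolved to an ISO 3166-1 alpha-2 country code. The country-code lookup is a simplification of the "priority areas" test. The sketch uses ">=" for the 7,000-record threshold, following the body text's "a minimum of 7,000".

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# ISO 3166-1 alpha-2 codes for the priority countries named in the call
PRIORITY_COUNTRIES = {
    "RU", "UA", "BY", "KZ", "KG", "UZ",
    "TJ", "TM", "MD", "GE", "AM", "AZ",
}
MIN_NEW_PRESENCE_RECORDS = 7000  # minimum presence records new to GBIF.org
MIN_PRIORITY_SHARE = 0.80        # share of records with coordinates in priority areas


@dataclass(frozen=True)
class Record:
    occurrence_status: str       # Darwin Core occurrenceStatus, e.g. "present" or "absent"
    country_code: Optional[str]  # hypothetical: country the coordinates fall in, None if unknown
    has_coordinates: bool
    new_to_gbif: bool            # hypothetical flag: not yet in GBIF, or substantially improved


def check_eligibility(records: Sequence[Record]) -> dict:
    """Apply the call's record-count and geographic thresholds to one dataset."""
    new_presence = sum(
        1 for r in records
        if r.occurrence_status.lower() == "present" and r.new_to_gbif
    )
    # Records without coordinates count against the 80% share
    in_priority = sum(
        1 for r in records
        if r.has_coordinates and r.country_code in PRIORITY_COUNTRIES
    )
    share = in_priority / len(records) if records else 0.0
    return {
        "new_presence_records": new_presence,
        "priority_share": share,
        "meets_record_threshold": new_presence >= MIN_NEW_PRESENCE_RECORDS,
        "meets_geography_threshold": share >= MIN_PRIORITY_SHARE,
    }
```

Consider a dataset with 7,000 new presence records georeferenced in Russia and 500 new records without coordinates. It passes both checks, because 7,000 / 7,500 is about 93%. A pre-check like this does not replace BDJ's data audit, and it does not replace the case-by-case consideration of smaller datasets.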

***

Follow Biodiversity Data Journal on Twitter and Facebook to keep yourself posted about the new research published.

Digitising the Natural History Museum London’s entire collection could contribute over £2 billion to the global economy

In a world first, the Natural History Museum, London, has collaborated with economic consultants Frontier Economics Ltd to explore the economic and societal value of digitising natural history collections, and concluded that digitisation has the potential to deliver a seven- to tenfold return on investment. Whilst significant progress is already being made at the Museum, additional investment is needed to unlock the full potential of the Museum’s vast collections of more than 80 million objects. The project’s report is published in the open-science journal Research Ideas and Outcomes (RIO Journal).

One of the Museum’s digitisers imaging a butterfly to join the 4.93 million specimens already available online. 
© The Trustees of the Natural History Museum, London

The societal benefits of digitising natural history collections extend to global advancements in food security, biodiversity conservation, medicine discovery, mineral exploration, and beyond. A brand-new, rigorous economic report predicts that investing in digitising natural history museum collections could also result in a tenfold return. The Natural History Museum, London, has so far made over 4.9 million digitised specimens freely available online, and over 28 billion records have been downloaded in over 429,000 download events over the past six years.

Digitisation at the Natural History Museum, London 

Digitisation is the process of creating and sharing the data associated with Museum specimens. To digitise a specimen, all its related information is added to an online database. This typically includes where and when it was collected and who found it, and can include photographs, scans and molecular data if available. Natural history collections are a unique record of biodiversity dating back hundreds of years, and geodiversity dating back millennia. Creating and sharing data in this way enables science that would otherwise have been impossible, and accelerates the rate at which important discoveries are made from our collections.

The Natural History Museum’s collection of 80 million items is one of the largest and most historically and geographically diverse in the world. By unlocking the collection online, the Museum provides free and open access for global researchers, scientists, artists and more. Since 2015, the Museum has made 4.9 million specimens available on the Museum’s Data Portal, which have seen more than 28 billion downloads over 427,000 download events. 

This means the Museum has digitised about 6% of its collections to date. Because digitisation is expensive, costing tens of millions of pounds, it is difficult to make a case for further investment without a better understanding of the value of digitisation and its benefits.

In 2021, the Museum decided to explore the economic impacts of collections data in more depth, and commissioned Frontier Economics to undertake modelling, resulting in this project report, now made publicly available in the open-science journal Research Ideas and Outcomes (RIO Journal), and confirming benefits in excess of £2 billion over 30 years. While the methods in this report are relevant to collections globally, this modelling focuses on benefits to the UK, and is intended to support the Museum’s own digitisation work, as well as a current scoping study funded by the Arts & Humanities Research Council about the case for digitising all UK natural science collections as a research infrastructure.

“Sharing data from our collections can transform scientific research and help find solutions for nature and from nature. Our digitised collections have helped establish the baseline plant biodiversity in the Amazon, find wheat crops that are more resilient to climate change and support research into potential zoonotic origins of Covid-19. The research that comes from sharing our specimens has immense potential to transform our world and help both people and the planet thrive,”

says Helen Hardy, Science Digital Programme Manager at the Natural History Museum.

How does digitisation impact scientific research?

The data from museum collections accelerates scientific research, which in turn creates benefits for society and the economy across a wide range of sectors. Frontier Economics Ltd have looked at the impact of collections data in five of these sectors: biodiversity conservation, invasive species, medicines discovery, agricultural research and development and mineral exploration. 

“The Natural History Museum’s collection is a real treasure trove which, if made easily accessible to scientists all over the world through digitisation, has the potential to unlock ground-breaking research in any number of areas. Predicting exactly how the data will be used in future is clearly very uncertain. We have looked at the potential value that new research could create in just five areas focussing on a relatively narrow set of outcomes. We find that the value at stake is extremely large, running into billions,”

says Dan Popov, Economist at Frontier Economics Ltd.

The new analyses attempt to estimate the economic value of these benefits using a range of approaches, with the results in broad agreement that the benefits of digitisation are at least ten times greater than the costs. This represents a compelling case for investment in museum digital infrastructure without which the many benefits will not be realised.

“This new analysis shows that the data locked up in our collections has significant societal and economic value, but we need investment to help us release it,”

adds Professor Ken Norris, Head of the Life Sciences Department at the Natural History Museum.

Other benefits could include improvements to the resilience of agricultural crops by better understanding their wild relatives, research into invasive species which can cause significant damage to ecosystems and crops, and improving the accuracy of mining.  

Finally, such work could also change how science itself is conducted. Because specimens are digitised, researchers anywhere on the planet can access these collections, saving the time and money that would otherwise be spent travelling to see specific objects.

The value of research enabled by digitisation of natural history collections can be estimated by looking at specific areas where the Museum’s collections contribute towards scientific research and subsequently impact the wider economy. 
© Frontier Economics Ltd.

Original source: 

Popov D, Roychoudhury P, Hardy H, Livermore L, Norris K (2021) The Value of Digitising Natural History Collections. Research Ideas and Outcomes 7: e78844. https://doi.org/10.3897/rio.7.e78844

New BiCIKL project to build a freeway between pieces of biodiversity knowledge

Within Biodiversity Community Integrated Knowledge Library (BiCIKL), 14 key research and natural history institutions commit to link infrastructures and technologies to provide flawless access to biodiversity data.

In a recently started Horizon 2020-funded project, 14 European institutions from 10 countries, representing key players in biodiversity research and natural history both in Europe and globally, deploy and improve their own and partnering infrastructures to bridge gaps between each other’s biodiversity data types and classes. By linking their technologies, they are set to provide flawless access to data across all stages of the research cycle.

Three years in, BiCIKL (abbreviation for Biodiversity Community Integrated Knowledge Library) will have created the first-of-its-kind Biodiversity Knowledge Hub, where a researcher will be able to retrieve a full set of linked and open biodiversity data, thereby accessing the complete story behind an organism of interest: its name, genetics, occurrences, natural history, as well as authors and publications mentioning any of those.

Ultimately, the project’s products will solidify Open Science and FAIR (Findable, Accessible, Interoperable and Reusable) data practices by empowering and streamlining biodiversity research.

Together, the project partners will redesign the way biodiversity data is found, linked, integrated and re-used across the research cycle. By the end of the project, BiCIKL will provide the community with a more transparent, trustworthy, efficient and highly automated research ecosystem, allowing scientists to access, explore and further use a wide range of data with only a few clicks.

“In recent years, we’ve made huge progress on how biodiversity data is located, accessed, shared, extracted and preserved, thanks to a vast array of digital platforms, tools and projects looking after the different types of data, such as natural history specimens, species descriptions, images, occurrence records and genomics data, to name a few. However, we’re still missing an interconnected and user-friendly environment to pull all those pieces of knowledge together. Within BiCIKL, we all agree that it’s only after we puzzle out how to best bridge our existing infrastructures and the information they are continuously sourcing that future researchers will be able to realise their full potential,” 

explains BiCIKL’s project coordinator Prof. Lyubomir Penev, CEO and founder of Pensoft, a scholarly publisher and technology provider company.

Continuously fed with data sourced by the partnering institutions and their infrastructures, BiCIKL’s key final output, the Biodiversity Knowledge Hub, is set to persist long after the project has concluded. Moreover, by accelerating biodiversity research that builds on, rather than duplicates, existing knowledge, it will provide access to exponentially growing contextualised biodiversity data.

***

Learn more about BiCIKL on the project’s website at: bicikl-project.eu

Follow BiCIKL Project on Twitter and Facebook. Join the conversation on Twitter at #BiCIKL_H2020.

***

The project partners:

One water bucket to find them all: Detecting fish, mammals, and birds from a single sample

Revolutionary environmental DNA analysis holds great potential for the future of biodiversity monitoring, concludes a new study.


Collection of water samples for eDNA metabarcoding bioassessment.
Photo by Till-Hendrik Macher.

In times of accelerating biodiversity loss, reliable data on species occurrence are essential for initiating prompt and adequate conservation action. This is especially true for freshwater ecosystems, which are particularly vulnerable and threatened by anthropogenic impacts. Their ecological status has already been highlighted as a top priority by multiple national and international directives, such as the European Water Framework Directive.

However, traditional monitoring methods, such as electrofishing, trapping or observation-based assessments, are the current status quo in fish monitoring, and they are often time-consuming and costly. As a result, over the last decade, scientists have increasingly agreed that a more comprehensive and holistic method is needed to assess freshwater biodiversity.

Meanwhile, recent studies have repeatedly demonstrated that eDNA metabarcoding, in which DNA traces found in the water are used to identify which organisms live there, is an efficient method for capturing aquatic biodiversity in a fast, reliable, non-invasive and relatively low-cost manner. In such metabarcoding studies, scientists sample, collect and sequence DNA, and then compare the sequences with existing databases to identify the source organisms.
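The last step described above compares sequenced DNA with reference databases. It can be sketched as a simple best-hit search. This is a toy illustration under stated assumptions and not the pipeline used in the study. Reads are assumed to be already aligned to the same length as the reference barcodes. Identity is plain per-position matching rather than BLAST-style alignment scoring. The 97% threshold, the function names and the sequences are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def percent_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length (pre-aligned)")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def assign_taxon(read: str, reference: Dict[str, str],
                 min_identity: float = 0.97) -> Optional[str]:
    """Return the best-matching reference taxon, or None if no hit reaches the threshold."""
    best_taxon, best_score = None, 0.0
    for taxon, ref_seq in reference.items():
        score = percent_identity(read, ref_seq)
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon if best_score >= min_identity else None


def detect_taxa(reads: Iterable[str], reference: Dict[str, str],
                min_identity: float = 0.97) -> Counter:
    """Count how many reads were assigned to each taxon; unassigned reads are dropped."""
    hits: Counter = Counter()
    for read in reads:
        taxon = assign_taxon(read, reference, min_identity)
        if taxon is not None:
            hits[taxon] += 1
    return hits


if __name__ == "__main__":
    # Made-up 10-base "barcodes" for illustration only
    reference = {"taxon_A": "ACGTACGTAC", "taxon_B": "TTGTACCTAA"}
    reads = ["ACGTACGTAC", "ACGTACGTAA", "TTGTACCTAA"]
    print(detect_taxa(reads, reference))
```

In this toy run, the first and third reads match taxon_A and taxon_B exactly. The second read is at most 90% identical to any reference, so it falls below the 97% threshold and is left unassigned. Real workflows use curated reference libraries and proper sequence alignment, but they rest on the same idea of accepting a hit only above a similarity cut-off.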

Furthermore, eDNA metabarcoding assessments often use water samples from streams, which lie at the lowest point of the landscape. As a result, a single sample usually contains traces not only of organisms that come into direct contact with the water, for example by swimming or drinking, but also of terrestrial species, washed in indirectly via rainfall, snowmelt, groundwater etc.

In standard fish eDNA metabarcoding assessments, these ‘bycatch data’ are typically set aside. Yet, from the viewpoint of more holistic biodiversity monitoring, they hold immense potential to also detect the presence of terrestrial and semi-terrestrial species in the catchment.

In their new study, reported in the open-access scholarly journal Metabarcoding and Metagenomics, German researchers from the University of Duisburg-Essen and the German Environment Agency successfully detected a remarkable share of the local mammals and birds native to the state of Saxony-Anhalt by collecting as much as 18 litres of water along a two-kilometre stretch of the river Mulde.

After water filtration the eDNA filter is preserved in ethanol until further processing in the lab.
Photo by Till-Hendrik Macher.

In fact, it took only one day for the team, led by Till-Hendrik Macher, PhD student in the German Federal Environmental Agency-funded GeDNA project, to collect the samples. Using metabarcoding to analyse the DNA from the samples, the researchers identified as much as 50% of the fishes, 22% of the mammal species, and 7.4% of the breeding bird species in the region. 

However, the team also concluded that while it would normally take only 10 litres of water to assess the aquatic and semi-terrestrial fauna, terrestrial species required significantly more sampling.

Unlocking data from the increasingly available fish eDNA metabarcoding information enables synergies among terrestrial and aquatic biodiversity monitoring programs, adding further important information on species diversity in space and time. 

“We thus encourage to exploit fish eDNA metabarcoding biodiversity monitoring data to inform other conservation programs,”

says lead author Till-Hendrik Macher. 

“For that purpose, however, it is essential that eDNA data is jointly stored and accessible for different biodiversity monitoring and biodiversity assessment campaigns, either at state, federal, or international level,”

concludes Florian Leese, who coordinates the project.

Original source:

Macher T-H, Schütz R, Arle J, Beermann AJ, Koschorreck J, Leese F (2021) Beyond fish eDNA metabarcoding: Field replicates disproportionately improve the detection of stream associated vertebrate species. Metabarcoding and Metagenomics 5: e66557. https://doi.org/10.3897/mbmg.5.66557