This October, the hybrid TDWG 2022 conference will address standards for linking biodiversity data

Between 17th and 21st October 2022, the Biodiversity Information Standards (TDWG) conference – to be held in Sofia, Bulgaria – will run under the theme “Stronger Together: Standards for linking biodiversity data”. It will be hosted by scholarly publisher and technology provider Pensoft, in collaboration with the National Museum of Natural History and the Institute of Biodiversity and Ecosystem Research at the Bulgarian Academy of Sciences. This year, the event will welcome participants both in person and virtually.

In addition to the opening and closing plenaries, the conference will feature 14 symposia and a mix of other formats, including lightning talks, a workshop and a panel discussion, as well as contributed oral presentations and virtual posters.

Registration is now open and abstract submissions are welcome.

For the seventh year in a row, all abstracts submitted to the annual conference will be published in the dedicated TDWG journal: Biodiversity Information Science and Standards (BISS Journal). Published ahead of the event itself, the abstracts will not only become permanently and freely available in a ‘mini-paper’ format, but will also give conference participants a sneak peek at what’s coming at the much anticipated conference. Learn more about the unique features of BISS.

***

Important dates:

Abstract submissions accepted until 1 July 2022.

Early-bird registration available until 15 July 2022.

***

Register and find more about the TDWG 2022 conference on Pensoft Event Manager.

See the Call for Abstracts and learn how to submit your abstract today.

Visit the TDWG conference website.

***

Before, during and after the conference, join the conversation on Twitter via #tdwg2022.

Don’t forget to also follow TDWG (Twitter and Facebook), BISS Journal (Twitter and Facebook) and Pensoft (Twitter and Facebook) on social media.

Call for data papers describing datasets from Northern Eurasia in Biodiversity Data Journal

In collaboration with the Finnish Biodiversity Information Facility (FinBIF) and Pensoft Publishers, GBIF has announced a new call for authors to submit and publish data papers on Northern Eurasia in a special collection of Biodiversity Data Journal (BDJ). The call extends and expands upon successful efforts to mobilize data from European Russia in 2020 and from the rest of Russia in 2021.

GBIF partners with FinBIF and Pensoft’s Biodiversity Data Journal to streamline publication of new datasets about biodiversity from Northern Eurasia

Original post via GBIF

In collaboration with the Finnish Biodiversity Information Facility (FinBIF) and Pensoft Publishers, GBIF has announced a new call for authors to submit and publish data papers on Northern Eurasia in a special collection of Biodiversity Data Journal (BDJ). The call expands upon successful efforts to mobilize data from European Russia in 2020 and from the rest of Russia in 2021.

Until 30 June 2022, Pensoft will waive the article processing fee (normally €650) for the first 50 accepted data paper manuscripts that meet the following criteria for describing a dataset:

- datasets with more than 7,000 presence records new to GBIF.org;
- datasets with high-quality data and metadata;
- datasets with geographic coverage in Northern Eurasia.

See the complete definition of these terms below.

Detailed instructions

Authors must prepare the manuscript in English and submit it in accordance with BDJ’s instructions to authors by 30 June 2022. Late submissions will not be eligible for APC waivers.

Sponsorship is limited to the first 50 accepted submissions meeting these criteria on a first-come, first-served basis. The call for submissions can therefore close prior to the deadline of 30 June 2022. Authors may contribute to more than one manuscript, but artificial division of the logically uniform data and data stories, or “salami publishing”, is not allowed.

BDJ will publish a special issue including the selected papers by the end of 2022. The journal is indexed by Web of Science (Impact Factor 1.225), Scopus (CiteScore: 2.0) and listed in РИНЦ / eLibrary.ru.

For non-native speakers, please ensure that your English is checked either by native speakers or by professional English-language editors prior to submission. You may credit these individuals as a “Contributor” through the AWT interface. Contributors are not listed as co-authors but can help you improve your manuscripts. BDJ will introduce stricter language checks for the 2022 call; poorly written submissions will be rejected prior to the peer-review process.

In addition to the BDJ instructions to authors, the data paper must reference the dataset by
a) citing the dataset’s DOI,
b) including the dataset in the paper’s list of references, and
c) including “Northern Eurasia 2022” in Project Data: Title and “N-Eurasia-2022” in Project Data: Identifier in the dataset’s metadata.
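For authors preparing several submissions, the two required strings are easy to verify programmatically before submitting. The sketch below is purely illustrative – the metadata dict layout and the `meets_call_requirements` helper are assumptions for this example, not part of the IPT or the call’s instructions.

```python
# Hypothetical pre-submission check that a dataset's project metadata
# carries the strings required by the Northern Eurasia 2022 call.
# The dict layout is an assumption for illustration, not an IPT/EML API.

def meets_call_requirements(metadata: dict) -> bool:
    """Return True if the project title and identifier match the call."""
    project = metadata.get("project", {})
    return (
        "Northern Eurasia 2022" in project.get("title", "")
        and project.get("identifier", "") == "N-Eurasia-2022"
    )

example = {
    "project": {
        "title": "Birds of the Altai: Northern Eurasia 2022 data paper",
        "identifier": "N-Eurasia-2022",
    }
}
print(meets_call_requirements(example))  # True for this example
```

A check like this only confirms the two strings are in place; the substantive requirements (record counts, data quality, geographic coverage) are assessed by the journal’s data audit.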

Authors should explore the GBIF.org section on data papers and Strategies and guidelines for scholarly publishing of biodiversity data. Manuscripts and datasets will go through a standard peer-review process. When submitting a manuscript to BDJ, authors are requested to assign their manuscript to the Topical Collection: Biota of Northern Eurasia at step 3 of the submission process. To initiate the manuscript submission, remember to press the Submit to the journal button.

To see an example, view this dataset on GBIF.org and the corresponding data paper published by BDJ.

Questions may be directed either to Dmitry Schigel, GBIF scientific officer, or Yasen Mutafchiev, managing editor of Biodiversity Data Journal.

This project is a continuation of successful calls for data papers from European Russia in 2020 and 2021. The funded papers are available in the Biota of Russia special collection and the datasets are shown on the project page.

Definition of terms

Datasets with more than 7,000 presence records new to GBIF.org

Datasets should contain a minimum of 7,000 presence records new to GBIF.org. While the focus is on additional records for the region, records already published in GBIF may meet the criterion of ‘new’ if they are substantially improved, particularly through the addition of georeferenced locations. Artificial reduction of records from otherwise uniform datasets to the necessary minimum (“salami publishing”) is discouraged and may result in rejection of the manuscript. New submissions describing updates of datasets already presented in earlier published data papers will not be sponsored.

Justification for publishing datasets with fewer records (e.g. sampling-event datasets, sequence-based data, checklists with endemics etc.) will be considered on a case-by-case basis.

Datasets with high-quality data and metadata

Authors should start by publishing a dataset comprising data and metadata that meet GBIF’s stated data quality requirements. This effort will involve work on an installation of the GBIF Integrated Publishing Toolkit (IPT). BDJ will conduct its standard data audit and technical review, and all datasets must pass the audit before a manuscript is forwarded for peer review.

Only when the dataset is prepared should authors turn to the manuscript text. The extended metadata entered in the IPT while describing the dataset can be converted into a manuscript with a single click in the ARPHA Writing Tool (see also Creation and Publication of Data Papers from Ecological Metadata Language (EML) Metadata). Authors can then complete, edit and submit their manuscripts to BDJ for review.

Datasets with geographic coverage in Northern Eurasia

In correspondence with the funding priorities of this programme, at least 80% of the records in a dataset should have coordinates that fall within the priority areas of Russia, Ukraine, Belarus, Kazakhstan, Kyrgyzstan, Uzbekistan, Tajikistan, Turkmenistan, Moldova, Georgia, Armenia and Azerbaijan. However, authors of the paper may be affiliated with institutions anywhere in the world.

***

Follow Biodiversity Data Journal on Twitter and Facebook to keep yourself posted about the new research published.

Digitising the Natural History Museum London’s entire collection could contribute over £2 billion to the global economy

In a world first, the Natural History Museum, London, has collaborated with economic consultants Frontier Economics Ltd to explore the economic and societal value of digitising natural history collections, concluding that digitisation has the potential to deliver a seven- to tenfold return on investment. While significant progress is already being made at the Museum, additional investment is needed to unlock the full potential of the Museum’s vast collections – more than 80 million objects. The project’s report is published in the open-science journal Research Ideas and Outcomes (RIO Journal).

One of the Museum’s digitisers imaging a butterfly to join the 4.93 million specimens already available online. 
© The Trustees of the Natural History Museum, London

The societal benefits of digitising natural history collections extend to global advancements in food security, biodiversity conservation, medicine discovery, minerals exploration, and beyond. A brand-new, rigorous economic report predicts that investing in digitising natural history museum collections could also yield a tenfold return. The Natural History Museum, London, has so far made over 4.9 million digitised specimens freely available online – over 28 billion records have been downloaded across 429,000 download events over the past six years.

Digitisation at the Natural History Museum, London 

Digitisation is the process of creating and sharing the data associated with Museum specimens. To digitise a specimen, all its related information is added to an online database. This typically includes where and when it was collected and who found it, and can include photographs, scans and molecular data where available. Natural history collections are a unique record of biodiversity dating back hundreds of years, and of geodiversity dating back millennia. Creating and sharing data this way enables science that would otherwise have been impossible and accelerates the rate at which important discoveries are made from the collections.
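As an illustration of what such a database entry holds, a digitised specimen record might be pictured as below. The field names loosely follow Darwin Core terms and the values are invented; the Museum’s actual database schema is not described in this article.

```python
# Illustrative shape of a digitised specimen record: collection details
# plus optional media. Field names loosely follow Darwin Core terms
# (an assumption; the Museum's internal schema is not described here).

specimen = {
    "catalogNumber": "NHMUK-0000001",      # hypothetical identifier
    "scientificName": "Vanessa atalanta",  # a red admiral butterfly
    "eventDate": "1923-07-14",             # when it was collected
    "locality": "Surrey, England",         # where it was collected
    "recordedBy": "A. Collector",          # who found it
    "associatedMedia": [],                 # photographs or scans, if any
}

# Publishing the record online means sharing exactly this kind of data.
required = {"scientificName", "eventDate", "locality", "recordedBy"}
print(required.issubset(specimen))  # True: the core "where, when, who" is present
```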

The Natural History Museum’s collection of 80 million items is one of the largest and most historically and geographically diverse in the world. By unlocking the collection online, the Museum provides free and open access for researchers, scientists, artists and others around the globe. Since 2015, the Museum has made 4.9 million specimens available on its Data Portal, which have seen more than 28 billion downloads across 427,000 download events.

This means the Museum has digitised about 6% of its collections to date. Because digitisation is expensive, costing tens of millions of pounds, it is difficult to make a case for further investment without a better understanding of the value of digitisation and its benefits.

In 2021, the Museum decided to explore the economic impacts of collections data in more depth, and commissioned Frontier Economics to undertake modelling, resulting in this project report, now made publicly available in the open-science journal Research Ideas and Outcomes (RIO Journal), and confirming benefits in excess of £2 billion over 30 years. While the methods in this report are relevant to collections globally, this modelling focuses on benefits to the UK, and is intended to support the Museum’s own digitisation work, as well as a current scoping study funded by the Arts & Humanities Research Council about the case for digitising all UK natural science collections as a research infrastructure.

Sharing data from our collections can transform scientific research and help find solutions for nature and from nature. Our digitised collections have helped establish the baseline plant biodiversity in the Amazon, find wheat crops that are more resilient to climate change and support research into potential zoonotic origins of Covid-19. The research that comes from sharing our specimens has immense potential to transform our world and help both people and the planet thrive,

says Helen Hardy, Science Digital Programme Manager at the Natural History Museum.

How digitisation impacts scientific research

The data from museum collections accelerates scientific research, which in turn creates benefits for society and the economy across a wide range of sectors. Frontier Economics Ltd have looked at the impact of collections data in five of these sectors: biodiversity conservation, invasive species, medicines discovery, agricultural research and development and mineral exploration. 

“The Natural History Museum’s collection is a real treasure trove which, if made easily accessible to scientists all over the world through digitisation, has the potential to unlock ground-breaking research in any number of areas. Predicting exactly how the data will be used in future is clearly very uncertain. We have looked at the potential value that new research could create in just five areas focussing on a relatively narrow set of outcomes. We find that the value at stake is extremely large, running into billions,”

says Dan Popov, Economist at Frontier Economics Ltd.

The new analyses attempt to estimate the economic value of these benefits using a range of approaches, with the results in broad agreement that the benefits of digitisation are at least ten times greater than the costs. This represents a compelling case for investment in museum digital infrastructure without which the many benefits will not be realised.

This new analysis shows that the data locked up in our collections has significant societal and economic value, but we need investment to help us release it,

adds Professor Ken Norris, Head of the Life Sciences Department at the Natural History Museum.

Other benefits could include improvements to the resilience of agricultural crops by better understanding their wild relatives, research into invasive species which can cause significant damage to ecosystems and crops, and improving the accuracy of mining.  

Finally, there are other impacts that such work could have on how science itself is conducted. The very act of digitising specimens means that researchers anywhere on the planet can access these collections, saving the time and money that would otherwise be spent travelling to see specific objects.

The value of research enabled by digitisation of natural history collections can be estimated by looking at specific areas where the Museum’s collections contribute towards scientific research and subsequently impact the wider economy. 
© Frontier Economics Ltd.

Original source: 

Popov D, Roychoudhury P, Hardy H, Livermore L, Norris K (2021) The Value of Digitising Natural History Collections. Research Ideas and Outcomes 7: e78844. https://doi.org/10.3897/rio.7.e78844

New BiCIKL project to build a freeway between pieces of biodiversity knowledge

The new Horizon 2020-funded project BiCIKL (Biodiversity Community Integrated Knowledge Library) is a partnership of 14 European institutions, representing key players in biodiversity research and data in Europe and globally. They have committed to link their infrastructures and technologies in order to provide seamless access to data across all stages of the research process. Three years in, BiCIKL will have created the first-of-its-kind Biodiversity Knowledge Hub.

In a recently started Horizon 2020-funded project, 14 European institutions from 10 countries, representing key players in biodiversity research and natural history in Europe and globally, will deploy and improve their own and partnering infrastructures to bridge the gaps between each other’s biodiversity data types and classes. By linking their technologies, they are set to provide seamless access to data across all stages of the research cycle.

Three years in, BiCIKL (abbreviation for Biodiversity Community Integrated Knowledge Library) will have created the first-of-its-kind Biodiversity Knowledge Hub, where a researcher will be able to retrieve a full set of linked and open biodiversity data, thereby accessing the complete story behind an organism of interest: its name, genetics, occurrences, natural history, as well as authors and publications mentioning any of those.

Ultimately, the project’s products will solidify Open Science and FAIR (Findable, Accessible, Interoperable and Reusable) data practices by empowering and streamlining biodiversity research.

Together, the project partners will redesign the way biodiversity data is found, linked, integrated and re-used across the research cycle. By the end of the project, BiCIKL will provide the community with a more transparent, trustworthy, efficient and highly automated research ecosystem, allowing scientists to access, explore and put to further use a wide range of data with only a few clicks.

“In recent years, we’ve made huge progress on how biodiversity data is located, accessed, shared, extracted and preserved, thanks to a vast array of digital platforms, tools and projects looking after the different types of data, such as natural history specimens, species descriptions, images, occurrence records and genomics data, to name a few. However, we’re still missing an interconnected and user-friendly environment to pull all those pieces of knowledge together. Within BiCIKL, we all agree that it’s only after we puzzle out how to best bridge our existing infrastructures and the information they are continuously sourcing that future researchers will be able to realise their full potential,” 

explains BiCIKL’s project coordinator Prof. Lyubomir Penev, CEO and founder of Pensoft, a scholarly publisher and technology provider company.

Continuously fed with data sourced by the partnering institutions and their infrastructures, BiCIKL’s key final output, the Biodiversity Knowledge Hub, is set to persist long after the project has concluded. Better still, by accelerating biodiversity research that builds on – rather than duplicates – existing knowledge, it will provide access to exponentially growing contextualised biodiversity data.

***

Learn more about BiCIKL on the project’s website at: bicikl-project.eu

Follow BiCIKL Project on Twitter and Facebook. Join the conversation on Twitter at #BiCIKL_H2020.

***

The project partners:

One water bucket to find them all: Detecting fish, mammals, and birds from a single sample

In times of exacerbating biodiversity loss, reliable data on species occurrence are essential. Environmental DNA (eDNA) – DNA released from organisms into the water – is increasingly used to detect fishes in biodiversity monitoring campaigns. However, eDNA turns out to be capable of providing much more than fish occurrence data, including information on other vertebrates. A study, published in the open-access journal Metabarcoding and Metagenomics, demonstrates how comprehensively vertebrate diversity can be assessed at no additional cost.

Revolutionary environmental DNA analysis holds great potential for the future of biodiversity monitoring, concludes a new study

Collection of water samples for eDNA metabarcoding bioassessment.
Photo by Till-Hendrik Macher.

In times of exacerbating biodiversity loss, reliable data on species occurrence are essential, in order for prompt and adequate conservation actions to be initiated. This is especially true for freshwater ecosystems, which are particularly vulnerable and threatened by anthropogenic impacts. Their ecological status has already been highlighted as a top priority by multiple national and international directives, such as the European Water Framework Directive.

However, traditional monitoring methods such as electrofishing, trapping, or observation-based assessments – the current status quo in fish monitoring – are often time-consuming and costly. As a result, over the last decade scientists have increasingly agreed that a more comprehensive and holistic method is needed to assess freshwater biodiversity.

Meanwhile, recent studies have repeatedly demonstrated that eDNA metabarcoding, where DNA traces found in the water are used to identify the organisms that live there, is an efficient method to capture aquatic biodiversity in a fast, reliable, non-invasive and relatively low-cost manner. In such metabarcoding studies, scientists collect water samples and sequence the DNA they contain, so that they can compare it with existing databases and identify the source organisms.
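The comparison step can be sketched as matching each sequenced read against a reference barcode library. The toy library, sequences and percent-identity threshold below are illustrative assumptions; real pipelines rely on dedicated bioinformatics tools and far larger reference databases.

```python
# Naive sketch of taxonomic assignment in eDNA metabarcoding:
# compare a read against reference barcodes and pick the closest match.
# The reference fragments here are invented; this only illustrates the idea.

REFERENCE = {  # hypothetical short barcode fragments
    "Salmo trutta":  "ACGTACGTGGTT",   # brown trout
    "Castor fiber":  "ACGTTCCTGGAA",   # Eurasian beaver
    "Alcedo atthis": "TTGTACCTGCAA",   # common kingfisher
}

def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def assign_taxon(read: str, min_identity: float = 0.9):
    """Return the best-matching taxon, or None below the threshold."""
    best = max(REFERENCE, key=lambda sp: identity(read, REFERENCE[sp]))
    return best if identity(read, REFERENCE[best]) >= min_identity else None

print(assign_taxon("ACGTACGTGGTA"))  # Salmo trutta (one mismatch out of 12)
```

A threshold like `min_identity` is what separates confident assignments from reads that are left unidentified.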

Furthermore, as eDNA metabarcoding assessments use samples from water bodies, often streams located at the lowest point of a catchment, one such sample usually contains not only traces of specimens that come into direct contact with the water, for example by swimming or drinking, but also traces of terrestrial species carried in indirectly via rainfall, snowmelt, groundwater etc.

In standard fish eDNA metabarcoding assessments, these ‘bycatch data’ are typically left aside. Yet, from the viewpoint of more holistic biodiversity monitoring, they hold immense potential to also detect the presence of terrestrial and semi-terrestrial species in the catchment.

In their new study, reported in the open-access scholarly journal Metabarcoding and Metagenomics, German researchers from the University of Duisburg-Essen and the German Environment Agency successfully detected a remarkable share of the local mammals and birds native to the state of Saxony-Anhalt by collecting just 18 litres of water across a two-kilometre stretch of the river Mulde.

After water filtration, the eDNA filter is preserved in ethanol until further processing in the lab.
Photo by Till-Hendrik Macher.

In fact, it took only one day for the team, led by Till-Hendrik Macher, PhD student in the German Federal Environmental Agency-funded GeDNA project, to collect the samples. Using metabarcoding to analyse the DNA from the samples, the researchers identified as many as 50% of the fish species, 22% of the mammal species, and 7.4% of the breeding bird species in the region.

However, the team also concluded that while it would normally take only 10 litres of water to assess the aquatic and semi-terrestrial fauna, terrestrial species required significantly more sampling.

Unlocking data from the increasingly available fish eDNA metabarcoding information enables synergies among terrestrial and aquatic biodiversity monitoring programs, adding further important information on species diversity in space and time. 

“We thus encourage to exploit fish eDNA metabarcoding biodiversity monitoring data to inform other conservation programs,”

says lead author Till-Hendrik Macher. 

“For that purpose, however, it is essential that eDNA data is jointly stored and accessible for different biodiversity monitoring and biodiversity assessment campaigns, either at state, federal, or international level,”

concludes Florian Leese, who coordinates the project.

Original source:

Macher T-H, Schütz R, Arle J, Beermann AJ, Koschorreck J, Leese F (2021) Beyond fish eDNA metabarcoding: Field replicates disproportionately improve the detection of stream associated vertebrate species. Metabarcoding and Metagenomics 5: e66557. https://doi.org/10.3897/mbmg.5.66557

48 years of Australian collecting trips in one data package

From 1973 to 2020, Australian zoologist Dr Robert Mesibov kept careful records of the “where” and “when” of his plant and invertebrate collecting trips. Now, he has made those valuable biodiversity data freely and easily accessible via the Zenodo open-data repository, so that future researchers can rely on this “authority file” when using museum specimens collected from those events in their own studies. The new dataset is described in the open-access, peer-reviewed Biodiversity Data Journal.

While checking museum records, Dr Robert Mesibov found there were occasional errors in the dates and places for specimens he had collected many years before. He was not surprised.

“It’s easy to make mistakes when entering data on a computer from paper specimen labels”, said Mesibov. “I also found specimen records that said I was the collector, but I know I wasn’t!”

One solution to this problem was what librarians and others have long called an “authority file”.

“It’s an authoritative reference, in this case with the correct details of where I collected and when”, he explained.

“I kept records of almost all my collecting trips from 1973 until I retired from field work in 2020. The earliest records were on paper, but I began storing the key details in digital form in the 1990s.”

The 48-year record has now been made publicly available via the Zenodo open-data repository after conversion to the Darwin Core data format, which is widely used for sharing biodiversity information. With this “authority file”, described in detail in the open-access, peer-reviewed Biodiversity Data Journal, future researchers will be able to rely on sound, interoperable and easy-to-access data when using those museum specimens in their own studies, instead of repeating and further spreading unintentional errors.

“There are 3829 collecting events in the authority file”, said Mesibov, “from six Australian states and territories. For each collecting event there are geospatial and date details, plus notes on the collection.”

Mesibov hopes the authority file will be used by museums to correct errors in their catalogues.

“It should also save museums a fair bit of work in future”, he explained. “No need to transcribe details on specimen labels into digital form in a database, because the details are already in digital form in the authority file.”

Mesibov points out that in the 19th and 20th centuries, lists of collecting events were often included in the reports of major scientific expeditions.

“Those lists were authority files, but in the pre-digital days it was probably just as easy to copy collection data from specimen labels.”

“In the 21st century there’s a big push to digitise museum specimen collections”, he said. “Museum databases often have lookup tables with scientific names and the names of collectors. These lookup tables save data entry time and help to avoid errors in digitising.”

“Authority files for collecting events are the next logical step,” said Mesibov. “They can be used as lookup tables for all the important details of individual collections: where, when, by whom and how.”
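That lookup-table idea can be sketched as follows; the record layout, event ID and helper function are hypothetical, using a few Darwin Core-style field names rather than the authority file’s actual columns.

```python
# Hypothetical sketch of using a collecting-event authority file as a
# lookup table: a museum record cites an event ID, and the authoritative
# where/when details are pulled from the file rather than re-typed.
# Field names loosely follow Darwin Core; the real file's columns may differ.

AUTHORITY = {  # eventID -> authoritative event details (invented data)
    "ME-1975-031": {
        "eventDate": "1975-11-02",
        "stateProvince": "Tasmania",
        "decimalLatitude": -41.45,
        "decimalLongitude": 147.17,
        "recordedBy": "R. Mesibov",
    },
}

def correct_record(record: dict, authority: dict) -> dict:
    """Overwrite a museum record's event details from the authority file."""
    event = authority.get(record.get("eventID"), {})
    return {**record, **event}  # the authority file wins on shared fields

museum_record = {
    "catalogNumber": "XYZ-123",   # hypothetical specimen
    "eventID": "ME-1975-031",
    "eventDate": "1975-01-02",    # mis-transcribed from the label
    "recordedBy": "R. Mesibov",
}
print(correct_record(museum_record, AUTHORITY)["eventDate"])  # 1975-11-02
```

The merge keeps everything specific to the specimen (its catalogue number) while replacing the transcribed event details with the authoritative ones.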

###

Research paper:

Mesibov RE (2021) An Australian collector’s authority file, 1973–2020. Biodiversity Data Journal 9: e70463. https://doi.org/10.3897/BDJ.9.e70463

###

Robert Mesibov’s webpage: https://www.datafix.com.au/mesibov.html

Robert Mesibov’s ORCID page: https://orcid.org/0000-0003-3466-5038

Unlocking Australia’s biodiversity, one dataset at a time

Illustration by CSIRO

Australia’s unique and highly endemic flora and fauna are threatened by rapid losses in biodiversity and ecosystem health, caused by human influence and environmental challenges. To monitor and respond to these trends, scientists and policy-makers need reliable data.

Biodiversity researchers and managers often don’t have the necessary information, or access to it, to tackle some of the greatest environmental challenges facing society, such as biodiversity loss or climate change. Data can be a powerful tool for the development of science and decision-making, which is where the Atlas of Living Australia (ALA) comes in.

The ALA – Australia’s national biodiversity database – uses cutting-edge digital tools that enable people to share, access and analyse data about local plants, animals and fungi. It brings together millions of sightings, as well as environmental data such as rainfall and temperature, in one place to be searched and analysed. All data are made publicly available – the ALA was established in line with open-access principles and uses an open-source code base.

The impressive set of databases on Australia’s biodiversity includes information on species occurrence, animal tracking, specimens, biodiversity projects, and Australia’s Natural History Collections. The ALA also manages a wide range of other data, including information on spatial layers, indigenous ecological knowledge, taxonomic profiles and biodiversity literature. Together with its partner tools, the ALA has radically enhanced ease of access to biodiversity data. A forum paper recently published with the open-access, peer-reviewed Biodiversity Data Journal details its history, current state and future directions.

Established in 2010 under the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS) to support the research sector with trusted biodiversity data, the ALA now delivers data and related services to more than 80,000 users every year, helping scientists, policy makers, environmental planners, industry, and the general public to work more efficiently. It also supports the international community as the Australian node of the Global Biodiversity Information Facility and provides the code base for the successful international Living Atlases community.

With thousands of records being added daily, the ALA currently contains nearly 95 million occurrence records of over 111,000 species, the earliest dating from the late 1600s. Among them, 1.7 million are observation records harvested by computer algorithms, a share that is set to keep growing.

An ALA staff member. Photo by CSIRO

Recognising the potential of citizen science for contributing valuable information to Australia’s biodiversity, the ALA became a member of the iNaturalist Network in 2019 and established an Australian iNaturalist node to encourage people to submit their species observations. Projects like DigiVol and BioCollect were also born from ALA’s interest in empowering citizen science.

The ALA BioCollect platform supports biodiversity-related projects by capturing both descriptive metadata and raw primary field data. BioCollect has a strong citizen science emphasis, with 524 citizen science projects that are open to involvement by anyone. The platform also provides information on projects related to ecoscience and natural resource management activities.

Hosted by the Australian Museum, DigiVol is a volunteer portal where over 6,000 public volunteers have transcribed over 800,000 specimen labels and 124,000 pages of field notes. Harnessing the power and passion of volunteers, the tool makes more information available to science by digitising specimens, images, field notes and archives from collections all over the world.

Built on a decade of partnerships with biodiversity data partners, government departments, community and citizen science organisations, the ALA provides a robust suite of services, including a range of data systems and software applications that support both the research sector and decision makers. Well regarded both domestically and internationally, it has built a national community that is working to improve the availability and accessibility of biodiversity data.

Original source:

Belbin L, Wallis E, Hobern D, Zerger A (2021) The Atlas of Living Australia: History, current state and future directions. Biodiversity Data Journal 9: e65023. https://doi.org/10.3897/BDJ.9.e65023

Call for data papers describing datasets from Russia to be published in Biodiversity Data Journal

GBIF partners with FinBIF and Pensoft to support publication of new datasets about biodiversity from across Russia

Original post via GBIF

In collaboration with the Finnish Biodiversity Information Facility (FinBIF) and Pensoft Publishers, GBIF has announced a new call for authors to submit and publish data papers on Russia in a special collection of Biodiversity Data Journal (BDJ). The call extends and expands upon a successful effort in 2020 to mobilize data from European Russia.

Between now and 15 September 2021, the article processing fee (normally €550) will be waived for the first 36 papers, provided that the publications are accepted and meet the following criteria that the data paper describes a dataset:

The manuscript must be prepared in English and is submitted in accordance with BDJ’s instructions to authors by 15 September 2021. Late submissions will not be eligible for APC waivers.

Sponsorship is limited to the first 36 accepted submissions meeting these criteria on a first-come, first-served basis. The call for submissions can therefore close prior to the stated deadline of 15 September 2021. Authors may contribute to more than one manuscript, but artificial division of the logically uniform data and data stories, or “salami publishing”, is not allowed.

BDJ will publish a special issue including the selected papers by the end of 2021. The journal is indexed by Web of Science (Impact Factor 1.331), Scopus (CiteScore: 2.1) and listed in РИНЦ / eLibrary.ru.

For non-native speakers, please ensure that your English is checked either by native speakers or by professional English-language editors prior to submission. You may credit these individuals as a “Contributor” through the AWT interface. Contributors are not listed as co-authors but can help you improve your manuscripts.

In addition to the BDJ instruction to authors, it is required that datasets referenced from the data paper a) cite the dataset’s DOI, b) appear in the paper’s list of references, and c) has “Russia 2021” in Project Data: Title and “N-Eurasia-Russia2021“ in Project Data: Identifier in the dataset’s metadata.

Authors should explore the GBIF.org section on data papers and Strategies and guidelines for scholarly publishing of biodiversity data. Manuscripts and datasets will go through a standard peer-review process. When submitting a manuscript to BDJ, authors are requested to select the Biota of Russia collection.

To see an example, view this dataset on GBIF.org and the corresponding data paper published by BDJ.

Questions may be directed either to Dmitry Schigel, GBIF scientific officer, or Yasen Mutafchiev, managing editor of Biodiversity Data Journal.

The 2021 extension of the collection of data papers will be edited by Vladimir Blagoderov, Pedro Cardoso, Ivan Chadin, Nina Filippova, Alexander Sennikov, Alexey Seregin, and Dmitry Schigel.

This project is a continuation of the successful call for data papers from European Russia in 2020. The funded papers are available in the Biota of Russia special collection and the datasets are shown on the project page.

***

Definition of terms

Datasets with more than 5,000 records that are new to GBIF.org

Datasets should contain at a minimum 5,000 new records that are new to GBIF.org. While the focus is on additional records for the region, records already published in GBIF may meet the criteria of ‘new’ if they are substantially improved, particularly through the addition of georeferenced locations.” Artificial reduction of records from otherwise uniform datasets to the necessary minimum (“salami publishing”) is discouraged and may result in rejection of the manuscript. New submissions describing updates of datasets, already presented in earlier published data papers will not be sponsored.

Justification for publishing datasets with fewer records (e.g. sampling-event datasets, sequence-based data, checklists with endemics etc.) will be considered on a case-by-case basis.

Datasets with high-quality data and metadata

Authors should start by publishing a dataset comprised of data and metadata that meets GBIF’s stated data quality requirement. This effort will involve work on an installation of the GBIF Integrated Publishing Toolkit.

Only when the dataset is prepared should authors then turn to working on the manuscript text. The extended metadata you enter in the IPT while describing your dataset can be converted into manuscript with a single-click of a button in the ARPHA Writing Tool (see also Creation and Publication of Data Papers from Ecological Metadata Language (EML) Metadata. Authors can then complete, edit and submit manuscripts to BDJ for review.

Datasets with geographic coverage in Russia

In correspondence with the funding priorities of this programme, at least 80% of the records in a dataset should have coordinates that fall within the priority area of Russia. However, authors of the paper may be affiliated with institutions anywhere in the world.

***

Check out the Biota of Russia dynamic data paper collection so far.

Follow Biodiversity Data Journal on Twitter and Facebook to keep yourself posted about the new research published.

New DNA barcoding project aims at tracking down the “dark taxa” of Germany’s insect fauna

New dynamic article collection at Biodiversity Data Journal is already accumulating the project’s findings

About 1.4 million species of animals are currently known, but it is generally accepted that this figure grossly underestimates the actual number of species in existence, which likely ranges between five and thirty million species, or even 100 million. 

Meanwhile, a far less well-known fact is that even in countries with a long history of taxonomic research, such as Germany, which is currently known to be inhabited by about 48,000 animal species, there are thousands of insect species still awaiting discovery. In particular, the orders Diptera (flies) and Hymenoptera (especially the parasitoid wasps) are insect groups suspected to contain a strikingly large number of undescribed species. With almost 10,000 known species each, these two insect orders account for approximately two-thirds of Germany’s insect fauna, underlining the importance of these insects in many ways.

The conclusion that there are not only a few, but so many unknown species in Germany is a result of the earlier German Barcode of Life projects: GBOL I and GBOL II, both supported by the German Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung, BMBF) and the Bavarian Ministry of Science under the project Barcoding Fauna Bavarica. 

In its previous phases, GBOL aimed to identify all German species reliably, quickly and inexpensively using DNA barcodes. Since the first project was launched twelve years ago, more than 25,000 German animal species have been barcoded. Among them, the comparatively well-known groups, such as butterflies, moths, beetles, grasshoppers, spiders, bees and wasps, showed an almost complete coverage of the species inventory.

In 2020, another BMBF-funded DNA barcoding project, titled GBOL III: Dark Taxa, was launched, in order to focus on the lesser-known groups of Diptera and parasitoid Hymenoptera, which are often referred to as “dark taxa”. The new project commenced at three major German natural history institutions: the Zoological Research Museum Alexander Koenig (Bonn), the Bavarian State Collection of Zoology (SNSB, Munich) and the State Museum of Natural History Stuttgart, in collaboration with the University of Würzburg and the Entomological Society Krefeld. Together, the project partners are to join efforts and skills to address a range of questions related to the taxonomy of the “dark taxa” in Germany.

As part of the initiative, the project partners are invited to submit their results and outcomes in the dedicated GBOL III: Dark Taxa article collection in the peer-reviewed, open-access Biodiversity Data Journal. There, the contributions will be published dynamically, as soon as approved and ready for publication. The articles will include taxonomic revisions, checklists, data papers, contributions to methods and protocols, employed in DNA barcoding studies with a focus on the target taxa of the project.

“The collection of articles published in the Biodiversity Data Journal is an excellent approach to achieving the consortium’s goals and project partners are encouraged to take advantage of the journal’s streamlined publication workflows to publish and disseminate data and results that were generated during the project,”

says the collection’s editor Dr Stefan Schmidt of the Bavarian State Collection of Zoology.

***

Find and follow the dynamic article collection GBOL III: Dark Taxa in Biodiversity Data Journal.

Follow Biodiversity Data Journal on Twitter and Facebook.

Pensoft Annotator – a tool for text annotation with ontologies

By Mariya Dimitrova, Georgi Zhelezov, Teodor Georgiev and Lyubomir Penev

The use of written language to record new knowledge is one of the advancements of civilisation that has helped us achieve progress. However, in the era of Big Data, the amount of published writing greatly exceeds the physical ability of humans to read and understand all written information. 

More than ever, we need computers to help us process and manage written knowledge. Unlike humans, computers are “naturally fluent” in many languages, such as the formats of the Semantic Web. These standards were developed by the World Wide Web Consortium (W3C) to enable computers to understand data published on the Internet. As a result, computers can index web content and gather data and metadata about web resources.

To help manage knowledge in different domains, humans have started to develop ontologies: shared conceptualisations of real-world objects, phenomena and abstract concepts, expressed in machine-readable formats. Such ontologies can provide computers with the necessary basic knowledge, or axioms, to help them understand the definitions and relations between resources on the Web. Ontologies outline data concepts, each with its own unique identifier, definition and human-legible label.

Matching data to its underlying ontological model is called ontology population and involves data handling and parsing that gives it additional context and semantics (meaning). Over the past couple of years, Pensoft has been working on an ontology population tool, the Pensoft Annotator, which matches free text to ontological terms.

The Pensoft Annotator is a web application, which allows annotation of text input by the user, with any of the available ontologies. Currently, they are the Environment Ontology (ENVO) and the Relation Ontology (RO), but we plan to upload many more. The Annotator can be run with multiple ontologies, and will return a table of matched ontological term identifiers, their labels, as well as the ontology from which they originate (Fig. 1). The results can also be downloaded as a Tab-Separated Value (TSV) file and certain records can be removed from the table of results, if desired. In addition, the Pensoft Annotator allows to exclude certain words (“stopwords”) from the free text matching algorithm. There is a list of default stopwords, common for the English language, such as prepositions and pronouns, but anyone can add new stopwords.

Figure 1. Interface of the Pensoft Annotator application

In Figure 1, we have annotated a sentence with the Pensoft Annotator, which yields a single matched term, labeled ‘host of’, from the Relation Ontology (RO). The ontology term identifier is linked to a webpage in Ontobee, which points to additional metadata about the ontology term (Fig. 2).

Figure 2. Web page about ontology term

Such annotation requests can be run to perform text analyses for topic modelling to discover texts which contain host-pathogen interactions. Topic modelling is used to build algorithms for content recommendation (recommender systems) which can be implemented in online news platforms, streaming services, shopping websites and others.

At Pensoft, we use the Pensoft Annotator to enrich biodiversity publications with semantics. We are currently annotating taxonomic treatments with a custom-made ontology based on the Relation Ontology (RO) to discover treatments potentially describing species interactions. You can read more about using the Annotator to detect biotic interactions in this abstract.

The Pensoft Annotator can also be used programmatically through an API, allowing you to integrate the Annotator into your own script. For more information about using the Pensoft Annotator, please check out the documentation.