Tag: GBIF

Bulgaria joins the Global Biodiversity Information Facility (GBIF)

Led by Pensoft and its CEO Prof. Lyubomir Penev, the partnership marks a major step for Bulgarian science and regional biodiversity leadership.

Bulgaria officially joins the Global Biodiversity Information Facility (GBIF). This major event for Bulgarian science was initiated by a memorandum signed by the Minister of Environment and Water: Manol Genov.

GBIF is an international network and data infrastructure funded by governments around the world that provides international open access to a modern and comprehensive database of all species of living organisms on the planet.

Joining GBIF is an important step for initiatives such as the Bulgarian Barcode of Life (BgBOL), as it will facilitate the integration of genetic data on species diversity into the global scientific community and support the creation of a more accurate and accessible bioinformatic database. This will increase the scientific visibility and relevance of Bulgarian efforts in molecular taxonomy and conservation.

Newly established Bulgarian Barcode of Life to support biodiversity conservation in the country

World map showing GBIF network participants: green for voting participants, blue for associate participants, gray for non-participants. — Prof. Lyubomir Penev

“First of all, I’d like to congratulate all fellow scientists working in the domain of biology and ecology in Bulgaria with this wonderful achievement,” says Prof. Dr. Lyubomir Penev, founder and CEO of the scientific publisher and technology provider Pensoft, as well as a key participant in the talks and preparations for Bulgaria’s joining GBIF. He is also Chair of BgBOL.

“Becoming a full member of GBIF has been a long-anticipated milestone we have discussed and worked on for several years. Coming not long after we initiated the Bulgarian Barcode of Life, the Bulgarian membership in GBIF gives us yet another uncontested evidence that the nation is on the right path to preserving our uniquely rich fauna and flora,” he adds.

Pensoft is looking forward to sharing our know-how with Bulgarian institutions and scientists in order to streamline the visibility and overall efficiency of biodiversity data collected from Bulgaria.
Prof. Lyubomir Penev

“As close partners of GBIF for over 15 years now, Pensoft is looking forward to sharing our know-how with Bulgarian institutions and scientists, so that they can fully utilise the GBIF infrastructure and tools, in order to streamline the visibility and overall efficiency of biodiversity data collected from Bulgaria.”

GBIF is managed by a Secretariat based in Copenhagen and brings together countries and organisations that collaborate through national and institutional coordinators (also called participant nodes). The mechanism provides common standards, good practices and open access tools for institutions around the world to share information on the location and recording of species and specimens. According to GBIF, a total of 107 countries and organisations currently participate in the network, a significant number of which are European.

The GBIF network, as screenshot from https://www.gbif.org/the-gbif-network on 10/06/2025.

By joining GBIF, biodiversity data generated in Bulgaria can be streamlined through the network’s infrastructure so that the country does not need to build and maintain its own separate infrastructure, which also saves significant financial resources.

As a full voting member, Bulgaria will ensure that biodiversity data in the country will be shared and accessible through the platform, and will contribute to global knowledge on biodiversity, respectively to the solutions that will promote its conservation and sustainable use.

Map of Bulgaria showing biodiversity data with orange heatmap indicating occurrences. — Bulgaria’s page on GBIF, as screenshot from https://www.gbif.org/country/BG/summary on 10/06/2025.

Improvements in data management by Bulgaria will also contribute to better reporting and fulfilment of obligations to the Convention on Biological Diversity (CBD) as well as to the Intergovernmental Platform on Biodiversity and Ecosystem Services (IPBES). As a member of GBIF, Bulgaria will be able to apply for funding for flagship activities in Bulgarian institutions and neighbouring Balkan countries. This will enable the country to expand its leadership role in the Balkans in biodiversity research and data accumulation.

GBIF and Pensoft signed a Memorandum of Cooperation

The partnership between GBIF and Pensoft dates back to 2009 when the global network and the publisher signed their first Memorandum of Understanding intended to solidify their cooperation as leaders in the technological advancement relevant to biodiversity knowledge. Over the next few years, Pensoft integrated its whole biodiversity journal portfolio with the GBIF infrastructure to enable multiple automated workflows, including export of all species occurrence data published in scientific articles straight to the GBIF platform. Most recently, over 20 biodiversity journals powered by Pensoft’s scholarly publishing platform ARPHA launched their own hosted portals on GBIF to make it easier to access and use biodiversity data associated with published research, aligning with principles of Findable, Accessible, Interoperable, and Reusable (FAIR) data.

More than 20 journals published by Pensoft with their own hosted data portals on GBIF to streamline and FAIR-ify biodiversity research

The portals currently host data on over 1,000 datasets and almost 325,000 occurrence records across the 25 journals.

In collaboration with the Global Biodiversity Information Facility (GBIF), Pensoft has established hosted data portals for 25 open-access peer-reviewed journals published on the ARPHA Platform.

A screenshot featuring a close-up of a turtle on a forest floor, overlayed with a web portal design for biodiversity data browsing. — A screenshot of the Check List data portal.

The initiative aims to make it easier to access and use biodiversity data associated with published research, aligning with principles of Findable, Accessible, Interoperable, and Reusable (FAIR) data.

The data portals offer seamless integration of published articles and associated data elements with GBIF-mediated records. Now, researchers, educators, and conservation practitioners can discover and use the extensive species occurrence and other data associated with the papers published in each journal.

A video displaying an interactive map with occurrence data on the BDJ portal.

The collaboration between Pensoft and GBIF was recently piloted with the Biodiversity Data Journal (BDJ). Today, the BDJ hosted portal provides seamless access and exploration for nearly 300,000 occurrences of biological organisms from all over the world that have been extracted from the journal’s all-time publications. In addition, the portal provides direct access to more than 800 datasets published alongside papers in BDJ, as well as to almost 1,000 citations of the journal articles associated with those publications.

The Biodiversity Data Journal launches its own data portal on GBIF

“The release of the BDJ portal and subsequent ones planned for other Pensoft journals should inspire other publishers to follow suit in advancing a more interconnected, open and accessible ecosystem for biodiversity research,” said Dr. Vince Smith, Editor-in-Chief of BDJ and head of digital, data and informatics at the Natural History Museum, London.

Joining the @ejtaxonomy, the @BioDataJournal is the latest #ScientificJournal to launch a GBIF hosted portal! 🐟

This @Pensoft-published journal is the first of many under the masthead expected to participate in the GBIF programme. ⚡

Read more: 🔗https://t.co/IA3IWydRLy pic.twitter.com/pbulurX9Kn
— GBIF @biodiversity.social/@gbif (@GBIF) March 10, 2025

“The programme will provide a scalable solution for more than thirty of the journals we publish thanks to our partnership with Plazi, and will foster greater connectivity between scientific research and the evidence that supports it,” said Prof. Lyubomir Penev, founder and chief executive officer of Pensoft.

On the new portals, users can search data, refining their queries based on various criteria such as taxonomic classification, and conservation status. They also have access to statistical information about the hosted data.

Together, the hosted portals provide data on almost 325,000 occurrence records, as well as over 1,000 datasets published across the journals.

Call for data papers describing datasets from Northern Eurasia in Biodiversity Data Journal

In collaboration with the Finnish Biodiversity Information Facility (FinBIF) and Pensoft Publishers, GBIF has announced a new call for authors to submit and publish data papers on Russia in a special collection of Biodiversity Data Journal (BDJ). The call extends and expands upon a successful effort in 2020 to mobilize data from European Russia.

GBIF partners with FinBIF and Pensoft’s Biodiversity Data Journal to streamline publication of new datasets about biodiversity from Northern Eurasia

Original post via GBIF

In collaboration with the Finnish Biodiversity Information Facility (FinBIF) and Pensoft Publishers, GBIF has announced a new call for authors to submit and publish data papers on Northern Eurasia in a special collection of Biodiversity Data Journal (BDJ). The call expands upon successful efforts to mobilize data from European Russia in 2020 and from the rest of Russia in 2021.

Until 30 June 2022, Pensoft will waive the article processing fee (normally €650) for the first 50 accepted data paper manuscripts that meet the following criteria for describing a dataset:

See the complete definition of these terms below.

Detailed instructions

Authors must prepare the manuscript in English and submit it in accordance with BDJ’s instructions to authors by 30 June 2022. Late submissions will not be eligible for APC waivers.

Sponsorship is limited to the first 50 accepted submissions meeting these criteria on a first-come, first-served basis. The call for submissions can therefore close prior to the deadline of 30 June 2022. Authors may contribute to more than one manuscript, but artificial division of the logically uniform data and data stories, or “salami publishing”, is not allowed.

BDJ will publish a special issue including the selected papers by the end of 2021. The journal is indexed by Web of Science (Impact Factor 1.225), Scopus (CiteScore: 2.0) and listed in РИНЦ / eLibrary.ru.

For non-native speakers, please ensure that your English is checked either by native speakers or by professional English-language editors prior to submission. You may credit these individuals as a “Contributor” through the AWT interface. Contributors are not listed as co-authors but can help you improve your manuscripts. BDJ will introduce stricter language checks for the 2022 call; poorly written submissions will be rejected prior to the peer-review process.

In addition to the BDJ instruction to authors, data papers must referenced the dataset by
a) citing the dataset’s DOI
b) appearing in the paper’s list of references
c) including “Northern Eurasia 2022” in the Project Data: Title and “N-Eurasia-2022“ in Project Data: Identifier in the dataset’s metadata.

Authors should explore the GBIF.org section on data papers and Strategies and guidelines for scholarly publishing of biodiversity data. Manuscripts and datasets will go through a standard peer-review process. When submitting a manuscript to BDJ, authors are requested to assign their manuscript to the Topical Collection: Biota of Northern Eurasia at step 3 of the submission process. To initiate the manuscript submission, remember to press the Submit to the journal button.

To see an example, view this dataset on GBIF.org and the corresponding data paper published by BDJ.

Questions may be directed either to Dmitry Schigel, GBIF scientific officer, or Yasen Mutafchiev, managing editor of Biodiversity Data Journal.

This project is a continuation of successful calls for data papers from European Russia in 2020 and 2021. The funded papers are available in the Biota of Russia special collection and the datasets are shown on the project page.

Definition of terms

Datasets with more than 7,000 presence records new to GBIF.org

Datasets should contain at a minimum 7,000 presence records new to GBIF.org. While the focus is on additional records for the region, records already published in GBIF may meet the criteria of ‘new’ if they are substantially improved, particularly through the addition of georeferenced locations.” Artificial reduction of records from otherwise uniform datasets to the necessary minimum (“salami publishing”) is discouraged and may result in rejection of the manuscript. New submissions describing updates of datasets, already presented in earlier published data papers will not be sponsored.

Justification for publishing datasets with fewer records (e.g. sampling-event datasets, sequence-based data, checklists with endemics etc.) will be considered on a case-by-case basis.

Datasets with high-quality data and metadata

Authors should start by publishing a dataset comprised of data and metadata that meets GBIF’s stated data quality requirement. This effort will involve work on an installation of the GBIF Integrated Publishing Toolkit. BDJ will conduct its standard data audit and technical review. All datasets must pass the data audit prior to a manuscript being forwarded for peer review.

Only when the dataset is prepared should authors then turn to working on the manuscript text. The extended metadata you enter in the IPT while describing your dataset can be converted into manuscript with a single-click of a button in the ARPHA Writing Tool (see also Creation and Publication of Data Papers from Ecological Metadata Language (EML) Metadata. Authors can then complete, edit and submit manuscripts to BDJ for review.

Datasets with geographic coverage in Northern Eurasia

In correspondence with the funding priorities of this programme, at least 80% of the records in a dataset should have coordinates that fall within the priority areas of Russia, Ukraine, Belarus, Kazakhstan, Kyrgyzstan, Uzbekistan, Tajikistan, Turkmenistan, Moldova, Georgia, Armenia and Azerbaijan. However, authors of the paper may be affiliated with institutions anywhere in the world.

***

Follow Biodiversity Data Journal on Twitter and Facebook to keep yourself posted about the new research published.

Call for data papers describing datasets from Russia to be published in Biodiversity Data Journal

GBIF partners with FinBIF and Pensoft to support publication of new datasets about biodiversity from across Russia

Original post via GBIF

In collaboration with the Finnish Biodiversity Information Facility (FinBIF) and Pensoft Publishers, GBIF has announced a new call for authors to submit and publish data papers on Russia in a special collection of Biodiversity Data Journal (BDJ). The call extends and expands upon a successful effort in 2020 to mobilize data from European Russia.

Between now and 15 September 2021, the article processing fee (normally €550) will be waived for the first 36 papers, provided that the publications are accepted and meet the following criteria that the data paper describes a dataset:

The manuscript must be prepared in English and is submitted in accordance with BDJ’s instructions to authors by 15 September 2021. Late submissions will not be eligible for APC waivers.

Sponsorship is limited to the first 36 accepted submissions meeting these criteria on a first-come, first-served basis. The call for submissions can therefore close prior to the stated deadline of 15 September 2021. Authors may contribute to more than one manuscript, but artificial division of the logically uniform data and data stories, or “salami publishing”, is not allowed.

BDJ will publish a special issue including the selected papers by the end of 2021. The journal is indexed by Web of Science (Impact Factor 1.331), Scopus (CiteScore: 2.1) and listed in РИНЦ / eLibrary.ru.

In addition to the BDJ instruction to authors, it is required that datasets referenced from the data paper a) cite the dataset’s DOI, b) appear in the paper’s list of references, and c) has “Russia 2021” in Project Data: Title and “N-Eurasia-Russia2021“ in Project Data: Identifier in the dataset’s metadata.

To see an example, view this dataset on GBIF.org and the corresponding data paper published by BDJ.

Questions may be directed either to Dmitry Schigel, GBIF scientific officer, or Yasen Mutafchiev, managing editor of Biodiversity Data Journal.

The 2021 extension of the collection of data papers will be edited by Vladimir Blagoderov, Pedro Cardoso, Ivan Chadin, Nina Filippova, Alexander Sennikov, Alexey Seregin, and Dmitry Schigel.

This project is a continuation of the successful call for data papers from European Russia in 2020. The funded papers are available in the Biota of Russia special collection and the datasets are shown on the project page.

***

Definition of terms

Datasets with more than 5,000 records that are new to GBIF.org

Datasets should contain at a minimum 5,000 new records that are new to GBIF.org. While the focus is on additional records for the region, records already published in GBIF may meet the criteria of ‘new’ if they are substantially improved, particularly through the addition of georeferenced locations.” Artificial reduction of records from otherwise uniform datasets to the necessary minimum (“salami publishing”) is discouraged and may result in rejection of the manuscript. New submissions describing updates of datasets, already presented in earlier published data papers will not be sponsored.

Justification for publishing datasets with fewer records (e.g. sampling-event datasets, sequence-based data, checklists with endemics etc.) will be considered on a case-by-case basis.

Datasets with high-quality data and metadata

Datasets with geographic coverage in Russia

In correspondence with the funding priorities of this programme, at least 80% of the records in a dataset should have coordinates that fall within the priority area of Russia. However, authors of the paper may be affiliated with institutions anywhere in the world.

***

Check out the Biota of Russia dynamic data paper collection so far.

Follow Biodiversity Data Journal on Twitter and Facebook to keep yourself posted about the new research published.

Call for data papers from European Russia

Partners GBIF, FinBIF and Pensoft to support publication of data papers that describe datasets from Russia west of the Ural Mountains

Original post via GBIF

GBIF—the Global Biodiversity Information Facility—in collaboration with the Finnish Biodiversity Information Facility (FinBIF) and Pensoft Publishers, are happy to issue a call for authors to submit and publish data papers on European Russia (west of the Urals) in an upcoming special issue of Biodiversity Data Journal (BDJ).

Between now and 31 August 2020, the article processing fee (normally €450) will be waived for the first 20 papers, provided that the publications are accepted and meet the following criteria that the data paper describes a dataset:

The manuscript must be prepared in English and is submitted in accordance with BDJ’s instructions to authors by 31 August 2020. Late submissions will not be eligible for APC waivers.

Sponsorship is limited to the first 20 accepted submissions meeting these criteria on a first-come, first-served basis. The call for submissions can therefore close prior to the stated deadline of 31 August. Authors may contribute to more than one manuscript, but artificial division of the logically uniform data and data stories, or “salami publishing”, is not allowed.

BDJ will publish a special issue including the selected papers by the end of 2020. The journal is indexed by Web of Science (Impact Factor 1.029), Scopus (CiteScore: 1.24) and listed in РИНЦ / eLibrary.ru

In addition to the BDJ instruction to authors, it is required that datasets referenced from the data paper a) cite the dataset’s DOI and b) appear in the paper’s list of references.

To see an example, view this dataset on GBIF.org and the corresponding data paper published by BDJ.

Questions may be directed either to Dmitry Schigel, GBIF scientific officer, or Yasen Mutafchiev, managing editor of Biodiversity Data Journal.

Definition of terms

Datasets with more than 5,000 records that are new to GBIF.org

Justification for publishing datasets with fewer records (e.g. sampling-event datasets, sequence-based data, checklists with endemics etc.) will be considered on a case-by-case basis.

Datasets with high-quality data and metadata

Datasets with geographic coverage in European Russia west of the Ural mountains

In correspondence with the funding priorities of this programme, at least 80% of the records in a dataset should have coordinates that fall within the priority area of European Russia west of the Ural mountains. However, authors of the paper may be affiliated with institutions anywhere in the world.

#####

Data audit at Pensoft’s biodiversity journals

Data papers submitted to Biodiversity Data Journal, as well as all relevant biodiversity-themed journals in Pensoft’s portfolio, undergo a mandatory data auditing workflow before being passed down to a subject editor.

Learn more about the workflow here:
https://www.eurekalert.org/pub_releases/2019-10/pp-aif101819.php.

Check out the case study below to see how the data audit workflow works in practice.

CASE STUDY: Data audit for the “Vascular plants dataset of the COFC herbarium (University of Cordoba, Spain)”, a data paper in PhytoKeys

FAIR biodiversity data in Pensoft journals thanks to a routine data auditing workflow

*Data audit workflow provided for data papers submitted to Pensoft journals.*

To avoid publication of openly accessible, yet unusable datasets, fated to result in irreproducible and inoperable biological diversity research at some point down the road, Pensoft takes care for auditing data described in data paper manuscripts upon their submission to applicable journals in the publisher’s portfolio, including Biodiversity Data Journal, ZooKeys, PhytoKeys, MycoKeys and many others.

Once the dataset is clean and the paper is published, biodiversity data, such as taxa, occurrence records, observations, specimens and related information, become FAIR (findable, accessible, interoperable and reusable), so that they can be merged, reformatted and incorporated into novel and visionary projects, regardless of whether they are accessed by a human researcher or a data-mining computation.

As part of the pre-review technical evaluation of a data paper submitted to a Pensoft journal, the associated datasets are subjected to data audit meant to identify any issues that could make the data inoperable. This check is conducted regardless of whether the dataset are provided as supplementary material within the data paper manuscript or linked from the Global Biodiversity Information Facility (GBIF) or another external repository. The features that undergo the audit can be found in a data quality checklist made available from the website of each journal alongside key recommendations for submitting authors.

Once the check is complete, the submitting author receives an audit report providing improvement recommendations, similarly to the commentaries he/she would receive following the peer review stage of the data paper. In case there are major issues with the dataset, the data paper can be rejected prior to assignment to a subject editor, but resubmitted after the necessary corrections are applied. At this step, authors who have already published their data via an external repository are also reminded to correct those accordingly.

“It all started back in 2010, when we joined forces with GBIF on a quite advanced idea in the domain of biodiversity: a data paper workflow as a means to recognise both the scientific value of rich metadata and the efforts of the the data collectors and curators. Together we figured that those data could be published most efficiently as citable academic papers,” says Pensoft’s founder and Managing director Prof. Lyubomir Penev.

“From there, with the kind help and support of Dr Robert Mesibov, the concept evolved into a data audit workflow, meant to ‘proofread’ the data in those data papers the way a copy editor would go through the text,” he adds.

“The data auditing we do is not a check on whether a scientific name is properly spelled, or a bibliographic reference is correct, or a locality has the correct latitude and longitude”, explains Dr Mesibov. “Instead, we aim to ensure that there are no broken or duplicated records, disagreements between fields, misuses of the Darwin Core recommendations, or any of the many technical issues, such as character encoding errors, that can be an obstacle to data processing.”

At Pensoft, the publication of openly accessible, easy to access, find, re-use and archive data is seen as a crucial responsibility of researchers aiming to deliver high-quality and viable scientific output intended to stand the test of time and serve the public good.

CASE STUDY: Data audit for the “Vascular plants dataset of the COFC herbarium (University of Cordoba, Spain)”, a data paper in PhytoKeys

To explain how and why biodiversity data should be published in full compliance with the best (open) science practices, the team behind Pensoft and long-year collaborators published a guidelines paper, titled “Strategies and guidelines for scholarly publishing of biodiversity data” in the open science journal Research Ideas and Outcomes (RIO Journal).

New light on the controversial question of species abundance and population density

Inspired by the negative results in the recently published largest-scale analysis of the relation between population density and positions in geographic ranges and environmental niches, Drs Jorge Soberon and Andrew Townsend Peterson of the University of Kansas, USA, teamed up with Luis Osorio-Olvera, National University of Mexico (UNAM), and identified several issues in the methodology used, able to turn the tables in the ongoing debate. Their findings are published in the innovative open access journal Rethinking Ecology.

Both empirical work and theoretical arguments published and cited over the last several years suggest that if someone was to take the distributional range of a species – be it animal or plant – and draw lines starting at the edges of the space inwards, they would find the species’ populations densest at the intersection of those lines. However, when the team of Tad Dallas, University of Helsinki, Finland, analysed a large dataset of 118,000 populations, equating to over 1,400 species of birds, mammals, and trees, they found no such relationship.

Having analysed the analysis, the American-Mexican team concluded that despite being based on an unprecedented volume of data, the earlier study was missing out some important points.

Firstly, the largest dataset used by Tad and his team comprises observational data which had not required a certain sampling protocol or a plan. Without any standard in use, it is easy to imagine that the observations would be predominantly coming from people around and near cities, hence strongly biased.

Additionally, the scientists note that the analysis largely disregards parts of species’ geographic distributions for which there were no abundant data. As a result, the range of a species could be narrowed down significantly and its centroid – misplaced. Meanwhile, the population would appear denser on what appears to be the periphery of the area.

Similar issue is identified in the localisation of populations in the environmental space, where once again their range turned out to have been represented as significantly smaller, when compared to data available from the International Union for Conservation of Nature (IUCN) and the Global Biodiversity Information Facility (GBIF).

Further, a closer look into the supplementary materials provided revealed that the precision of the population-density data was not scalable with the climate data. As a result, it is likely that multiple abundance data falls within a single climate pixel.

In conclusion, the authors note that in order to comprehensively study the abundance of a species’ populations, one needs to take into consideration a number of factors lying beyond the scope of either of the papers, including human impact.

“We suggest that this important question remains far from settled,” they say.

###

Original source:

Soberón J, Peterson TA, Osorio-Olvera L (2018) A comment on “Species are not most abundant in the centre of their geographic range or climatic niche”. Rethinking Ecology 3: 13-18. https://doi.org/10.3897/rethinkingecology.3.24827

Integration of Freshwater Biodiversity Information for Decision-Making in Rwanda

Teams from Ghana, Malawi, Namibia and Rwanda during the inception meeting of the African Biodiversity Challenge Project in Kigali, Rwanda. Photo by Yvette Umurungi.

The establishment and implementation of a long-term strategy for freshwater biodiversity data mobilisation, sharing, processing and reporting in Rwanda is to support environment monitoring and the implementation of Rwanda’s National Biodiversity Strategy (NBSAP). In addition, it is to also help us understand how economic transformation and environmental change is affecting freshwater biodiversity and its resulting ecosystem services.

As part of this strategy, the Center of Excellence in Biodiversity and Natural Resource Management (CoEB) at the University of Rwanda, jointly with the Rwanda Environment Management Authority (REMA) and the Albertine Rift Conservation Society (ARCOS), are implementing the African Biodiversity Challenge (ABC) project “Integration of Freshwater Biodiversity Information for Decision-Making in Rwanda.”

The conference abstract for this project has been published in the open access journal Biodiversity Information Science and Standards (BISS).

The CoEB has a national mandate to lead on biodiversity data mobilisation and implementation of the NBSAP in collaboration with REMA. This includes digitising data from reports, conducting analyses and reporting for policy and research, as indicated in Rwanda’s NBSAP.

The collation of the data will follow the international standards and will be available online, so that they can be accessed and reused from around the world. In fact, CoEB aspires to become a Global Biodiversity Informatics Facility (GBIF) node, thereby strengthening its capacity for biodiversity data mobilisation.

Data use training for the African Biodiversity Challenges at the South African National Biodiversity Institute (SANBI), South Africa. Photo by Yvette Umurungi.

The mobilised data will be organised using GBIF standards, and the project will leverage the tools developed by GBIF to facilitate data publication. Additionally, it will also provide an opportunity for ARCOS to strengthen its collaboration with CoEB as part of its endeavor to establish a regional network for biodiversity data management in the Albertine Rift Region.

The project is expected to conclude with at least six datasets, which will be published through the ARCOS Biodiversity Information System. These are to include three datasets for the Kagera River Basin; one on freshwater macro-invertebrates from the Congo and Nile Basins; one for the Rwanda Development Board archive of research reports from protected areas; and one from thesis reports from master’s and bachelor’s students at the University of Rwanda.

The project will also produce and release the first “Rwandan State of Freshwater Biodiversity”, a document which will describe the status of biodiversity in freshwater ecosystems in Rwanda and present socio-economic conditions affecting human interactions with this biodiversity.

The page of Center of Excellence in Biodiversity and Natural Resource Management (CoEB) at University of Rwanda on the Global Biodiversity Information Facility portal. Image by Yvette Umurungi.

***

The ABC project is a competition coordinated by the South African National Biodiversity Institute (SANBI) and funded by the JRS Biodiversity Foundation. The competition is part of the JRS-funded project, “Mobilising Policy and Decision-making Relevant Biodiversity Data,” and supports the Biodiversity Information Management activities of the GBIF Africa network.

Original source:

Umurungi Y, Kanyamibwa S, Gashakamba F, Kaplin B (2018) African Biodiversity Challenge: Integrating Freshwater Biodiversity Information to Guide Informed Decision-Making in Rwanda. Biodiversity Information Science and Standards 2: e26367. https://doi.org/10.3897/biss.2.26367

Audit finds biodiversity data aggregators ‘lose and confuse’ data

In an effort to improve the quality of biodiversity records, the Atlas of Living Australia (ALA) and the Global Biodiversity Information Facility (GBIF) use automated data processing to check individual data items. The records are provided to the ALA and GBIF by museums, herbaria and other biodiversity data sources.

However, an independent analysis of such records reports that ALA and GBIF data processing also leads to data loss and unjustified changes in scientific names.

The study was carried out by Dr Robert Mesibov, an Australian millipede specialist who also works as a data auditor. Dr Mesibov checked around 800,000 records retrieved from the Australian Museum, Museums Victoria and the New Zealand Arthropod Collection. His results are published in the open access journal ZooKeys, and also archived in a public data repository.

“I was mainly interested in changes made by the aggregators to the genus and species names in the records,” said Dr Mesibov.

“I found that names in up to 1 in 5 records were changed, often because the aggregator couldn’t find the name in the look-up table it used.”

Another worrying result concerned type specimens – the reference specimens upon which scientific names are based. On a number of occasions, the aggregators were found to have replaced the name of a type specimen with a name tied to an entirely different type specimen.

The biggest surprise, according to Dr Mesibov, was the major disagreement on names between aggregators.

“There was very little agreement,” he explained. “One aggregator would change a name and the other wouldn’t, or would change it in a different way.”

Furthermore, dates, names and locality information were sometimes lost from records, mainly due to programming errors in the software used by aggregators to check data items. In some data fields the loss reached 100%, with no original data items surviving the processing.

“The lesson from this audit is that biodiversity data aggregation isn’t harmless,” said Dr Mesibov. “It can lose and confuse perfectly good data.”

“Users of aggregated data should always download both original and processed data items, and should check for data loss or modification, and for replacement of names,” he concluded.

###

Original source:

Mesibov R (2018) An audit of some filtering effects in aggregated occurrence records. ZooKeys 751: 129-146. https://doi.org/10.3897/zookeys.751.24791

How to import data papers from GBIF, DataONE and LTER metadata

On October 13, 2015, we published a blog post about the novel functionalities in ARPHA that allow streamlined import of data papers from EML.

Now, this process has been described in the Tips and Tricks section of the ARPHA Authoring Tool. Here, we’ll list the individual workflows:

We want to stress at this point that the import functionality itself is agnostic of the data source and any metadata file in EML 2.1.1 or 2.1.0 can be imported. We have listed these three most likely sources of metadata to illustrate the workflow.

In the remainder of the post, we will go through the original post from October 13, 2015 and highlight the latest updates.

At the time of the writing of the original post, the Biodiversity Information Standards conference, TDWG 2015, was taking place in Kenya. Data sharing, data re-use, and data discovery were being brought up in almost every talk. We might have entered the age of Big Data twenty years ago, but it is now that scientists face the real challenge – storing and searching through the deluge of data to find what they need.

As the rate at which we exponentially generate data exceeds the rate at which data storage technologies improve, the field of data management seems to be greatly challenged. Worse, this means the more new data is generated, the more of the older ones will be lost. In order to know what to keep and what to delete, we need to describe the data as much as possible, and judge the importance of datasets. This post is about a novel way to automatically generate scientific papers describing a dataset, which will be referred to as data papers.

The common characters of the records, i.e. descriptions of the object of study, the measurement apparatus and the statistical summaries used to quantify the records, the personal notes of the researcher, and so on, are called metadata. Major web portals such as DataONE, the Global Biodiversity Information Facility (GBIF), or the Long Term Ecological Research Network store metadata in conjunction with a given dataset as one or more text files, usually structured in special formats enabling the parsing of the metadata by algorithms.

To make the metadata and the corresponding datasets discoverable and citable, the concept of the data paper was introduced in the early 2000’s by the Ecological Society of America. This concept was brought to the attention of the biodiversity community by Chavan and Penev (2011) with the introduction of a new data paper concept, based on a metadata standard, such as the Ecological Metadata Language, and derived from metadata content stored at large data platforms, in this case the Global Biodiversity Information Facility (GBIF). You can read this article for an in-depth discussion of the topic.

Pensoft’s Biodiversity Data Journal (BDJ) is to the best of our knowledge the first academic journal to have implemented a one-hundred-percent online authoring system for data papers, called ARPHA. Moreover, BDJ and the other Pensoft journals, such as ZooKeys, have already published more than seventy data papers.

Therefore, in the remainder of this post we will explain how to use an automated approach to publish a data paper describing an online dataset in Biodiversity Data Journal. The ARPHA system will convert the metadata describing your dataset into a manuscript for you after reading in the metadata. We will illustrate the workflow on the previously mentioned DataONE and GBIF.

The Data Observation Network for Earth (DataONE) is a distributed cyberinfrastructure funded by the U.S. National Science Foundation. It links together over twenty five nodes, primarily in the U.S., hosting biodiversity and biodiversity-related data, and provides an interface to search for data in all of them (Note: In the meantime, DataONE has updated their search interface).

Since butterflies are neat, let’s search for datasets about butterflies on DataONE! Type “Lepidoptera” in the search field and scroll down to the dataset describing “The Effects of Edge Proximity on Butterfly Biodiversity.” You should see something like this:

As you can notice, this resource has two objects associated with it: metadata, which has been highlighted, and the dataset itself. Let’s download the metadata from the cloud! The resulting text file, “Blandy.235.1.xml”, or whatever you want to call it, can be read by humans, but is somewhat cryptic because of all the XML tags. Now, you can import this file to the ARPHA writing platform and the information stored in it would be used to create a data paper! Go to the ARPHA web-site, and click on “Start a manuscript,” then scroll all the way down and click on “Import manuscript”.

Upload the “blandy” file and you will see an “Authors’ page,” where you can select which of the authors mentioned in the metadata must be included as authors of the data paper itself. Note that the user of ARPHA uploading the metadata is added to the list of the authors even if they are not included in the metadata. After the selection is done, a scholarly article is created by the system with the information from the metadata already in the respective sections of the article:

Now, the authors can add some description, edit out errors, tell a story, cite someone – all of this without leaving ARPHA – i.e. do whatever it takes to produce a high-quality scholarly text. After they are done, they can submit their article for peer-review and it could be published in a matter of hours. Voila!

Let’s look at GBIF. Go to “Data -> Explore by country” and select “Saint Vincent and the Grenadines,” an English-speaking Caribbean island. There are, as of the time of writing of this post, 166 occurrence datasets containing data about the islands. Select the dataset from the Museum of Comparative Zoology at Harvard. If you scroll down, you will see the GBIF annotated EML. Download this as a separate text file (if you are using Chrome, you can view the source, and then use Copy-Paste). Do the exact same steps as before – go to “Import manuscript” in ARPHA and upload the EML file. The result should be something like this, ready to finalize:

To finish it up, we want to leave you with some caveats and topics for further discussion. Till today, useful and descriptive metadata has not always been present. There are two challenges: metadata completeness and metadata standards. The invention of the EML standard was one of the first efforts to standardize how metadata should be stored in the field of ecology and biodiversity science.

Currently, our import system supports the last two versions of the EML standard: 2.1.1 and 2.1.0, but we hope to further develop this functionality. In an upcoming version of their search interface, DataONE will provide infographics on the prevalence of the metadata standards on their site (as illustrated below), so there is still work to be done, but if there is a positive feedback from the community, we will definitely keep elaborating this feature.

Regarding metadata completeness, our hope is that by enabling scientists to create scholarly papers from their metadata with a single-step process, they will be incentivized to produce high-quality metadata.

Now, allow us to give a disclaimer here: the authors of this blog post have nothing to do with the two datasets. They have not contributed to any of them, nor do they know the authors. The datasets have been chosen more or less randomly since the authors wanted to demonstrate the functionality with a real-world example. You should only publish data papers if you know the authors or you are the author of the dataset itself. During the actual review process of the paper, the authors that have been included will get an email from the journal.

Additional information:

This project has received funding from the European Union’s FP7 project EU BON (Building the European Biodiversity Observation Network), grant agreement No 308454, and Horizon 2020 research and innovation project BIG4 (Biosystematics, informatics and genomics of the big 4 insect groups: training tomorrow’s researchers and entrepreneurs) under the Marie Sklodovska-Curie grant agreement No. 642241 for a PhD project titled Technological Implications of the Open Biodiversity Knowledge Management System.