The publications so far include the grant proposal, conference abstracts, a workshop report, guidelines papers and deliverables submitted to the Commission.
The dynamic open-science project collection of BiCIKL, titled “Towards interlinked FAIR biodiversity knowledge: The BiCIKL perspective” (doi: 10.3897/rio.coll.105), continues to grow as the project progresses into its third year and its results accumulate at an ever faster pace.
Following the publication of three important BiCIKL deliverables (the project’s Data Management Plan, its Visual identity package and a report describing the newly built workflow and tools for data extraction, conversion and indexing, as well as the user applications from OpenBiodiv), there are currently 30 research outcomes in the BiCIKL collection that have been shared publicly, rather than merely submitted to the European Commission.
Shortly after the BiCIKL project started in 2021, a project-branded collection was launched in the open-science scholarly journal Research Ideas and Outcomes (RIO). There, the partners have been publishing – and thus preserving – conclusive research papers, as well as early and interim scientific outputs.
The publications so far also include the BiCIKL grant proposal, which earned the support of the European Commission in 2021; conference abstracts, submitted by the partners to two consecutive TDWG conferences; a project report that summarises recommendations on interoperability among infrastructures, as concluded from a hackathon organised by BiCIKL; and two Guidelines papers, aiming to trigger a culture change in the way data is shared, used and reused in the biodiversity field.
At the time of writing, the top three of the most read papers in the BiCIKL collection is completed by the grant proposal and the second Guidelines paper, where the partners – based on their extensive and versatile experience – present recommendations about the use of annotations and persistent identifiers in taxonomy and biodiversity publishing.
What one might find quite odd when browsing the BiCIKL collection is that each publication is marked with its own publication source, even though all contributions are clearly already accessible from RIO Journal.
This is because one of the unique features of RIO allows consortia to use their project collection as a one-stop access point for all scientific results, regardless of their publication venue, by means of linking to the original source via metadata. Additionally, projects may also upload their documents in their original format and layout, thanks to the integration between RIO and ARPHA Preprints. This is in fact how BiCIKL chose to share their latest deliverables, using the very same files they submitted to the Commission.
“In line with the mission of BiCIKL and our consortium’s dedication to FAIRness in science, we wanted to keep our project’s progress and results fully transparent and easily accessible and reusable to anyone, anywhere,”
explains Prof Lyubomir Penev, BiCIKL’s Project Coordinator and founder and CEO of Pensoft.
“This is why we opted to collate the outcomes of BiCIKL in one place – starting from the grant proposal itself, and then progressively adding workshop reports, recommendations, research papers and what not. By the time BiCIKL concludes, not only will we be ready to refer back to any step along the way that we have just walked together, but also rest assured that what we have achieved and learnt remains at the fingertips of those we have done it for and those who come after them,” he adds.
EIVE 1.0 is the most comprehensive system of ecological indicator values of vascular plants in Europe to date. It can be used as an important tool for continental-scale analyses of vegetation and floristic data.
It took seven years and hundreds of hours of work by an international team of 34 authors to develop and publish the most comprehensive system of ecological indicator values (EIVs) of vascular plants in Europe to date.
EIVE 1.0 provides the five most-used ecological indicators, M – moisture, N – nitrogen, R – reaction, L – light and T – temperature, for a total of 14,835 vascular plant taxa in Europe, or between 13,748 and 14,714 for the individual indicators. For each of these taxa, EIVE contains three values: the EIVE niche position indicator, the EIVE niche width indicator and the number of regional EIV systems on which the assessment was based. Both niche position and niche width are given on a continuous scale from 0 to 10, not as categorical ordinal values as in the source systems.
Evidently, EIVE can be an important tool for continental-scale analyses of vegetation and floristic data in Europe.
It will make it possible to analyse the nearly 2 million vegetation plots currently contained in the European Vegetation Archive (EVA; Chytrý et al. 2016) in new ways.
Since EVA, apart from elevation, slope inclination and aspect, hardly contains any environmental variables measured in situ, the numerous macroecological studies to date have had to rely on coarse modelled environmental data (e.g. climate) instead. This is particularly problematic for soil variables such as pH, moisture or nutrients, which can change dramatically within a few metres.
Here, the approximation of site conditions by mean ecological indicator values can improve the predictive power substantially (Scherrer and Guisan 2019). Likewise, in broad-scale vegetation classification studies, mean EIVE values per plot would allow a better characterisation of the distinguished vegetation units. Lastly, one should not forget that most countries in Europe do not have a national EIV system, and here EIVE could fill the gap.
Almost on the same day as EIVE 1.0, another supranational system of ecological indicator values in Europe was published by Tichý et al. (2023), using a similar approach.
Thus, it will be important for vegetation scientists in Europe to understand the pros and cons of both systems to allow the wise selection of the most appropriate tool:
EIVE 1.0 is based on 31 regional EIV systems, while Tichý et al. (2023) uses 12.
Both systems provide indicator values for moisture, nitrogen/nutrients, reaction, light and temperature, while Tichý et al. (2023) additionally has a salinity indicator.
Tichý et al. (2023) aimed at using the same scales as Ellenberg et al. (1991), which means that the scales vary between indicators (1–9, 0–9, 1–12), while EIVE has a uniform interval scale of 0–10 for all indicators.
Only EIVE provides niche width in addition to niche position. Niche width is an important aspect of the niche and might be used to improve the calculation of mean indicator values per plot (e.g. by weighting with inverse niche width).
The taxonomic coverage is larger in EIVE than in Tichý et al. (2023): 14,835 vs. 8,908 accepted taxa and 11,148 vs. 8,679 species.
EIVE provides indicator values for accepted subspecies, while Tichý et al. (2023) is restricted to species and aggregates. Separate indicator values for subspecies might be important for two reasons: (a) subspecies often strongly differ in at least one niche dimension; (b) many of the taxa now considered as subspecies have been treated at species level in the regional EIV systems.
Tichý et al. (2023) added 431 species not contained in any of the source systems based on vegetation-plot data from the European Vegetation Archive (EVA; Chytrý et al. 2016) while EIVE calculated the European indicator values only for taxa occurring at least in one source system.
While both systems present maps that suggest a good coverage across Europe, Tichý et al. (2023)’s source systems largely were from Central Europe, NW Europe and Italy, but, unlike EIVE, these authors did not use source systems from the more “distal” parts of Europe, such as Sweden, Faroe Islands, Russia, Georgia, Romania, Poland and Spain, and they used only a small subset of indicators of the EIV systems of Ukraine, Greece and the Alps.
In a validation with GBIF-derived data on temperature niches, Dengler et al. (2023) showed that EIVE has a slightly stronger correlation than Tichý et al. (2023)’s indicators (r = 0.886 vs. 0.852).
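The idea of weighting by inverse niche width, mentioned in the comparison above, can be illustrated with a minimal Python sketch. It is an assumption about how such a weighting could be computed, not the procedure of Dengler et al. (2023); the taxon names, the values and the function `plot_mean` are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical (niche position, niche width) pairs on a 0-10 scale.
# The taxa and numbers are invented for illustration, not taken from EIVE.
EXAMPLE_VALUES: Dict[str, Tuple[float, float]] = {
    "Taxon A": (4.0, 1.0),
    "Taxon B": (6.0, 2.0),
    "Taxon C": (8.0, 4.0),
}


def plot_mean(
    species: Iterable[str],
    values: Dict[str, Tuple[float, float]],
    weight_by_inverse_width: bool = False,
) -> Optional[float]:
    """Mean niche position of the taxa recorded in one plot.

    Taxa without an indicator value are skipped. With
    weight_by_inverse_width=True, each taxon is weighted by 1 / niche width,
    so taxa with a narrow niche count more.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for name in species:
        if name not in values:
            continue
        position, width = values[name]
        if weight_by_inverse_width:
            if width <= 0:
                raise ValueError(f"niche width must be positive for {name}")
            weight = 1.0 / width
        else:
            weight = 1.0
        weighted_sum += weight * position
        weight_total += weight
    # None signals that no taxon in the plot had an indicator value.
    return weighted_sum / weight_total if weight_total else None
```

In this toy plot, the weighting pulls the mean towards Taxon A, the taxon with the narrowest niche.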
How did EIVE manage to integrate all EIV systems in Europe that contained at least one of the selected indicators for vascular plants, while Tichý et al. (2023) used only a small subset?
This difference is mainly due to a more complex workflow in EIVE (which also was one of the reasons why the preparation took so long). First, Tichý et al. (2023) restricted their search to EIV systems and indicators that had the same number of categories as the “original” Ellenberg system.
Second, from these they discarded those that showed a too low correlation with Ellenberg. By contrast, EIVE’s workflow allowed the use of any system with an ordinal (or even metric) scale, irrespective of the number of categories or the initial match with Ellenberg et al. (1991).
EIVE also did not treat one system (Ellenberg) as the master to assess all others but considered each of them equally valid. While indeed the individual EIV systems are often quite inconsistent, i.e. even if they refer to Ellenberg, the same value of an indicator in one system might mean something different in another system, our iterative linear optimisation enabled us to adjust all 31 systems for the five indicators to a common basis.
This in turn allowed deriving EIVE as the consensus system of all the source systems. The fact that in our validation of the temperature indicator, EIVE performed better than Tichý et al. (2023) and much better than most of the regional EIV systems might be attributable to the so-called “wisdom of the crowd”, a notion going back to the statistician Francis Galton, who found that averaging numerous independent assessments (even by laymen) of a continuous quantity can lead to very good estimates of the true value.
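To make the idea of a consensus value more tangible, here is a deliberately naive Python sketch. It is not EIVE’s iterative linear optimisation: it simply assumes that each source system can be mapped onto the 0–10 scale by min-max rescaling of its nominal range and then averages the results. The functions `rescale` and `consensus` and all example numbers are hypothetical.

```python
from statistics import mean
from typing import Iterable, Tuple


def rescale(value: float, low: float, high: float,
            new_low: float = 0.0, new_high: float = 10.0) -> float:
    """Linearly map a value from the range [low, high] to [new_low, new_high]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return new_low + (value - low) * (new_high - new_low) / (high - low)


def consensus(assessments: Iterable[Tuple[float, float, float]]) -> float:
    """Average several assessments of one taxon after rescaling them.

    Each assessment is (value, scale_low, scale_high), e.g. (5, 1, 9) for a
    value of 5 on a 1-9 scale. This naive approach assumes the nominal scale
    ends mean the same thing in every system, which is exactly what the
    iterative optimisation described above does NOT assume.
    """
    return mean(rescale(value, low, high) for value, low, high in assessments)


# Invented assessments of one taxon on scales of 1-9, 0-9 and 1-12.
example = [(5.0, 1.0, 9.0), (6.0, 0.0, 9.0), (6.5, 1.0, 12.0)]
consensus_value = consensus(example)
```

The sketch only shows why a uniform scale is a precondition for averaging; calibrating the systems against each other is the harder part that the EIVE workflow addresses.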
Apart from the indicator values themselves, EIVE has a second main feature that might not be so obvious at first glance, but which actually took the EIVE team, including several taxonomists, more time than the workflow to generate the indicator values themselves: the taxonomic backbone. EIVE for vascular plants is fully based on the taxonomic concept (including the synonymic relationships) of the Euro+Med Plantbase.
However, since Euro+Med lacks a considerable share of the taxa that are frequently recorded in vegetation plots, we expanded our backbone beyond Euro+Med to something called “Euro+Med augmented”, to make it fully usable for vegetation science. We particularly added hybrids, neophytes and aggregates, three groups of plants hitherto only very marginally covered in Euro+Med. All additions were made by experts in line with the taxonomic concept of Euro+Med and are fully documented. Likewise, many additional synonym relationships had to be added that were missing in Euro+Med.
Finally, we implemented the so-called “concept synonymy” (see Jansen and Dengler 2010), which allows the assignment of the same name from different sources to different accepted names (“taxonomic concepts”). This applies mainly to nested taxa that are treated at different levels in different sources, e.g. once as species with several subspecies, once as aggregate with several species. However, there are also some cases of misapplied names (i.e. names that were not used in agreement with their nomenclatural type in certain EIV systems). Such cases generally cannot be solved by the various tools for automatic taxonomic cleaning, but require experts who make a case-by-case decision.
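One way to picture concept synonymy is as a lookup keyed by both the name and the source in which it was used. The Python sketch below is a hypothetical illustration, not the EIVE R code; all names, sources and the function `resolve` are placeholders.

```python
from typing import Dict, Optional, Tuple

# Source-specific assignments: the same name string maps to different
# accepted names depending on where it was used (placeholder data).
CONCEPT_SYNONYMS: Dict[Tuple[str, str], str] = {
    ("Genus species", "System A"): "Genus species",        # narrow sense
    ("Genus species", "System B"): "Genus species aggr.",  # broad sense
}

# Ordinary synonyms that apply regardless of the source (placeholder data).
GENERAL_SYNONYMS: Dict[str, str] = {
    "Genus oldname": "Genus species",
}


def resolve(name: str, source: str) -> Optional[str]:
    """Return the accepted name for a name as used in a given source.

    Source-specific concepts take precedence over general synonyms.
    None means the name is unresolved and needs an expert decision.
    """
    if (name, source) in CONCEPT_SYNONYMS:
        return CONCEPT_SYNONYMS[(name, source)]
    return GENERAL_SYNONYMS.get(name)
```

Returning `None` instead of guessing mirrors the point made above: cases that automatic matching cannot settle are left for case-by-case expert decisions.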
The whole taxonomic workflow of EIVE is fully transparent with an R code that “digests”:
(a) the names as they are in the source systems,
(b) the official Euro+Med database and
(c) tables that document our additions and modifications (with reasons and references).
This comprehensive documentation will allow continuous and efficient improvement in the future, be it because of taxonomic novelties adopted in Euro+Med or because EIVE’s experts decide to change certain interpretations. That way, “Euro+Med augmented” and the accompanying R-based workflow can also be a valuable tool for other projects that wish to harmonise plant taxonomic information from various sources at a continental scale, e.g. in vegetation-plot databases such as GrassPlot (Dengler et al. 2018) and EVA (Chytrý et al. 2016).
The publication of EIVE 1.0 is not the endpoint, but rather a starting point for future developments in a community-based approach.
Together with interested colleagues from outside, the EIVE core team plans to prepare better and more comprehensive releases of EIVE in the future, including updates to its taxonomic backbone.
Future releases of EIVE will be published in fixed versions, typically together with a paper that describes the changes in the content.
As steps for the next two years, we anticipate that we will first add further taxa (bryophytes, lichens, macroalgae) and some additional indicators, both of which are relatively easy with our established R-based workflow. Then we plan EIVE 2.0 that will use the approx. 2 million vegetation plots in EVA (Chytrý et al. 2016) to re-calibrate EIVE for all taxa (see http://euroveg.org/requests/EVA-data-request-form-2022-02-10-Dengleretal.pdf).
***
This Behind the paper post refers to the article Ecological Indicator Values for Europe (EIVE) 1.0 by Jürgen Dengler, Florian Jansen, Olha Chusova, Elisabeth Hüllbusch, Michael P. Nobis, Koenraad Van Meerbeek, Irena Axmanová, Hans Henrik Bruun, Milan Chytrý, Riccardo Guarino, Gerhard Karrer, Karlien Moeys, Thomas Raus, Manuel J. Steinbauer, Lubomir Tichý, Torbjörn Tyler, Ketevan Batsatsashvili, Claudia Bita-Nicolae, Yakiv Didukh, Martin Diekmann, Thorsten Englisch, Eduardo Fernandez Pascual, Dieter Frank, Ulrich Graf, Michal Hájek, Sven D. Jelaska, Borja Jiménez-Alfaro, Philippe Julve, George Nakhutsrishvili, Wim A. Ozinga, Eszter-Karolina Ruprecht, Urban Šilc, Jean-Paul Theurillat, and François Gillet published in Vegetation Classification and Survey (https://doi.org/10.3897/VCS.98324).
***
Follow the Vegetation Classification and Survey journal on Facebook and Twitter.
***
Brief personal summaries:
Jürgen Dengler is a Professor of Vegetation Ecology at the Zurich University of Applied Sciences (ZHAW) in Wädenswil, Switzerland. Among others, he cofounded the European Vegetation Archive (EVA), the global vegetation-plot database “sPlot” and the “GrassPlot” database of the Eurasian Dry Grassland Group. His major research interests are grassland ecology, grassland conservation, biodiversity patterns, macroecology, vegetation change, broad-scale vegetation classification, methodological developments in vegetation ecology and ecoinformatics.
Florian Jansen is a Professor of Landscape Ecology at the University of Rostock, Germany. His research interests are vegetation ecology and dynamics, mire ecology including greenhouse gas emissions, and numerical ecology with R. He (co-)founded the German Vegetation Database vegetweb.de, the European Vegetation Archive (EVA), and the global vegetation-plot database “sPlot”. He wrote the R package eHOF for modelling species response curves along one-dimensional ecological gradients.
François Gillet is an Emeritus Professor of Community Ecology at the University of Franche-Comté in Besançon, France. His major research interests are vegetation diversity, ecology and dynamics, grassland and forest ecology, integrated synusial phytosociology, numerical ecology with R, dynamic modelling of social-ecological systems.
***
References:
Chytrý, M., Hennekens, S.M., Jiménez-Alfaro, B., Knollová, I., Dengler, J., Jansen, F., Landucci, F., Schaminée, J.H.J., Aćić, S., (…) & Yamalov, S. 2016. European Vegetation Archive (EVA): an integrated database of European vegetation plots. Applied Vegetation Science 19: 173–180.
Dengler, J., Wagner, V., Dembicz, I., García-Mijangos, I., Naqinezhad, A., Boch, S., Chiarucci, A., Conradi, T., Filibeck, G., (…) & Biurrun, I. 2018. GrassPlot – a database of multi-scale plant diversity in Palaearctic grasslands. Phytocoenologia 48: 331–347.
Dengler, J., Jansen, F., Chusova, O., Hüllbusch, E., Nobis, M.P., Van Meerbeek, K., Axmanová, I., Bruun, H.H., Chytrý, M., (…) & Gillet, F. 2023. Ecological Indicator Values for Europe (EIVE) 1.0. Vegetation Classification and Survey 4: 7–29.
Ellenberg, H., Weber, H.E., Düll, R., Wirth, V., Werner, W. & Paulißen, D. 1991. Zeigerwerte von Pflanzen in Mitteleuropa. Scripta Geobotanica 18: 1–248.
Jansen, F. & Dengler, J. 2010. Plant names in vegetation databases – a neglected source of bias. Journal of Vegetation Science 21: 1179–1186.
Midolo, G., Herben, T., Axmanová, I., Marcenò, C., Pätsch, R., Bruelheide, H., Karger, D.N., Aćić, S., Bergamini, A., Bergmeier, E., Biurrun, I., Bonari, G., Carni, A., Chiarucci, A., De Sanctis, M., Demina, O., (…), Dengler, J., (…) & Chytrý, M. 2023. Disturbance indicator values for European plants. Global Ecology and Biogeography 32: 24–34.
Scherrer, D. & Guisan, A. 2019. Ecological indicator values reveal missing predictors of species distributions. Scientific Reports 9: Article 3061.
Tichý, L., Axmanová, I., Dengler, J., Guarino, R., Jansen, F., Midolo, G., Nobis, M.P., Van Meerbeek, K., Aćić, S., (…) & Chytrý, M. 2023. Ellenberg-type indicator values for European vascular plant species. Journal of Vegetation Science 34: e13168.
The FAIR Data Place – the key and final product of the partnership – is meant to provide scientists with all types of biodiversity data “at their fingertips”
The Horizon 2020-funded project BiCIKL has reached its halfway stage, and the partners gathered in Plovdiv (Bulgaria) from the 22nd to the 25th of October for the Second General Assembly, organised by Pensoft.
The BiCIKL project will launch a new European community of key research infrastructures, researchers, citizen scientists and other stakeholders in the biodiversity and life sciences based on open science practices through access to data, tools and services.
BiCIKL’s goal is to create a centralised place to connect all key biodiversity data by interlinking 15 research infrastructures and their databases. The 3-year European Commission-supported initiative kicked off in 2021 and involves 14 key natural history institutions from 10 European countries.
BiCIKL is keeping pace as expected with 16 out of the 48 final deliverables already submitted, another 9 currently in progress/under review and due in a few days. Meanwhile, 21 out of the 48 milestones have been successfully achieved.
The hybrid format of the meeting enabled a wider range of participants, which resulted in robust discussions on the next steps of the project, such as the implementation of additional technical features of the FAIR Data Place (FAIR being an abbreviation for Findable, Accessible, Interoperable and Reusable).
This data includes biodiversity information, such as detailed images, DNA, physiology and past studies concerning a specific species and its ‘relatives’, to name a few. Currently, the issue is that all those types of biodiversity data have so far been scattered across various databases, which in turn have been missing meaningful and efficient interconnectedness.
Additionally, the FAIR Data Place, developed within the BiCIKL project, is to give researchers access to plenty of training modules to guide them through the different services.
Halfway through the duration of BiCIKL, the project is at a turning point, where crucial discussions between the partners are playing a central role in the refinement of the FAIR Data Place design. Most importantly, they are tasked with ensuring that their technologies work efficiently with each other, in order to seamlessly exchange, update and share the biodiversity data every one of them is collecting and taking care of.
By Year 3 of the BiCIKL project, the partners agree, when those infrastructures and databases become efficiently interconnected to each other, scientists studying the Earth’s biodiversity across the world will be in a much better position to build on existing research and improve the way and the pace at which nature is being explored and understood. At the end of the day, knowledge is the stepping stone for the preservation of biodiversity and humankind itself.
“Needless to say, it’s an honour and a pleasure to be the coordinator of such an amazing team spanning as many as 14 partnering natural history and biodiversity research institutions from across Europe, but also involving many global long-year collaborators and their infrastructures, such as Wikidata, GBIF, TDWG, Catalogue of Life to name a few,”
said BiCIKL’s project coordinator Prof. Lyubomir Penev, CEO and founder of Pensoft.
“The point is: do we want an integrated structure or do we prefer federated structures? What are the pros and cons of the two options? It’s essential to keep the community united and allied because we can’t afford any information loss and the stakeholders should feel at home with the Project and the Biodiversity Knowledge Hub.”
Joe Miller, Executive Secretary and Director at GBIF, commented:
“We are a brand new community, and we are in the middle of the growth process. We would like to already have answers, but it’s good to have this kind of robust discussion to build on a good basis. We must find the best solution to have linkages between infrastructures and be able to maintain them in the future because the Biodiversity Knowledge Hub is the location to gather the community around best practices, data and guidelines on how to use the BiCIKL services… In order to engage even more partners to fill the eventual gaps in our knowledge.”
“In an era of biodiversity change and loss, leveraging scientific data fully will allow the world to catalogue what we have now, to track and understand how things are changing and to build the tools that we will use to conserve or remediate. The challenge is that the data come from many streams – molecular biology, taxonomy, natural history collections, biodiversity observation – that need to be connected and intersected to allow scientists and others to ask real questions about the data. In its first year, BiCIKL has made some key advances to rise to this challenge,”
“As a partner, we, at the Biodiversity Information Standards – TDWG, are very enthusiastic that our standards are implemented in BiCIKL and serve to link biodiversity data. We know that joining forces and working together is crucial to building efficient infrastructures and sharing knowledge.”
The project will continue with the first Round Table of experts in December and with the publication of the projects that participated in the Open Call and will be funded at the beginning of next year.
***
Learn more about BiCIKL on the project’s website at: bicikl-project.eu
The BiCIKL project invites submissions of Expression of Interest (EoI) to the First BiCIKL Open Call for projects. The purpose of this call is to solicit, select and implement four to six biodiversity data-related scientific projects that will make use of the added value services developed by the leading Research Infrastructures that make the BiCIKL project.
By opening this call, BiCIKL aims to better understand how it could support scientific questions that arise from across the biodiversity world in the future, while addressing specific scientific or technical biodiversity data challenges presented by the applicants.
The BiCIKL project – a Horizon 2020-funded project involving 14 European institutions, representing major global players in biodiversity research and natural history, and coordinated by Pensoft – establishes a European starting community of key research infrastructures, researchers, citizen scientists and other biodiversity and life sciences stakeholders based on open science practices through access to data, tools and services.
Within Biodiversity Community Integrated Knowledge Library (BiCIKL), 14 key research and natural history institutions commit to link infrastructures and technologies to provide flawless access to biodiversity data.
In a recently started Horizon 2020-funded project, 14 European institutions from 10 countries, representing both the continent’s and global key players in biodiversity research and natural history, deploy and improve their own and partnering infrastructures to bridge gaps between each other’s biodiversity data types and classes. By linking their technologies, they are set to provide flawless access to data across all stages of the research cycle.
Three years in, BiCIKL (abbreviation for Biodiversity Community Integrated Knowledge Library) will have created the first-of-its-kind Biodiversity Knowledge Hub, where a researcher will be able to retrieve a full set of linked and open biodiversity data, thereby accessing the complete story behind an organism of interest: its name, genetics, occurrences, natural history, as well as authors and publications mentioning any of those.
Ultimately, the project’s products will solidify Open Science and FAIR (Findable, Accessible, Interoperable and Reusable) data practices by empowering and streamlining biodiversity research.
Together, the project partners will redesign the way biodiversity data is found, linked, integrated and re-used across the research cycle. By the end of the project, BiCIKL will provide the community with a more transparent, trustworthy and efficient highly automated research ecosystem, allowing for scientists to access, explore and put into further use a wide range of data with only a few clicks.
Continuously fed with data sourced by the partnering institutions and their infrastructures, BiCIKL’s key final output, the Biodiversity Knowledge Hub, is set to persist long after the project has concluded. Moreover, by accelerating biodiversity research that builds on – rather than duplicates – existing knowledge, it will in fact be providing access to exponentially growing contextualised biodiversity data.
***
From 1973 to 2020, Australian zoologist Dr Robert Mesibov kept careful records of the “where” and “when” of his plant and invertebrate collecting trips. Now, he has made those valuable biodiversity data freely and easily accessible via the Zenodo open-data repository, so that future researchers can rely on this “authority file” when using museum specimens collected from those events in their own studies. The new dataset is described in the open-access, peer-reviewed Biodiversity Data Journal.
While checking museum records, Dr Robert Mesibov found there were occasional errors in the dates and places for specimens he had collected many years before. He was not surprised.
One solution to this problem was what librarians and others have long called an “authority file”.
“It’s an authoritative reference, in this case with the correct details of where I collected and when”, he explained.
“I kept records of almost all my collecting trips from 1973 until I retired from field work in 2020. The earliest records were on paper, but I began storing the key details in digital form in the 1990s.”
The 48-year record has now been made publicly available via the Zenodo open-data repository after conversion to the Darwin Core data format, which is widely used for sharing biodiversity information. With this “authority file”, described in detail in the open-access, peer-reviewed Biodiversity Data Journal, future researchers will be able to rely on sound, interoperable and easy to access data, when using those museum specimens in their own studies, instead of repeating and further spreading unintentional errors.
“There are 3829 collecting events in the authority file”, said Mesibov, “from six Australian states and territories. For each collecting event there are geospatial and date details, plus notes on the collection.”
Mesibov hopes the authority file will be used by museums to correct errors in their catalogues.
“It should also save museums a fair bit of work in future”, he explained. “No need to transcribe details on specimen labels into digital form in a database, because the details are already in digital form in the authority file.”
Mesibov points out that in the 19th and 20th centuries, lists of collecting events were often included in the reports of major scientific expeditions.
“Those lists were authority files, but in the pre-digital days it was probably just as easy to copy collection data from specimen labels.”
“Authority files for collecting events are the next logical step,” said Mesibov. “They can be used as lookup tables for all the important details of individual collections: where, when, by whom and how.”
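The lookup-table use described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names are Darwin Core terms (`eventID`, `eventDate`, `decimalLatitude`, `decimalLongitude`, `recordedBy`), but whether the published file uses exactly these columns is an assumption, and the identifiers, values and the function `find_discrepancies` are invented.

```python
from typing import Any, Dict, Optional, Tuple

# A tiny stand-in for the authority file, keyed by collecting event.
# The identifier and all values are invented for illustration.
AUTHORITY: Dict[str, Dict[str, Any]] = {
    "EVENT-0001": {
        "eventDate": "1985-03-14",
        "decimalLatitude": -41.5,
        "decimalLongitude": 146.25,
        "recordedBy": "Collector X",
    },
}


def find_discrepancies(
    museum_record: Dict[str, Any],
    authority: Dict[str, Dict[str, Any]],
) -> Optional[Dict[str, Tuple[Any, Any]]]:
    """Compare a museum record with the authority file.

    Returns {field: (museum value, authoritative value)} for every field
    that differs, an empty dict if everything agrees, or None if the
    collecting event is not in the authority file.
    """
    event = authority.get(museum_record.get("eventID"))
    if event is None:
        return None
    return {
        field: (museum_record.get(field), value)
        for field, value in event.items()
        if museum_record.get(field) != value
    }
```

A museum could run such a check over its catalogue and review only the records for which the returned dictionary is not empty.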
###
Research paper:
Mesibov RE (2021) An Australian collector’s authority file, 1973–2020. Biodiversity Data Journal 9: e70463. https://doi.org/10.3897/BDJ.9.e70463
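Between now and 15 September 2021, the article processing fee (normally €550) will be waived for the first 36 papers, provided that the publications are accepted and that the data paper describes a dataset meeting the three criteria set out below: more than 5,000 records that are new to GBIF.org, high-quality data and metadata, and geographic coverage in Russia.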
The manuscript must be prepared in English and submitted in accordance with BDJ’s instructions to authors by 15 September 2021. Late submissions will not be eligible for APC waivers.
Sponsorship is limited to the first 36 accepted submissions meeting these criteria on a first-come, first-served basis. The call for submissions can therefore close prior to the stated deadline of 15 September 2021. Authors may contribute to more than one manuscript, but artificial division of the logically uniform data and data stories, or “salami publishing”, is not allowed.
BDJ will publish a special issue including the selected papers by the end of 2021. The journal is indexed by Web of Science (Impact Factor 1.331), Scopus (CiteScore: 2.1) and listed in РИНЦ / eLibrary.ru.
For non-native speakers, please ensure that your English is checked either by native speakers or by professional English-language editors prior to submission. You may credit these individuals as a “Contributor” through the AWT interface. Contributors are not listed as co-authors but can help you improve your manuscripts.
In addition to the BDJ instructions to authors, it is required that each dataset referenced from the data paper a) is cited with its DOI, b) appears in the paper’s list of references, and c) has “Russia 2021” in Project Data: Title and “N-Eurasia-Russia2021” in Project Data: Identifier in the dataset’s metadata.
Questions may be directed either to Dmitry Schigel, GBIF scientific officer, or Yasen Mutafchiev, managing editor of Biodiversity Data Journal.
The 2021 extension of the collection of data papers will be edited by Vladimir Blagoderov, Pedro Cardoso, Ivan Chadin, Nina Filippova, Alexander Sennikov, Alexey Seregin, and Dmitry Schigel.
Datasets with more than 5,000 records that are new to GBIF.org
Datasets should contain at least 5,000 records that are new to GBIF.org. While the focus is on additional records for the region, records already published in GBIF may meet the criteria of ‘new’ if they are substantially improved, particularly through the addition of georeferenced locations. Artificial reduction of records from otherwise uniform datasets to the necessary minimum (“salami publishing”) is discouraged and may result in rejection of the manuscript. New submissions describing updates of datasets already presented in earlier published data papers will not be sponsored.
Justification for publishing datasets with fewer records (e.g. sampling-event datasets, sequence-based data, checklists with endemics etc.) will be considered on a case-by-case basis.
Datasets with high-quality data and metadata
Authors should start by publishing a dataset comprised of data and metadata that meets GBIF’s stated data quality requirement. This effort will involve work on an installation of the GBIF Integrated Publishing Toolkit.
Only when the dataset is prepared should authors then turn to working on the manuscript text. The extended metadata you enter in the IPT while describing your dataset can be converted into a manuscript with a single click of a button in the ARPHA Writing Tool (see also Creation and Publication of Data Papers from Ecological Metadata Language (EML) Metadata). Authors can then complete, edit and submit manuscripts to BDJ for review.
Datasets with geographic coverage in Russia
In correspondence with the funding priorities of this programme, at least 80% of the records in a dataset should have coordinates that fall within the priority area of Russia. However, authors of the paper may be affiliated with institutions anywhere in the world.
***
Check out the Biota of Russia dynamic data paper collection so far.
Follow Biodiversity Data Journal on Twitter and Facebook to keep yourself posted about the new research published.
by Mariya Dimitrova, Jorrit Poelen, Georgi Zhelezov, Teodor Georgiev, Lyubomir Penev
Tables published in scholarly literature are a rich source of primary biodiversity data. They are often used for communicating species occurrence data, morphological characteristics of specimens, links of species or specimens to particular genes, ecology data and biotic interactions between species, etc. Tables provide a structured format for sharing numerous facts about biodiversity in a concise and clear way.
Inspired by the potential use of semantically-enhanced tables for text and data mining, Pensoft and Global Biotic Interactions (GloBI) developed a workflow for extracting and indexing biotic interactions from tables published in scholarly literature. GloBI is an open infrastructure enabling the discovery and sharing of species interaction data. GloBI ingests and accumulates individual datasets containing biotic interactions and standardises them by mapping them to community-accepted ontologies, vocabularies and taxonomies. Data integrated by GloBI is accessible through an application programming interface (API) and as archives in different formats (e.g. n-quads). GloBI has indexed millions of species interactions from hundreds of existing datasets spanning over a hundred thousand taxa.
The workflow
First, all tables extracted from Pensoft publications and stored in the OpenBiodiv triple store were automatically retrieved (Step 1 in Fig. 1). There were 6993 tables from 21 different journals. To identify only the tables containing biotic interactions, we used an ontology annotator, currently developed by Pensoft using terms from the OBO Relation Ontology (RO). The Pensoft Annotator analyses free text and finds words and phrases matching ontology term labels.
We used the RO to create a custom ontology, or list of terms, describing different biotic interactions (e.g. ‘host of’, ‘parasite of’, ‘pollinates’) (Step 2 in Fig. 1). We used all subproperties of the RO term labeled ‘biotically interacts with’ and expanded the list of terms with additional word spellings and variations (e.g. ‘hostof’, ‘host’), which were added to the custom ontology as synonyms of already existing terms using the property oboInOwl:hasExactSynonym.
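A minimal sketch of how such a term list with synonyms might be represented and matched against free text. The term labels and spelling variants come from the examples above; the dictionary layout, the `build_lookup` and `find_terms` helpers and the simple whole-word matching are assumptions made for illustration, not the Pensoft Annotator's actual implementation.

```python
import re

# Canonical term label -> spelling variants treated as exact synonyms.
# Labels are taken from the examples in the text; the structure is hypothetical.
CUSTOM_ONTOLOGY = {
    "host of": ["hostof", "host"],
    "parasite of": [],
    "pollinates": [],
}


def build_lookup(ontology):
    """Map every label and synonym (lower-cased) to its canonical label."""
    lookup = {}
    for label, synonyms in ontology.items():
        for form in [label, *synonyms]:
            lookup[form.lower()] = label
    return lookup


def find_terms(text, ontology=CUSTOM_ONTOLOGY):
    """Return the sorted canonical labels whose label or synonym occurs in text."""
    lookup = build_lookup(ontology)
    found = set()
    for form, label in lookup.items():
        # Whole-word, case-insensitive match; multi-word labels allow any whitespace.
        pattern = r"\b" + r"\s+".join(map(re.escape, form.split())) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(label)
    return sorted(found)
```

Mapping every variant back to one canonical label means that a table header reading ‘hostof’ and a caption reading ‘host of’ end up annotated with the same term.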
This custom ontology was used to perform annotation of all tables via the Pensoft Annotator (Step 3 in Fig. 1). Tables were split into rows and columns and accompanying table metadata (captions). Each of these elements was then processed through the Pensoft Annotator and if a match from the custom ontology was found, the resulting annotation was written to a MongoDB database, together with the article metadata. The original table in XML format, containing marked-up taxa, was also stored in the records.
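The splitting and annotation step could be sketched roughly as follows. This is an illustration under stated assumptions: the table XML is assumed to use HTML-like `table`, `caption`, `tr`, `th` and `td` elements, `match_terms` is a hypothetical stand-in for the annotator, and the returned dictionary only mimics the kind of record that might be written to the database (no MongoDB call is made here).

```python
import xml.etree.ElementTree as ET

# Hypothetical term list; the real workflow uses the custom RO-based ontology.
TERMS = ("host of", "parasite of", "pollinates")


def match_terms(text):
    """Stand-in for the annotator: return the terms that occur in the text."""
    lowered = text.lower()
    return [term for term in TERMS if term in lowered]


def split_table(table_xml):
    """Split a table into its caption, row texts and column texts."""
    root = ET.fromstring(table_xml)
    caption_el = root.find("caption")
    caption = "".join(caption_el.itertext()).strip() if caption_el is not None else ""
    grid = [
        ["".join(cell.itertext()).strip() for cell in tr if cell.tag in ("th", "td")]
        for tr in root.iter("tr")
    ]
    rows = [" ".join(cells) for cells in grid]
    width = max((len(cells) for cells in grid), default=0)
    columns = [
        " ".join(cells[i] for cells in grid if i < len(cells)) for i in range(width)
    ]
    return {"caption": caption, "rows": rows, "columns": columns}


def annotate_table(table_xml, article_metadata):
    """Build a database-ready record, or return None if no term matches."""
    parts = split_table(table_xml)
    elements = [("caption", parts["caption"])]
    elements += [("row", text) for text in parts["rows"]]
    elements += [("column", text) for text in parts["columns"]]
    annotations = [
        {"element": kind, "term": term, "text": text}
        for kind, text in elements
        for term in match_terms(text)
    ]
    if not annotations:
        return None
    # Keep the original XML alongside the annotations and article metadata.
    return {"article": article_metadata, "annotations": annotations, "table_xml": table_xml}
```

Returning `None` for tables without any match keeps the later filtering step trivial: only tables with at least one annotation produce a record.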
Thus, we detected 233 tables which contain biotic interactions, constituting about 3.3% of all examined tables. The scripts used for parsing the tables and annotating them, together with the custom ontology, are open source and available on GitHub. The database records were exported as JSON to a GitHub repository, from where they could be accessed by GloBI.
GloBI processed the tables further, involving the generation of a table citation from the article metadata and the extraction of interactions between species from the table rows (Step 4 in Fig. 1). Table citations were generated by querying the OpenBiodiv database with the DOI of the article containing each table to obtain the author list, article title, journal name and publication year. The extraction of table contents was not a straightforward process because tables do not follow a single schema and can contain both merged rows and columns (signified using the ‘rowspan’ and ‘colspan’ attributes in the XML). GloBI were able to index such tables by duplicating rows and columns where needed to be able to extract the biotic interactions within them. Taxonomic name markup allowed GloBI to identify the taxonomic names of species participating in the interactions. However, the underlying interaction could not be established for each table without introducing false positives due to the complicated table structures which do not specify the directionality of the interaction. Hence, for now, interactions are only of the type ‘biotically interacts with’ (Fig. 2) because it is a bi-directional one (e.g. ‘Species A interacts with Species B’ is equivalent to ‘Species B interacts with Species A’).
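The idea of duplicating merged cells can be illustrated with a short sketch. It assumes HTML-like `tr`, `td` and `th` elements carrying ‘rowspan’ and ‘colspan’ attributes; the `expand_spans` function is hypothetical and is not GloBI's actual code.

```python
import xml.etree.ElementTree as ET


def expand_spans(table_xml):
    """Return the table as a grid of cell texts, duplicating merged cells."""
    root = ET.fromstring(table_xml)
    grid = []     # list of rows, each a list of cell texts
    pending = {}  # column index -> (rows still to fill, text) carried down by rowspan
    for tr in root.iter("tr"):
        cells = [c for c in tr if c.tag in ("td", "th")]
        row, col, idx = [], 0, 0
        while idx < len(cells) or col in pending:
            if col in pending:
                # This column is covered by a cell merged down from a row above.
                remaining, text = pending[col]
                row.append(text)
                if remaining > 1:
                    pending[col] = (remaining - 1, text)
                else:
                    del pending[col]
                col += 1
                continue
            cell = cells[idx]
            idx += 1
            text = "".join(cell.itertext()).strip()
            rowspan = int(cell.get("rowspan", 1))
            colspan = int(cell.get("colspan", 1))
            for _ in range(colspan):
                # Repeat the value across every column the cell spans.
                row.append(text)
                if rowspan > 1:
                    pending[col] = (rowspan - 1, text)
                col += 1
        grid.append(row)
    return grid
```

After expansion every row has a value in each column it covers, so a taxon name given once in a merged cell is repeated on each row it applies to and can be paired with the other taxa on that row.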
Examples of species interactions provided by OpenBiodiv and indexed by GloBI are available on GloBI’s website.
In the future, we plan to expand the capacity of the workflow to recognise interaction types in more detail. This could be implemented by applying part-of-speech tagging to establish the subject and object of an interaction.
In addition to being accessible via an API and as archives, biotic interactions indexed by GloBI are available as Linked Open Data and can be accessed via a SPARQL endpoint. Hence, we plan on creating a user-friendly service for federated querying of GloBI and OpenBiodiv biodiversity data.
This collaborative project is an example of the benefits of open and FAIR data, enabling the enhancement of biodiversity data through the integration between Pensoft and GloBI. Transformation of knowledge contained in existing scholarly works into giant, searchable knowledge graphs increases the visibility and attributed re-use of scientific publications.
References
Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005.
Additional Information
The work has been partially supported by the International Training Network (ITN) IGNITE funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 764840.
Looking at today’s ravaging COVID-19 (Coronavirus) pandemic, which, at the time of writing, has spread to over 220 countries, with its continuously rising death toll and widespread fear, it may seem from the outside that scientists and decision-makers are scratching their heads more than ever in the face of the unknown. In reality, however, we are witnessing an unprecedented global community gradually waking up to the realisation of the only possible solution: collaboration.
On one hand, we have nationwide collective actions, including cancelled travel plans and mass gatherings; social distancing; and lockdowns, that have already proved successful at changing what the World Health Organisation (WHO) has determined as “the course of a rapidly escalating and deadly epidemic” in Hong Kong, Singapore and China. On the other hand, we have the world’s best scientists and laboratories all steering their expertise and resources towards the better understanding of the virus and, ultimately, developing a vaccine for mass production as quickly as possible.
While there is little doubt that the best specialists in the world will eventually invent an efficient vaccine – just like they did following the Western African Ebola virus epidemic (2013–2016) and on several other similar occasions in the years before – the question at hand is rather when this is going to happen and how many human lives it is going to cost.
Again, it all comes down to collective efforts. It only makes sense that if research teams and labs around the globe join their efforts and expertise, thereby avoiding duplicate work, their endeavours will bear fruit sooner rather than later. Similarly to employees from across the world, who have been demonstrating their ability to perform their day-to-day tasks and responsibilities from the safety of their homes just as efficiently as they would have done from their conventional offices, in today’s high-tech, online-friendly reality, no more should scientists be restricted by physical and geographical barriers either.
“Observations, prevention and impact of COVID-19”: Special Collection in RIO Journal
To inspire and facilitate collaboration across the world, the SPARC-recognised Open Science innovator Research Ideas and Outcomes (RIO Journal) decided to bring together scientific findings in a collection of publications that is easy to discover, read, cite and build on.
Furthermore, due to its revolutionary approach to publishing, where early and brief research outcomes (i.e. ideas, raw data, software descriptions, posters, presentations, case studies and many others) are all considered as precious scientific gems, hence deserving a formal publication in a renowned academic journal, RIO places a special focus on these contributions.
Accepted manuscripts dealing with research relevant to the COVID-19 pandemic across disciplines, including medicine, ethics, politics and economics, at a local, regional, national or international scale, as well as those meant to encourage crucial discussions, will be published free of charge in recognition of the current emergency. Especially encouraged are submissions focused on the long-term effects of COVID-19.
Furthermore, thanks to the technologically advanced infrastructure and services it provides, in addition to a long list of indexers and databases where publications are registered, the manuscripts submitted to RIO Journal are not only rapidly processed and published, but once they get online, they immediately become easy to discover, cite and build on by any researcher, anywhere in the world.
On top of that, Pensoft’s targeted and manually provided science communication services make sure that published research of social value reaches the wider audience, including key decision-makers and journalists, by means of press releases and social media promotion.
***
For more information about RIO’s globally unique features, visit the journal’s website. Follow RIO Journal on Twitter and Facebook.
Between now and 31 August 2020, the article processing fee (normally €450) will be waived for the first 20 papers, provided that the publications are accepted and meet the following criteria that the data paper describes a dataset:
The manuscript must be prepared in English and submitted in accordance with BDJ’s instructions to authors by 31 August 2020. Late submissions will not be eligible for APC waivers.
Sponsorship is limited to the first 20 accepted submissions meeting these criteria on a first-come, first-served basis. The call for submissions can therefore close prior to the stated deadline of 31 August. Authors may contribute to more than one manuscript, but artificial division of the logically uniform data and data stories, or “salami publishing”, is not allowed.
BDJ will publish a special issue including the selected papers by the end of 2020. The journal is indexed by Web of Science (Impact Factor 1.029) and Scopus (CiteScore: 1.24), and is listed in РИНЦ / eLibrary.ru.
For non-native speakers, please ensure that your English is checked either by native speakers or by professional English-language editors prior to submission. You may credit these individuals as a “Contributor” through the AWT interface. Contributors are not listed as co-authors but can help you improve your manuscripts.
In addition to the BDJ instructions to authors, it is required that datasets referenced from the data paper a) are cited with the dataset’s DOI and b) appear in the paper’s list of references.
Questions may be directed either to Dmitry Schigel, GBIF scientific officer, or Yasen Mutafchiev, managing editor of Biodiversity Data Journal.
Definition of terms
Datasets with more than 5,000 records that are new to GBIF.org
Datasets should contain at a minimum 5,000 records that are new to GBIF.org. While the focus is on additional records for the region, records already published in GBIF may meet the criteria of ‘new’ if they are substantially improved, particularly through the addition of georeferenced locations.
Justification for publishing datasets with fewer records (e.g. sampling-event datasets, sequence-based data, checklists with endemics etc.) will be considered on a case-by-case basis.
Datasets with high-quality data and metadata
Authors should start by publishing a dataset comprised of data and metadata that meets GBIF’s stated data quality requirement. This effort will involve work on an installation of the GBIF Integrated Publishing Toolkit.
Only when the dataset is prepared should authors then turn to working on the manuscript text. The extended metadata you enter in the IPT while describing your dataset can be converted into a manuscript with a single click of a button in the ARPHA Writing Tool (see also Creation and Publication of Data Papers from Ecological Metadata Language (EML) Metadata). Authors can then complete, edit and submit manuscripts to BDJ for review.
Datasets with geographic coverage in European Russia west of the Ural mountains
In correspondence with the funding priorities of this programme, at least 80% of the records in a dataset should have coordinates that fall within the priority area of European Russia west of the Ural mountains. However, authors of the paper may be affiliated with institutions anywhere in the world.
#####
Data audit at Pensoft’s biodiversity journals
Data papers submitted to Biodiversity Data Journal, as well as all relevant biodiversity-themed journals in Pensoft’s portfolio, undergo a mandatory data auditing workflow before being passed down to a subject editor.
Check out the case study below to see how the data audit workflow works in practice.