As part of the Horizon Europe consortium, Pensoft will contribute with services and know-how in scholarly publishing and project branding.
Pensoft takes on an integral part in the newly launched EU-funded project Intellectual Properties for Open Science (IP4OS) as the leader of Work Package 5: Knowledge transfer: Communication, Dissemination and Exploitation of project results and Sustainability.
IP4OS focuses on the integration of Intellectual Property and Open Science to empower professionals across Europe in making research outputs more accessible and impactful.
The IP4OS project officially started with a kick-off symposium on 8-9 January 2025, in Kiel, Germany.
Over 50 participants from diverse fields came together for the first day of the event, which featured talks and discussions focused on the intersection between Intellectual Property and Open Science. The second day saw presentations of the work packages that mapped out the project’s activities over the next two years through a collaborative exchange of ideas.
The Project
IP4OS aims to promote a practical connection between Intellectual Property management and Open Science principles.
The project has outlined several objectives to reach this goal, including:
Best-practice manual: IP4OS is to release a guide with actionable steps for integrating Intellectual Property and Open Science principles effectively.
Raise awareness: the consortium is to inform about the use of Intellectual Property tools in the context of Open Science practices among key professional groups.
Professional training: the project is to deliver educational programs to a broad audience and equip participants with practical knowledge and skills.
Collaborative community: IP4OS is to engage professionals across Europe to create a network of individuals and organisations focused on the improvement of knowledge-sharing practices.
These goals are aligned with the European Commission’s vision to strengthen knowledge-sharing practices for societal and economic advancement.
Pensoft’s role in IP4OS
As the leader of Work Package 5, Pensoft is responsible for amplifying the visibility and long-term impact of the IP4OS project.
Key activities under this work package include:
Distinctive brand identity: Pensoft will create a project logo, branding guidelines, promotional content, and a website to serve as a hub for project content and updates.
Communication and dissemination strategy: Pensoft will prepare a detailed plan for sharing project results amongst key stakeholders and audiences. The plan will be implemented during the early stages of the project.
Project outcomes visibility: Pensoft will produce key informational materials, including the best-practice manual and educational resources. These will be shared through platforms like the European Open Science Cloud (EOSC) and Knowledge Valorisation Platform to extend the project’s reach.
Stakeholder engagement: Pensoft will deliver content such as videos, press releases and newsletters to communicate the project’s progress and results to a wide audience.
These efforts, among others, aim to make the project results widely accessible and reusable by all relevant groups within and beyond the research community.
International Consortium
The project brings together nine international partners from eight countries operating in various sectors, each contributing diverse expertise.
Together, the consortium is committed to addressing the challenges of integrating Intellectual Property and Open Science practices.
Over the coming months, the IP4OS project will focus on developing resources to support professionals in advancing the use of Intellectual Property and Open Science practices.
The IP4OS project website is coming soon!
In the meantime, make sure to keep up with the project’s progress by following our social media channels on BlueSky and LinkedIn.
Yet another hectic year has passed for our team at Pensoft, so it feels right to look back at the highlights from the last 12 months, as we buckle up for the leaps and strides in 2025.
In the past, we have used the occasion to take you back to the best moments of our most popular journals (see this list of 2023 highlights from ZooKeys, MycoKeys, PhytoKeys and more!); share milestones related to our ARPHA publishing platform (see the new journals, integrations and features from 2023); or let you reminisce about the coolest research published across our journals during the year (check out our Top 10 new species from 2021).
In 2022, when we celebrated our 30th anniversary on the academic scene, we extended our festive spirit throughout the year as we dived deep into those fantastic three decades. We put up Pensoft’s timeline and finished the year with a New Species Showdown tournament, where our followers on (what was back then) Twitter voted twice a week for their favourite species EVER described on the pages of our taxonomic journals.
Spoiler alert: we will be releasing our 2024 Top 10 New Species on Monday, 23 December, so you’d better go to the right of this screen and subscribe to our blog!
As we realised we might’ve been a bit biased towards our publishing activities over the years, this time we chose to present a retrospective that captures our best 2024 moments from across the departments, and sheds light on how the publishing, technology and project communication endeavours fit together to make Pensoft what it is.
In truth, we take pride in being an exponentially growing family of multiple departments that currently comprises over 60 full-time employees and about a dozen freelancers working from all corners of the world, including Australia, Canada, Belgium and the United Kingdom. Together, we are all determined to make sure we continuously improve our service to all who have trusted us: authors, reviewers, editors, client journals, learned societies, research institutions, project consortia and other external collaborators.
Pensoft as an open-access academic publisher
In 2024, at Pensoft, we were hugely pleased to see significant growth in the published output of almost all our journals, with record-breaking numbers in both submissions and publications at flagship titles such as the Biodiversity Data Journal, PhytoKeys and MycoKeys.
Later in 2024, our colleagues, who work together with our clients to ensure their journals comply with the requirements of the top scholarly databases before they apply for indexation, informed us that another two journals in our portfolio have had their applications to Clarivate’s Web of Science successfully accepted. These are the newest journal of the International Association for Vegetation Science: Vegetation Classification and Survey, and Metabarcoding and Metagenomics: a journal we launched in 2017 in collaboration with a team of brilliant scientists working together at the time within the DNAquaNet COST Action.
In 2024, we also joined the celebrations of our long-time partners at the Museum für Naturkunde Berlin, whose three journals: Zoosystematics and Evolution, Deutsche Entomologische Zeitschrift and Fossil Record are all part of our journal portfolio. This year marked the 10th Open Access anniversary of the three journals.
In the meantime, we also registered a record in new titles either joining the Pensoft portfolio or opting for ARPHA Platform’s white-label publishing solution, where journal owners retain exclusivity for the publication of their titles, yet use ARPHA’s end-to-end technology and as many human-provided services as necessary.
Amongst our new partners is the International Mycological Association, which moved its official journal IMA Fungus to ARPHA Platform. As part of Pensoft’s scholarly portfolio, the renowned journal joins another well-known academic title in the field of mycology: MycoKeys, which was launched by Pensoft in 2011. The big announcement was aptly made public at this year’s 12th International Mycological Congress, where visitors could often spot the newly elected IMA President and IMA Fungus Chief Editor, Marc Stadler, chatting with our founder and CEO Lyubomir Penev by the Pensoft/MycoKeys booth.
On our end, we did not stop supporting enthusiastic and proactive scientists in their attempt to bridge gaps in scientific knowledge. In January, we launched the Estuarine Management and Technologies journal together with Dr. Soufiane Haddout of the Ibn Tofail University, Morocco.
Later on, Dr. Franco Andreone (Museo Regionale di Scienze Naturali, Italy) approached us with the idea to launch a journal addressing the role of natural history museums and herbaria collections in scientific progress. This collaboration resulted in the Natural History Collections and Museomics journal, officially announced at the joint TDWG-SPNHC conference in Okinawa, Japan in August.
Around this time, we finalised our similarly exciting journal project in partnership with Prof. Dr. Volker Grimm (UFZ, Germany), Prof. Dr. Karin Frank (UFZ, Germany), Prof. Dr. Mark E. Hauber (City University of New York) and Prof. Dr. Florian Jeltsch (University of Potsdam, Germany). The outcome of this collaboration is called Individual-based Ecology: a journal that aims to promote an individual-based perspective in ecology, as it closes the knowledge gap between individual-level responses and broader ecological patterns.
The three newly-launched journals are all published under the Diamond Open Access model, where neither access, nor publication is subject to charges.
As you can see, we have a lot to be proud of in terms of our journals. This is also why in 2024 our team took a record number of trips to attend major scientific events, where we got the chance to meet face-to-face with long-time editors, authors, reviewers and readers of our journals. Even more exciting was meeting the new faces of scientific research and learning about their own take on scholarship and academic journals.
We cannot possibly comment on Pensoft’s tech progress in 2024 without mentioning the EU-funded project BiCIKL (acronym for Biodiversity Community Integrated Knowledge Library), which we coordinated for three years until its conclusion last April.
This 36-month endeavour saw 14 member institutions and 15 research infrastructures representing diverse actors from the biodiversity data realm come together to improve bi-directional links between different platforms, standards, formats and scientific fields.
Following these three years of collaborative work, we reported a great many notable research outputs from our consortium (find out about them in the open-science project collection in the Research Ideas and Outcomes journal, titled “Towards interlinked FAIR biodiversity knowledge: The BiCIKL perspective”). These culminated in the Biodiversity Knowledge Hub: a one-stop portal that allows users to access FAIR and interlinked biodiversity data and services in a few clicks; and also in a set of policy recommendations addressing key policy makers, research institutions and funders who deal with various types of data about the world’s biodiversity, and are thereby responsible for ensuring their findability, accessibility, interoperability and reusability (FAIR-ness).
The Biodiversity Knowledge Hub
Apart from coordinating BiCIKL, we also worked side-by-side with our partners to develop, refine and test each other’s tools and services, in order to make sure that they communicate efficiently with each other, thereby aligning with the principles of FAIR data and the needs of the scientific community in the long run.
During those three years we made a lot of refinements to our OpenBiodiv: a biodiversity database containing knowledge extracted from scientific literature, built as an Open Biodiversity Knowledge Management System, and our ARPHA Writing Tool. The latter is an XML-based online authoring environment using a large set of pre-formatted templates, where manuscripts are collaboratively written, edited and submitted to participating journals published on ARPHA Platform. What makes the tool particularly special is its multiple features that streamline and FAIRify data publishing as part of a scientific publication, especially in the field of biodiversity knowledge. In fact, we made enough improvements to the ARPHA Writing Tool that we will be soon officially releasing its 2.0 version!
OpenBiodiv – The Open Biodiversity Knowledge Management System
ARPHA Writing Tool 2.0
Amongst our collaborative projects is the Nanopublications for Biodiversity workflow that we co-developed with KnowledgePixels to allow researchers to ‘fragment’ their most important scientific findings into machine-actionable and machine-interpretable statements. Being the smallest units of publishable information, these ‘pixels of knowledge’ present an assertion about anything that can be uniquely identified and attributed to its author, and serve to communicate a single statement, its original source (provenance) and citation record (publication info).
Nanopublications for Biodiversity
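To make the three parts of a nanopublication more tangible, here is a minimal plain-Python sketch. It only mirrors the structure described above (one assertion, its provenance and its publication info); the class name, field names and all values are hypothetical placeholders, and the sketch does not reproduce the format or tooling used in the actual workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nanopublication:
    """Hypothetical stand-in for the three parts named in the text."""

    assertion: tuple[str, str, str]   # a single subject-predicate-object statement
    provenance: str                   # original source of the statement
    publication_info: dict[str, str]  # who published the statement and when


def is_complete(nanopub: Nanopublication) -> bool:
    """Return True when the assertion, provenance and publication info are all filled in."""
    return (
        all(part.strip() for part in nanopub.assertion)
        and bool(nanopub.provenance.strip())
        and bool(nanopub.publication_info)
    )


# Placeholder values for illustration only; not a real nanopublication.
example = Nanopublication(
    assertion=("taxon:ExampleInsect", "visitsFlowersOf", "taxon:ExamplePlant"),
    provenance="doi:10.0000/placeholder",
    publication_info={"author": "orcid:0000-0000-0000-0000", "created": "2024-01-01"},
)

print(is_complete(example))
```

Keeping each statement this small is what allows it to be attributed, cited and checked on its own.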
In partnership with the Swiss-based Text Mining group of Patrick Ruch at SIB and the text- and data-mining association Plazi, we brought the SIB Literature Services (SIBiLS) database one step closer to solidifying its “Biodiversity PMC” portal and working title.
Understandably, we spent a lot of effort, time and enthusiasm in raising awareness about our most recent innovations, in addition to our long-standing workflows, formats and tools developed with the aim to facilitate open and efficient access to scientific data; and their integration into published scholarly work, as well as receiving well-deserved recognition for their collection.
Pensoft as a science communicator
At our Project team, which is undoubtedly the fastest developing department at Pensoft, science communicators are working closely with technology and publishing teams to help consortia bring their scientific results closer to policy actors, decision-makers and the society at large.
Throughout 2024, the team, comprising 20 science communicators and project managers, has been working as part of 27 EU-funded project consortia, including nine that have only started this year (check out all partnering projects on the Pensoft website, ordered from most recently started to oldest). Apart from communicating key outcomes and activities throughout the projects’ lifetime, in many of them our team has also been actively involved in grant proposal drafting, coordination, administration, platform development, graphic and web design, among others (see all project services offered by Pensoft to consortia).
Naturally, we had a seat on the front row during many milestones achieved by our partners at all those 27 ongoing projects, and communicated to the public by our communicators.
Amongst those is the release of the InsectsCount web application developed within the Horizon 2020 project SHOWCASE. Through innovative gamification elements, the app encourages users to share valuable data about flower-visiting insects, which in turn help researchers gain new knowledge about the relationship between observed species and the region’s land use and management practices (learn more about InsectsCount on the SHOWCASE project website).
Another fantastic project output was the long-awaited dataset of maps of annual forest disturbances across 38 European countries derived from the Landsat satellite data archive published by the Horizon Europe project ForestPaths in April (find more about the European Forest Disturbance Atlas on the ForestPaths project website).
In a major company highlight, last month, our project team participated in COP29 in Baku, Azerbaijan with a side event dedicated to the role of open science and science communication in climate- and biodiversity-friendly policy.
Pensoft’s participation at COP29 – as well as our perspective on FAIR data and open science – were recently covered in an interview by Exposed by CMD (a US-based news media accredited to cover the event) with our science communicator Alexandra Korcheva and project manager Boris Barov.
***
Now, to keep up with our next steps in real time, we invite you to follow Pensoft on social media on BlueSky, X, Facebook, Instagram and LinkedIn!
Don’t forget to also enter your email to the right to sign up for new content from this blog!
The publications so far include the grant proposal, conference abstracts, a workshop report, guidelines papers and deliverables submitted to the Commission.
The dynamic open-science project collection of BiCIKL, titled “Towards interlinked FAIR biodiversity knowledge: The BiCIKL perspective” (doi: 10.3897/rio.coll.105), continues to grow, as the project progresses into its third year and its results accumulate ever so exponentially.
Following the publication of three important BiCIKL deliverables: the project’s Data Management Plan, its Visual identity package and a report, describing the newly built workflow and tools for data extraction, conversion and indexing and the user applications from OpenBiodiv, there are currently 30 research outcomes in the BiCIKL collection that have been shared publicly to the world, rather than merely submitted to the European Commission.
Shortly after the BiCIKL project started in 2021, a project-branded collection was launched in the open-science scholarly journal Research Ideas and Outcomes (RIO). There, the partners have been publishing – and thus preserving – conclusive research papers, as well as early and interim scientific outputs.
The publications so far also include the BiCIKL grant proposal, which earned the support of the European Commission in 2021; conference abstracts, submitted by the partners to two consecutive TDWG conferences; a project report that summarises recommendations on interoperability among infrastructures, as concluded from a hackathon organised by BiCIKL; and two Guidelines papers, aiming to trigger a culture change in the way data is shared, used and reused in the biodiversity field.
At the time of writing, the top three of the most read papers in the BiCIKL collection is completed by the grant proposal and the second Guidelines paper, where the partners – based on their extensive and versatile experience – present recommendations about the use of annotations and persistent identifiers in taxonomy and biodiversity publishing.
What one might find quite odd when browsing the BiCIKL collection is that each publication is marked with its own publication source, even though all contributions are clearly already accessible from RIO Journal.
This is because one of the unique features of RIO allows for consortia to use their project collection as a one-stop access point for all scientific results, regardless of their publication venue, by means of linking to the original source via metadata. Additionally, projects may also upload their documents in their original format and layout, thanks to the integration between RIO and ARPHA Preprints. This is in fact how BiCIKL chose to share their latest deliverables using the very same files they submitted to the Commission.
“In line with the mission of BiCIKL and our consortium’s dedication to FAIRness in science, we wanted to keep our project’s progress and results fully transparent and easily accessible and reusable to anyone, anywhere,”
explains Prof Lyubomir Penev, BiCIKL’s Project Coordinator and founder and CEO of Pensoft.
“This is why we opted to collate the outcomes of BiCIKL in one place – starting from the grant proposal itself, and then progressively adding workshop reports, recommendations, research papers and what not. By the time BiCIKL concludes, not only will we be ready to refer back to any step along the way that we have just walked together, but also rest assured that what we have achieved and learnt remains at the fingertips of those we have done it for and those who come after them,” he adds.
EIVE 1.0 is the most comprehensive system of ecological indicator values of vascular plants in Europe to date. It can be used as an important tool for continental-scale analyses of vegetation and floristic data.
It took seven years and hundreds of hours of work by an international team of 34 authors to develop and publish the most comprehensive system of ecological indicator values (EIVs) of vascular plants in Europe to date.
EIVE 1.0 provides the five most-used ecological indicators, M – moisture, N – nitrogen, R – reaction, L – light and T – temperature, for a total of 14,835 vascular plant taxa in Europe, or between 13,748 and 14,714 for the individual indicators. For each of these taxa, EIVE contains three values: the EIVE niche position indicator, the EIVE niche width indicator and the number of regional EIV systems on which the assessment was based. Both niche position and niche width are given on a continuous scale from 0 to 10, not as categorical ordinal values as in the source systems.
Evidently, EIVE can be an important tool for continental-scale analyses of vegetation and floristic data in Europe.
It will make it possible to analyse the nearly 2 million vegetation plots currently contained in the European Vegetation Archive (EVA; Chytrý et al. 2016) in new ways.
Since EVA hardly contains any in situ measured environmental variables apart from elevation, slope inclination and aspect, the numerous macroecological studies to date have had to rely on coarse modelled environmental data (e.g. climate) instead. This is particularly problematic for soil variables such as pH, moisture or nutrients, which can change dramatically within a few metres.
Here, the approximation of site conditions by mean ecological indicator values can improve the predictive power substantially (Scherrer and Guisan 2019). Likewise, in broad-scale vegetation classification studies, mean EIVE values per plot would allow a better characterisation of the distinguished vegetation units. Lastly, one should not forget that most countries in Europe do not have a national EIV system, and here EIVE could fill the gap.
Almost on the same day as EIVE 1.0, another supranational system of ecological indicator values in Europe was published by Tichý et al. (2023) with a similar approach.
Thus, it will be important for vegetation scientists in Europe to understand the pros and cons of both systems to allow the wise selection of the most appropriate tool:
EIVE 1.0 is based on 31 regional EIV systems, while Tichý et al. (2023) uses 12.
Both systems provide indicator values for moisture, nitrogen/nutrients, reaction, light and temperature, while Tichý et al. (2023) additionally has a salinity indicator.
Tichý et al. (2023) aimed at using the same scales as Ellenberg et al. (1991), which means that the scales vary between indicators (1–9, 0–9, 1–12), while EIVE has a uniform interval scale of 0–10 for all indicators.
Only EIVE provides niche width in addition to niche position. Niche width is an important aspect of the niche and might be used to improve the calculation of mean indicator values per plot (e.g. by weighting with inverse niche width).
The taxonomic coverage is larger in EIVE than in Tichý et al. (2023): 14,835 vs. 8,908 accepted taxa and 11,148 vs. 8,679 species.
EIVE provides indicator values for accepted subspecies, while Tichý et al. (2023) is restricted to species and aggregates. Separate indicator values for subspecies might be important for two reasons: (a) subspecies often strongly differ in at least one niche dimension; (b) many of the taxa now considered as subspecies have been treated at species level in the regional EIV systems.
Tichý et al. (2023) added 431 species not contained in any of the source systems based on vegetation-plot data from the European Vegetation Archive (EVA; Chytrý et al. 2016) while EIVE calculated the European indicator values only for taxa occurring at least in one source system.
While both systems present maps that suggest a good coverage across Europe, Tichý et al. (2023)’s source systems largely were from Central Europe, NW Europe and Italy, but, unlike EIVE, these authors did not use source systems from the more “distal” parts of Europe, such as Sweden, Faroe Islands, Russia, Georgia, Romania, Poland and Spain, and they used only a small subset of indicators of the EIV systems of Ukraine, Greece and the Alps.
In a validation with GBIF-derived data on temperature niches, Dengler et al. (2023) showed that EIVE has a slightly stronger correlation than Tichý et al. (2023)’s indicators (r = 0.886 vs. 0.852).
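The idea of weighting by inverse niche width, mentioned in the comparison above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the `MIN_WIDTH` safeguard and the two made-up taxa are hypothetical and not part of EIVE or its R workflow.

```python
# Hypothetical lower bound on niche width, assumed here only to avoid division by zero.
MIN_WIDTH = 0.1


def mean_indicator(taxa, weight_by_niche_width=False):
    """Mean indicator value for one vegetation plot.

    taxa: list of (niche_position, niche_width) pairs on the 0-10 scale.
    With weight_by_niche_width=True, each taxon is weighted by 1 / niche_width,
    so taxa with narrow niches count more than taxa with broad niches.
    """
    if not taxa:
        raise ValueError("plot contains no taxa with indicator values")
    if weight_by_niche_width:
        weights = [1.0 / max(width, MIN_WIDTH) for _, width in taxa]
    else:
        weights = [1.0] * len(taxa)
    weighted_sum = sum(w * position for w, (position, _) in zip(weights, taxa))
    return weighted_sum / sum(weights)


# Made-up plot: a specialist (position 2, width 1) and a generalist (position 8, width 4).
plot = [(2.0, 1.0), (8.0, 4.0)]

print(mean_indicator(plot))                              # plain mean
print(mean_indicator(plot, weight_by_niche_width=True))  # inverse-width weighted mean
```

For this made-up plot the plain mean is 5.0, whereas the weighted mean is 3.2: the specialist with the narrow niche pulls the estimate towards its own position, which is the effect the weighting is meant to achieve.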
How did EIVE manage to integrate all EIV systems in Europe that contained at least one of the selected indicators for vascular plants, while Tichý et al. (2023) used only a small subset?
This difference is mainly due to a more complex workflow in EIVE (which also was one of the reasons why the preparation took so long). First, Tichý et al. (2023) restricted their search to EIV systems and indicators that had the same number of categories as the “original” Ellenberg system.
Second, from these they discarded those that showed a too low correlation with Ellenberg. By contrast, EIVE’s workflow allowed the use of any system with an ordinal (or even metric) scale, irrespective of the number of categories or the initial match with Ellenberg et al. (1991).
EIVE also did not treat one system (Ellenberg) as the master to assess all others but considered each of them equally valid. While indeed the individual EIV systems are often quite inconsistent, i.e. even if they refer to Ellenberg, the same value of an indicator in one system might mean something different in another system, our iterative linear optimisation enabled us to adjust all 31 systems for the five indicators to a common basis.
This in turn allowed deriving EIVE as the consensus system of all the source systems. The fact that in our validation of the temperature indicator, EIVE performed better than Tichý et al. (2023) and much better than most of the regional EIV systems might be attributable to the so-called “wisdom of the crowd”, going back to the statistician Francis Galton who found that averaging numerous independent assessments (even by laymen) of a continuous quantity can lead to very good estimates of the true value.
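The “wisdom of the crowd” effect is easy to reproduce in a toy simulation. The sketch below is not EIVE’s iterative linear optimisation; it simply assumes 31 independent assessments of one taxon’s true value, each with random Gaussian error (the true value, the noise level and the independence of the assessments are all assumptions made for illustration).

```python
import random
import statistics

random.seed(42)  # fixed seed so the toy example is repeatable

TRUE_VALUE = 6.0    # assumed "true" indicator value of a made-up taxon
N_ASSESSMENTS = 31  # number of independent assessments in this toy example
NOISE_SD = 1.5      # assumed spread of the individual assessment errors

# Each assessment is the true value plus independent random error.
assessments = [TRUE_VALUE + random.gauss(0, NOISE_SD) for _ in range(N_ASSESSMENTS)]

# The consensus is the plain average of all assessments.
consensus = statistics.fmean(assessments)

consensus_error = abs(consensus - TRUE_VALUE)
typical_individual_error = statistics.fmean([abs(a - TRUE_VALUE) for a in assessments])

print(f"error of the consensus:        {consensus_error:.2f}")
print(f"mean error of one assessment:  {typical_individual_error:.2f}")
```

The error of the average can never exceed the average individual error, and with independent errors it is typically much smaller. Real EIV systems are of course not fully independent of each other, so the gain in practice will be smaller than in this idealised setting.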
Apart from the indicator values themselves, EIVE has a second main feature that might not be so obvious at first glance, but which actually took the EIVE team, including several taxonomists, more time than the workflow to generate the indicator values themselves: the taxonomic backbone. EIVE for vascular plants is fully based on the taxonomic concept (including the synonymic relationships) of the Euro+Med Plantbase.
However, since Euro+Med lacks an important part of taxa that are frequently recorded in vegetation plots, to make our backbone fully usable to vegetation science, we expanded it beyond Euro+Med to something called “Euro+Med augmented”. We particularly added hybrids, neophytes and aggregates, three groups of plants hitherto only very marginally covered in Euro+Med. All additions were done by experts consistently with the taxonomic concept of Euro+Med and are fully documented. Likewise, many additional synonym relationships had to be added that were missing in Euro+Med.
Finally, we implemented the so-called “concept synonymy” (see Jansen and Dengler 2010), which allows the assignment of the same name from different sources to different accepted names (“taxonomic concepts”). This applies mainly to nested taxa that are treated at different levels in different sources, e.g. once as species with several subspecies, once as aggregate with several species. However, there are also some cases of misapplied names (i.e. names that were not used in agreement with their nomenclatural type in certain EIV systems). Such cases generally cannot be solved by the various tools for automatic taxonomic cleaning, but require experts who make a case-by-case decision.
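In data terms, concept synonymy means that a name is resolved together with the source in which it was used, not on its own. The sketch below shows this with a hypothetical lookup table; the taxon names, source labels and function name are invented for illustration and are not taken from the EIVE backbone or its R code.

```python
# Hypothetical concept-synonymy table: the same name maps to different
# accepted names depending on the source in which it was used.
CONCEPT_SYNONYMS = {
    ("Exampleum vulgare", "System A"): "Exampleum vulgare",        # used in the narrow sense
    ("Exampleum vulgare", "System B"): "Exampleum vulgare aggr.",  # used in the broad sense
}


def resolve(name: str, source: str) -> str:
    """Return the accepted name for a name as it was used in a given source."""
    try:
        return CONCEPT_SYNONYMS[(name, source)]
    except KeyError:
        # Unresolved cases are left for an expert's case-by-case decision.
        raise LookupError(f"no accepted name recorded for {name!r} in {source!r}") from None


print(resolve("Exampleum vulgare", "System A"))
print(resolve("Exampleum vulgare", "System B"))
```

A lookup keyed on the name alone could not tell these two usages apart, which is why such cases escape automatic taxonomic cleaning.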
The whole taxonomic workflow of EIVE is fully transparent with an R code that “digests”:
(a) the names as they are in the source systems,
(b) the official Euro+Med database and
(c) tables that document our additions and modifications (with reasons and references).
This comprehensive documentation will allow continuous and efficient improvement in the future, be it because of taxonomic novelties adopted in Euro+Med or because EIVE’s experts decide to change certain interpretations. That way, “Euro+Med augmented” and the accompanying R-based workflow can also be a valuable tool for other projects that wish to harmonise plant taxonomic information from various sources at a continental scale, e.g. in vegetation-plot databases such as GrassPlot (Dengler et al. 2018) and EVA (Chytrý et al. 2016).
The publication of EIVE 1.0 is not the endpoint, but rather a starting point for future developments in a community-based approach.
Together with interested colleagues from outside, the EIVE core team plans to prepare better and more comprehensive releases of EIVE in the future, including updates to its taxonomic backbone.
Future releases of EIVE will be published in fixed versions, typically together with a paper that describes the changes in the content.
As steps for the next two years, we anticipate that we will first add further taxa (bryophytes, lichens, macroalgae) and some additional indicators, both of which are relatively easy with our established R-based workflow. Then we plan EIVE 2.0 that will use the approx. 2 million vegetation plots in EVA (Chytrý et al. 2016) to re-calibrate EIVE for all taxa (see http://euroveg.org/requests/EVA-data-request-form-2022-02-10-Dengleretal.pdf).
***
This Behind the paper post refers to the article Ecological Indicator Values for Europe (EIVE) 1.0 by Jürgen Dengler, Florian Jansen, Olha Chusova, Elisabeth Hüllbusch, Michael P. Nobis, Koenraad Van Meerbeek, Irena Axmanová, Hans Henrik Bruun, Milan Chytrý, Riccardo Guarino, Gerhard Karrer, Karlien Moeys, Thomas Raus, Manuel J. Steinbauer, Lubomir Tichý, Torbjörn Tyler, Ketevan Batsatsashvili, Claudia Bita-Nicolae, Yakiv Didukh, Martin Diekmann, Thorsten Englisch, Eduardo Fernandez Pascual, Dieter Frank, Ulrich Graf, Michal Hájek, Sven D. Jelaska, Borja Jiménez-Alfaro, Philippe Julve, George Nakhutsrishvili, Wim A. Ozinga, Eszter-Karolina Ruprecht, Urban Šilc, Jean-Paul Theurillat, and François Gillet published in Vegetation Classification and Survey (https://doi.org/10.3897/VCS.98324).
***
Follow the Vegetation Classification and Survey journal on Facebook and Twitter.
***
Brief personal summaries:
Jürgen Dengler is a Professor of Vegetation Ecology at the Zurich University of Applied Sciences (ZHAW) in Wädenswil, Switzerland. Among others, he cofounded the European Vegetation Archive (EVA), the global vegetation-plot database “sPlot” and the “GrassPlot” database of the Eurasian Dry Grassland Group. His major research interests are grassland ecology, grassland conservation, biodiversity patterns, macroecology, vegetation change, broad-scale vegetation classification, methodological developments in vegetation ecology and ecoinformatics.
Florian Jansen is a Professor of Landscape Ecology at the University of Rostock, Germany. His research interests are vegetation ecology and dynamics, mire ecology including greenhouse gas emissions, and numerical ecology with R. He (co-)founded the German Vegetation Database vegetweb.de, the European Vegetation Archive (EVA), and the global vegetation-plot database “sPlot”. He wrote the R package eHOF for modelling species response curves along one-dimensional ecological gradients.
François Gillet is an Emeritus Professor of Community Ecology at the University of Franche-Comté in Besançon, France. His major research interests are vegetation diversity, ecology and dynamics, grassland and forest ecology, integrated synusial phytosociology, numerical ecology with R, dynamic modelling of social-ecological systems.
***
References:
Chytrý, M., Hennekens, S.M., Jiménez-Alfaro, B., Knollová, I., Dengler, J., Jansen, F., Landucci, F., Schaminée, J.H.J., Aćić, S., (…) & Yamalov, S. 2016. European Vegetation Archive (EVA): an integrated database of European vegetation plots. Applied Vegetation Science 19: 173–180.
Dengler, J., Wagner, V., Dembicz, I., García-Mijangos, I., Naqinezhad, A., Boch, S., Chiarucci, A., Conradi, T., Filibeck, G., (…) & Biurrun, I. 2018. GrassPlot – a database of multi-scale plant diversity in Palaearctic grasslands. Phytocoenologia 48: 331–347.
Dengler, J., Jansen, F., Chusova, O., Hüllbusch, E., Nobis, M.P., Van Meerbeek, K., Axmanová, I., Bruun, H.H., Chytrý, M., (…) & Gillet, F. 2023. Ecological Indicator Values for Europe (EIVE) 1.0. Vegetation Classification and Survey 4: 7–29.
Ellenberg, H., Weber, H.E., Düll, R., Wirth, V., Werner, W. & Paulißen, D. 1991. Zeigerwerte von Pflanzen in Mitteleuropa. Scripta Geobotanica 18: 1–248.
Jansen, F. & Dengler, J. 2010. Plant names in vegetation databases – a neglected source of bias. Journal of Vegetation Science 21: 1179–1186.
Midolo, G., Herben, T., Axmanová, I., Marcenò, C., Pätsch, R., Bruelheide, H., Karger, D.N., Acic, S., Bergamini, A., Bergmeier, E., Biurrun, I., Bonari, G., Carni, A., Chiarucci, A., De Sanctis, M., Demina, O., (…), Dengler, J., (…) & Chytrý, M. 2023. Disturbance indicator values for European plants. Global Ecology and Biogeography 32: 24–34.
Scherrer, D. & Guisan, A. 2019. Ecological indicator values reveal missing predictors of species distributions. Scientific Reports 9: Article 3061.
Tichý, L., Axmanová, I., Dengler, J., Guarino, R., Jansen, F., Midolo, G., Nobis, M.P., Van Meerbeek, K., Aćić, S., (…) & Chytrý, M. 2023. Ellenberg-type indicator values for European vascular plant species. Journal of Vegetation Science 34: e13168.
The FAIR Data Place – the key and final product of the partnership – is meant to provide scientists with all types of biodiversity data “at their fingertips”
The Horizon 2020-funded project BiCIKL has reached its halfway stage, and the partners gathered in Plovdiv (Bulgaria) from the 22nd to the 25th of October for the Second General Assembly, organised by Pensoft.
The BiCIKL project will launch a new European community of key research infrastructures, researchers, citizen scientists and other stakeholders in the biodiversity and life sciences based on open science practices through access to data, tools and services.
BiCIKL’s goal is to create a centralised place to connect all key biodiversity data by interlinking 15 research infrastructures and their databases. The 3-year European Commission-supported initiative kicked off in 2021 and involves 14 key natural history institutions from 10 European countries.
BiCIKL is keeping pace as expected with 16 out of the 48 final deliverables already submitted, another 9 currently in progress/under review and due in a few days. Meanwhile, 21 out of the 48 milestones have been successfully achieved.
The hybrid format of the meeting enabled a wider range of participants, which resulted in robust discussions on the next steps of the project, such as the implementation of additional technical features of the FAIR Data Place (FAIR being an abbreviation for Findable, Accessible, Interoperable and Reusable).
This data includes biodiversity information, such as detailed images, DNA, physiology and past studies concerning a specific species and its ‘relatives’, to name a few. Currently, the issue is that all those types of biodiversity data have so far been scattered across various databases, which in turn have been missing meaningful and efficient interconnectedness.
Additionally, the FAIR Data Place, developed within the BiCIKL project, is to give researchers access to plenty of training modules to guide them through the different services.
Halfway through the duration of BiCIKL, the project is at a turning point, where crucial discussions between the partners are playing a central role in the refinement of the FAIR Data Place design. Most importantly, they are tasked with ensuring that their technologies work efficiently with each other, in order to seamlessly exchange, update and share the biodiversity data every one of them is collecting and taking care of.
By Year 3 of the BiCIKL project, the partners agree, when those infrastructures and databases become efficiently interconnected to each other, scientists studying the Earth’s biodiversity across the world will be in a much better position to build on existing research and improve the way and the pace at which nature is being explored and understood. At the end of the day, knowledge is the stepping stone for the preservation of biodiversity and humankind itself.
“Needless to say, it’s an honour and a pleasure to be the coordinator of such an amazing team spanning as many as 14 partnering natural history and biodiversity research institutions from across Europe, but also involving many global long-year collaborators and their infrastructures, such as Wikidata, GBIF, TDWG, Catalogue of Life to name a few,”
said BiCIKL’s project coordinator Prof. Lyubomir Penev, CEO and founder of Pensoft.
“The point is: do we want an integrated structure or do we prefer federated structures? What are the pros and cons of the two options? It’s essential to keep the community united and allied because we can’t afford any information loss and the stakeholders should feel at home with the Project and the Biodiversity Knowledge Hub.”
Joe Miller, Executive Secretary and Director at GBIF, commented:
“We are a brand new community, and we are in the middle of the growth process. We would like to already have answers, but it’s good to have this kind of robust discussion to build on a good basis. We must find the best solution to have linkages between infrastructures and be able to maintain them in the future because the Biodiversity Knowledge Hub is the location to gather the community around best practices, data and guidelines on how to use the BiCIKL services… In order to engage even more partners to fill the eventual gaps in our knowledge.”
“In an era of biodiversity change and loss, leveraging scientific data fully will allow the world to catalogue what we have now, to track and understand how things are changing and to build the tools that we will use to conserve or remediate. The challenge is that the data come from many streams – molecular biology, taxonomy, natural history collections, biodiversity observation – that need to be connected and intersected to allow scientists and others to ask real questions about the data. In its first year, BiCIKL has made some key advances to rise to this challenge,”
“As a partner, we, at the Biodiversity Information Standards – TDWG, are very enthusiastic that our standards are implemented in BiCIKL and serve to link biodiversity data. We know that joining forces and working together is crucial to building efficient infrastructures and sharing knowledge.”
The project will continue with the first Round Table of experts in December and with the publication of the projects that participated in the Open Call and will be funded at the beginning of next year.
***
Learn more about BiCIKL on the project’s website at: bicikl-project.eu
The BiCIKL project invites submissions of Expression of Interest (EoI) to the First BiCIKL Open Call for projects. The purpose of this call is to solicit, select and implement four to six biodiversity data-related scientific projects that will make use of the added value services developed by the leading Research Infrastructures that make the BiCIKL project.
By opening this call, BiCIKL aims to better understand how it could support scientific questions that arise from across the biodiversity world in the future, while addressing specific scientific or technical biodiversity data challenges presented by the applicants.
The BiCIKL project – a Horizon 2020-funded project involving 14 European institutions, representing major global players in biodiversity research and natural history, and coordinated by Pensoft – establishes a European starting community of key research infrastructures, researchers, citizen scientists and other biodiversity and life sciences stakeholders based on open science practices through access to data, tools and services.
Within Biodiversity Community Integrated Knowledge Library (BiCIKL), 14 key research and natural history institutions commit to link infrastructures and technologies to provide flawless access to biodiversity data.
In a recently started Horizon 2020-funded project, 14 European institutions from 10 countries, representing both the continent’s and global key players in biodiversity research and natural history, deploy and improve their own and partnering infrastructures to bridge gaps between each other’s biodiversity data types and classes. By linking their technologies, they are set to provide flawless access to data across all stages of the research cycle.
Three years in, BiCIKL (abbreviation for Biodiversity Community Integrated Knowledge Library) will have created the first-of-its-kind Biodiversity Knowledge Hub, where a researcher will be able to retrieve a full set of linked and open biodiversity data, thereby accessing the complete story behind an organism of interest: its name, genetics, occurrences, natural history, as well as authors and publications mentioning any of those.
Ultimately, the project’s products will solidify Open Science and FAIR (Findable, Accessible, Interoperable and Reusable) data practices by empowering and streamlining biodiversity research.
Together, the project partners will redesign the way biodiversity data is found, linked, integrated and re-used across the research cycle. By the end of the project, BiCIKL will provide the community with a more transparent, trustworthy and efficient highly automated research ecosystem, allowing for scientists to access, explore and put into further use a wide range of data with only a few clicks.
Continuously fed with data sourced by the partnering institutions and their infrastructures, BiCIKL’s key final output, the Biodiversity Knowledge Hub, is set to persist long after the project has concluded. Moreover, by accelerating biodiversity research that builds on – rather than duplicates – existing knowledge, it will provide access to exponentially growing contextualised biodiversity data.
***
Learn more about BiCIKL on the project’s website at: bicikl-project.eu
From 1973 to 2020, Australian zoologist Dr Robert Mesibov kept careful records of the “where” and “when” of his plant and invertebrate collecting trips. Now, he has made those valuable biodiversity data freely and easily accessible via the Zenodo open-data repository, so that future researchers can rely on this “authority file” when using museum specimens collected from those events in their own studies. The new dataset is described in the open-access, peer-reviewed Biodiversity Data Journal.
While checking museum records, Dr Robert Mesibov found there were occasional errors in the dates and places for specimens he had collected many years before. He was not surprised.
One solution to this problem was what librarians and others have long called an “authority file”.
“It’s an authoritative reference, in this case with the correct details of where I collected and when”, he explained.
“I kept records of almost all my collecting trips from 1973 until I retired from field work in 2020. The earliest records were on paper, but I began storing the key details in digital form in the 1990s.”
The 48-year record has now been made publicly available via the Zenodo open-data repository after conversion to the Darwin Core data format, which is widely used for sharing biodiversity information. With this “authority file”, described in detail in the open-access, peer-reviewed Biodiversity Data Journal, future researchers will be able to rely on sound, interoperable and easy-to-access data when using those museum specimens in their own studies, instead of repeating and further spreading unintentional errors.
“There are 3829 collecting events in the authority file”, said Mesibov, “from six Australian states and territories. For each collecting event there are geospatial and date details, plus notes on the collection.”
Mesibov hopes the authority file will be used by museums to correct errors in their catalogues.
“It should also save museums a fair bit of work in future”, he explained. “No need to transcribe details on specimen labels into digital form in a database, because the details are already in digital form in the authority file.”
Mesibov points out that in the 19th and 20th centuries, lists of collecting events were often included in the reports of major scientific expeditions.
“Those lists were authority files, but in the pre-digital days it was probably just as easy to copy collection data from specimen labels.”
“Authority files for collecting events are the next logical step,” said Mesibov. “They can be used as lookup tables for all the important details of individual collections: where, when, by whom and how.”
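The lookup-table idea can be sketched in a few lines of Python. This is only an illustration, not a description of the published dataset: the rows below are made up, and it assumes that both the authority file and a museum's catalogue record share an event identifier (here the Darwin Core term eventID). The other column names are standard Darwin Core terms; the function names are hypothetical.

```python
import csv
import io

# Made-up example rows; the column names are standard Darwin Core terms.
AUTHORITY_CSV = """eventID,eventDate,decimalLatitude,decimalLongitude,recordedBy
EV0001,1985-03-12,-41.1000,146.1000,Example Collector
EV0002,1999-11-02,-42.9000,147.3000,Example Collector
"""


def load_authority(text):
    """Index the authority file by eventID so it can serve as a lookup table."""
    return {row["eventID"]: row for row in csv.DictReader(io.StringIO(text))}


def check_record(record, authority,
                 fields=("eventDate", "decimalLatitude", "decimalLongitude")):
    """Compare a catalogue record with the authority file.

    Returns {field: (catalogue value, authority value)} for every mismatch,
    or None if the event is not in the authority file.
    """
    event = authority.get(record.get("eventID"))
    if event is None:
        return None
    return {f: (record.get(f), event[f])
            for f in fields if record.get(f) != event[f]}


authority = load_authority(AUTHORITY_CSV)

# A hypothetical catalogue entry whose date was mistyped during transcription.
catalogue_record = {
    "eventID": "EV0001",
    "eventDate": "1985-03-21",
    "decimalLatitude": "-41.1000",
    "decimalLongitude": "146.1000",
}
mismatches = check_record(catalogue_record, authority)
```

Here `mismatches` holds only the eventDate field, paired with the catalogue value and the authority value, which is the kind of discrepancy a museum could then correct.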
###
Research paper:
Mesibov RE (2021) An Australian collector’s authority file, 1973–2020. Biodiversity Data Journal 9: e70463. https://doi.org/10.3897/BDJ.9.e70463
Between now and 15 September 2021, the article processing fee (normally €550) will be waived for the first 36 papers, provided that the publications are accepted and that the data paper describes a dataset meeting the following criteria: more than 5,000 records that are new to GBIF.org, high-quality data and metadata, and geographic coverage in Russia.
The manuscript must be prepared in English and submitted in accordance with BDJ’s instructions to authors by 15 September 2021. Late submissions will not be eligible for APC waivers.
Sponsorship is limited to the first 36 accepted submissions meeting these criteria on a first-come, first-served basis. The call for submissions can therefore close prior to the stated deadline of 15 September 2021. Authors may contribute to more than one manuscript, but artificial division of the logically uniform data and data stories, or “salami publishing”, is not allowed.
BDJ will publish a special issue including the selected papers by the end of 2021. The journal is indexed by Web of Science (Impact Factor 1.331), Scopus (CiteScore: 2.1) and listed in РИНЦ / eLibrary.ru.
For non-native speakers, please ensure that your English is checked either by native speakers or by professional English-language editors prior to submission. You may credit these individuals as a “Contributor” through the AWT interface. Contributors are not listed as co-authors but can help you improve your manuscripts.
In addition to the BDJ instructions to authors, it is required that datasets referenced from the data paper a) cite the dataset’s DOI, b) appear in the paper’s list of references, and c) have “Russia 2021” in Project Data: Title and “N-Eurasia-Russia2021” in Project Data: Identifier in the dataset’s metadata.
Questions may be directed either to Dmitry Schigel, GBIF scientific officer, or Yasen Mutafchiev, managing editor of Biodiversity Data Journal.
The 2021 extension of the collection of data papers will be edited by Vladimir Blagoderov, Pedro Cardoso, Ivan Chadin, Nina Filippova, Alexander Sennikov, Alexey Seregin, and Dmitry Schigel.
Datasets with more than 5,000 records that are new to GBIF.org
Datasets should contain at a minimum 5,000 records that are new to GBIF.org. While the focus is on additional records for the region, records already published in GBIF may meet the criteria of ‘new’ if they are substantially improved, particularly through the addition of georeferenced locations. Artificial reduction of records from otherwise uniform datasets to the necessary minimum (“salami publishing”) is discouraged and may result in rejection of the manuscript. New submissions describing updates of datasets already presented in earlier published data papers will not be sponsored.
Justification for publishing datasets with fewer records (e.g. sampling-event datasets, sequence-based data, checklists with endemics etc.) will be considered on a case-by-case basis.
Datasets with high-quality data and metadata
Authors should start by publishing a dataset comprised of data and metadata that meets GBIF’s stated data quality requirement. This effort will involve work on an installation of the GBIF Integrated Publishing Toolkit.
Only when the dataset is prepared should authors turn to working on the manuscript text. The extended metadata you enter in the IPT while describing your dataset can be converted into a manuscript with a single click of a button in the ARPHA Writing Tool (see also Creation and Publication of Data Papers from Ecological Metadata Language (EML) Metadata). Authors can then complete, edit and submit manuscripts to BDJ for review.
Datasets with geographic coverage in Russia
In correspondence with the funding priorities of this programme, at least 80% of the records in a dataset should have coordinates that fall within the priority area of Russia. However, authors of the paper may be affiliated with institutions anywhere in the world.
***
Check out the Biota of Russia dynamic data paper collection so far.
Follow Biodiversity Data Journal on Twitter and Facebook to keep yourself posted about the new research published.
by Mariya Dimitrova, Jorrit Poelen, Georgi Zhelezov, Teodor Georgiev, Lyubomir Penev
Tables published in scholarly literature are a rich source of primary biodiversity data. They are often used for communicating species occurrence data, morphological characteristics of specimens, links of species or specimens to particular genes, ecology data and biotic interactions between species, etc. Tables provide a structured format for sharing numerous facts about biodiversity in a concise and clear way.
Inspired by the potential use of semantically-enhanced tables for text and data mining, Pensoft and Global Biotic Interactions (GloBI) developed a workflow for extracting and indexing biotic interactions from tables published in scholarly literature. GloBI is an open infrastructure enabling the discovery and sharing of species interaction data. GloBI ingests and accumulates individual datasets containing biotic interactions and standardises them by mapping them to community-accepted ontologies, vocabularies and taxonomies. Data integrated by GloBI is accessible through an application programming interface (API) and as archives in different formats (e.g. n-quads). GloBI has indexed millions of species interactions from hundreds of existing datasets spanning over a hundred thousand taxa.
The workflow
First, all tables extracted from Pensoft publications and stored in the OpenBiodiv triple store were automatically retrieved (Step 1 in Fig. 1). There were 6993 tables from 21 different journals. To identify only the tables containing biotic interactions, we used an ontology annotator, currently being developed by Pensoft, with terms from the OBO Relation Ontology (RO). The Pensoft Annotator analyses free text and finds words and phrases matching ontology term labels.
We used the RO to create a custom ontology, or list of terms, describing different biotic interactions (e.g. ‘host of’, ‘parasite of’, ‘pollinates’) (Step 2 in Fig. 1). We used all subproperties of the RO term labeled ‘biotically interacts with’ and expanded the list of terms with additional word spellings and variations (e.g. ‘hostof’, ‘host’), which were added to the custom ontology as synonyms of already existing terms using the property oboInOwl:hasExactSynonym.
This custom ontology was used to perform annotation of all tables via the Pensoft Annotator (Step 3 in Fig. 1). Tables were split into rows and columns and accompanying table metadata (captions). Each of these elements was then processed through the Pensoft Annotator and if a match from the custom ontology was found, the resulting annotation was written to a MongoDB database, together with the article metadata. The original table in XML format, containing marked-up taxa, was also stored in the records.
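The annotation step can be illustrated with a minimal Python sketch. It is not the Pensoft Annotator itself: the term list is a hand-written stand-in for the custom ontology (only ‘hostof’ and ‘host’ are synonyms named above, the others are illustrative), plain regular-expression matching replaces the real annotator, the function names are hypothetical, and the results are returned as a list instead of being written to MongoDB.

```python
import re

# Stand-in for the custom ontology: preferred label -> exact synonyms.
# 'hostof' and 'host' come from the text; the other synonyms are illustrative.
INTERACTION_TERMS = {
    "host of": ["hostof", "host"],
    "parasite of": ["parasite"],
    "pollinates": ["pollinator"],
}


def build_patterns(terms):
    """Compile one case-insensitive, whole-word regex per preferred label."""
    patterns = {}
    for label, synonyms in terms.items():
        # Longest alternatives first, so 'host of' is preferred over 'host'.
        alternatives = sorted([label, *synonyms], key=len, reverse=True)
        joined = "|".join(re.escape(a) for a in alternatives)
        patterns[label] = re.compile(rf"\b(?:{joined})\b", re.IGNORECASE)
    return patterns


def annotate_table(caption, rows, patterns):
    """Return one annotation per term match in the caption or in any cell."""
    elements = [("caption", None, None, caption)]
    for r, row in enumerate(rows):
        for c, cell in enumerate(row):
            elements.append(("cell", r, c, cell))

    annotations = []
    for kind, r, c, text in elements:
        for label, pattern in patterns.items():
            for match in pattern.finditer(text):
                annotations.append({
                    "element": kind,
                    "row": r,
                    "column": c,
                    "term": label,
                    "matched_text": match.group(0),
                })
    return annotations


patterns = build_patterns(INTERACTION_TERMS)
annotations = annotate_table(
    "Host of each parasitoid species",
    [["Species", "Host"], ["Aphidius ervi", "Acyrthosiphon pisum"]],
    patterns,
)
```

In this example the caption and the ‘Host’ column header each produce one annotation for the term ‘host of’, so the table would be flagged as potentially containing biotic interactions.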
Thus, we detected 233 tables which contain biotic interactions, constituting about 3.3% of all examined tables. The scripts used for parsing the tables and annotating them, together with the custom ontology, are open source and available on GitHub. The database records were exported as JSON to a GitHub repository, from where they could be accessed by GloBI.
GloBI processed the tables further, involving the generation of a table citation from the article metadata and the extraction of interactions between species from the table rows (Step 4 in Fig. 1). Table citations were generated by querying the OpenBiodiv database with the DOI of the article containing each table to obtain the author list, article title, journal name and publication year. The extraction of table contents was not a straightforward process because tables do not follow a single schema and can contain both merged rows and columns (signified using the ‘rowspan’ and ‘colspan’ attributes in the XML). GloBI were able to index such tables by duplicating rows and columns where needed to be able to extract the biotic interactions within them. Taxonomic name markup allowed GloBI to identify the taxonomic names of species participating in the interactions. However, the underlying interaction could not be established for each table without introducing false positives due to the complicated table structures which do not specify the directionality of the interaction. Hence, for now, interactions are only of the type ‘biotically interacts with’ (Fig. 2) because it is a bi-directional one (e.g. ‘Species A interacts with Species B’ is equivalent to ‘Species B interacts with Species A’).
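The idea of duplicating merged cells can be sketched as follows. This is an illustration under stated assumptions, not GloBI's actual code: it assumes HTML-like table markup with ‘tr’, ‘td’ and ‘th’ elements, the function name and the example table are hypothetical, and the final pairing step only works for this simple two-column layout.

```python
import xml.etree.ElementTree as ET


def expand_table(table_xml):
    """Return a rectangular list of rows, copying each merged cell into
    every grid position covered by its rowspan and colspan."""
    root = ET.fromstring(table_xml.strip())
    grid = {}  # (row index, column index) -> cell text
    n_rows = n_cols = 0
    for r, tr in enumerate(root.iter("tr")):
        c = 0
        for cell in tr:
            if cell.tag not in ("td", "th"):
                continue
            # Skip positions already filled by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            # itertext() also collects text nested in markup such as taxon names.
            text = "".join(cell.itertext()).strip()
            rowspan = int(cell.get("rowspan", 1))
            colspan = int(cell.get("colspan", 1))
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            n_rows = max(n_rows, r + rowspan)
            n_cols = max(n_cols, c + colspan)
            c += colspan
    return [[grid.get((r, c), "") for c in range(n_cols)]
            for r in range(n_rows)]


EXAMPLE = """
<table>
  <tr><th>Parasitoid</th><th>Host</th></tr>
  <tr><td rowspan="2">Species A</td><td>Species B</td></tr>
  <tr><td>Species C</td></tr>
</table>
"""

rows = expand_table(EXAMPLE)
# Every data row now names both partners, so each row yields one generic claim.
pairs = [(a, "biotically interacts with", b) for a, b in rows[1:]]
```

After expansion, the third row repeats ‘Species A’ next to ‘Species C’, so both interactions can be read row by row without tracking which cells were merged.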
Examples of species interactions provided by OpenBiodiv and indexed by GloBI are available on GloBI’s website.
In the future, we plan to expand the capacity of the workflow to recognise interaction types in more detail. This could be implemented by applying part-of-speech tagging to establish the subject and object of an interaction.
In addition to being accessible via an API and as archives, biotic interactions indexed by GloBI are available as Linked Open Data and can be accessed via a SPARQL endpoint. Hence, we plan on creating a user-friendly service for federated querying of GloBI and OpenBiodiv biodiversity data.
This collaborative project is an example of the benefits of open and FAIR data, enabling the enhancement of biodiversity data through the integration between Pensoft and GloBI. Transformation of knowledge contained in existing scholarly works into giant, searchable knowledge graphs increases the visibility and attributed re-use of scientific publications.
References
Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005.
Additional Information
The work has been partially supported by the International Training Network (ITN) IGNITE funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 764840.