Proofreading the text of scientific papers isn’t hard, although it can be tedious. Are all the words spelled correctly? Is all the punctuation correct and in the right place? Is the writing clear and concise, with correct grammar? Are all the cited references listed in the References section, and vice-versa? Are the figure and table citations correct?
Proofreading of text is usually done first by the reviewers, and then finished by the editors and copy editors employed by scientific publishers. A similar kind of proofreading is also done with the small tables of data found in scientific papers, mainly by reviewers familiar with the management and analysis of the data concerned.
But what about proofreading the big volumes of data that are common in biodiversity informatics? Tables with tens or hundreds of thousands of rows and dozens of columns? Who does the proofreading?
Sadly, the answer is usually “No one”. Proofreading large amounts of data isn’t easy and requires special skills and digital tools. The people who compile biodiversity data often lack the skills, the software or the time to properly check what they’ve compiled.
The result is that a great deal of the data made available through biodiversity projects like GBIF is — to be charitable — “messy”. Biodiversity data often needs a lot of patient cleaning by end-users before it’s ready for analysis. To assist end-users, GBIF and other aggregators attach “flags” to each record in the database where an automated check has found a problem. These checks find the most obvious problems amongst the many possible data compilation errors. End-users often have much more work to do after the flags have been dealt with.
In 2017, Pensoft employed a data specialist to proofread the online datasets that are referenced in manuscripts submitted to Pensoft’s journals as data papers. The results of the data-checking are sent to the data paper’s authors, who then edit the datasets. This process has substantially improved many datasets (including those already made available through GBIF) and made them more suitable for digital re-use. At blog publication time, more than 200 datasets have been checked in this way.
Note that a Pensoft data audit does not check the accuracy of the data, for example, whether the authority for a species name is correct, or whether the latitude/longitude for a collecting locality agrees with the verbal description of that locality. For a more or less complete list of what does get checked, see the Data checklist at the bottom of this blog post. These checks are aimed at ensuring that datasets are correctly organised, consistently formatted and easy to move from one digital application to another. The next reader of a digital dataset is likely to be a computer program, not a human. It is essential that the data are structured and formatted, so that they are easily processed by that program and by other programs in the pipeline between the data compiler and the next human user of the data.
Pensoft’s data-checking workflow was previously offered only to authors of data paper manuscripts. It is now available to data compilers generally, with three levels of service:
Basic: the compiler gets a detailed report on what needs fixing
Standard: minor problems are fixed in the dataset and reported
Premium: all detected problems are fixed in collaboration with the data compiler and a report is provided
Because datasets vary so much in size and content, it is not possible to set a price in advance for basic, standard and premium data-checking. To get a quote for a dataset, send an email with a small sample of the data to publishing@pensoft.net.
—
Data checklist
Minor problems:
dataset not UTF-8 encoded
blank or broken records
characters other than letters, numbers, punctuation and plain whitespace
more than one version of the same character (the simplest or most correct one is retained)
unnecessary whitespace
Windows carriage returns (retained if required)
encoding errors (e.g. “Dum?ril” instead of “Duméril”)
missing data with a variety of representations (blank, “-”, “NA”, “?” etc.)
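As a rough illustration only, a few of the minor problems listed above could be tallied along these lines. This is a minimal sketch and not the method used in Pensoft's audits: the function name `minor_problems`, the tab separator, the assumption that the first line is a header, and the `MISSING_FORMS` list are all hypothetical choices made for the example.

```python
# Hypothetical list of "missing value" spellings; a real audit would tune it per dataset.
MISSING_FORMS = {"", "-", "na", "n/a", "?", "null", "none"}


def minor_problems(raw: bytes, sep: str = "\t") -> dict:
    """Tally a few minor problems in the raw bytes of a delimited text file."""
    report = {
        "not_utf8": False,
        "carriage_returns": 0,
        "blank_records": 0,
        "untidy_whitespace": 0,
        "missing_forms": set(),
    }
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        report["not_utf8"] = True
        text = raw.decode("latin-1")  # never fails, so the other checks still run

    report["carriage_returns"] = text.count("\r")
    lines = text.replace("\r\n", "\n").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # ignore the newline that terminates the file

    for line in lines[1:]:  # lines[0] is assumed to be the header
        if not line.strip():
            report["blank_records"] += 1
            continue
        for item in line.split(sep):
            # Leading, trailing or doubled spaces count as unnecessary whitespace.
            if item != item.strip() or "  " in item:
                report["untidy_whitespace"] += 1
            if item.strip().casefold() in MISSING_FORMS:
                report["missing_forms"].add(item.strip())
    return report


sample = "id\tname\tdate\r\n1\t Duméril \t-\r\n\r\n2\tSmith\tNA\r\n".encode("utf-8")
print(minor_problems(sample))
```

The sketch only reports problems. Whether to fix them (for example, whether to keep Windows carriage returns) is left to the person doing the audit.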
Major problems:
unintended shifts of data items between fields
incorrect or inconsistent formatting of data items (e.g. dates)
different representations of the same data item (pseudo-duplication)
for Darwin Core datasets, incorrect use of Darwin Core fields
data items that are invalid or inappropriate for a field
data items that should be split between fields
data items referring to unexplained entities (e.g. “habitat is type A”)
truncated data items
disagreements between fields within a record
missing, but expected, data items
incorrectly associated data items (e.g. two country codes for the same country)
duplicate records, or partial duplicate records where not needed
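Some of the major problems above can also be screened for with very little code. The following is a minimal sketch under stated assumptions, not the author's actual workflow: the three helper names are hypothetical, records are assumed to be already split into lists of fields, and the field-count test only catches shifts that change the number of fields in a record.

```python
import re
from collections import Counter, defaultdict


def field_count_mismatches(rows):
    """Return record numbers (1 = first record after the header) whose
    number of fields differs from the header's, a common sign of shifted data."""
    width = len(rows[0])
    return [i for i, row in enumerate(rows[1:], start=1) if len(row) != width]


def pseudo_duplicates(values):
    """Group distinct spellings that become identical once case,
    punctuation and spacing are ignored."""
    groups = defaultdict(set)
    for value in values:
        key = re.sub(r"[\W_]+", "", value.casefold())
        if key:  # skip empty items
            groups[key].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def date_shapes(values):
    """Tally the layouts used in a date column: digits become 9, letters become a.
    More than one shape suggests inconsistent formatting."""
    shapes = Counter()
    for value in values:
        shape = re.sub(r"\d", "9", value.strip())
        shape = re.sub(r"[A-Za-z]", "a", shape)
        shapes[shape] += 1
    return shapes


rows = [["id", "collector", "date"],
        ["1", "Smith, J.", "2020-09-14"],
        ["2", "smith j", "14/09/2020", "stray item"]]
print(field_count_mismatches(rows))
print(pseudo_duplicates(row[1] for row in rows[1:]))
print(date_shapes(row[2] for row in rows[1:]))
```

Each helper only points at candidate problems. Deciding which spelling or date format is the correct one still needs a human who knows the data.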
For details of the methods used, see the author’s online resources:
Scientists at the Autonomous University of Tamaulipas (UAT) in Mexico recently discovered five new species of parasitoid wasps in Mexico, but the name of one of them sounds a bit weird: covida. Why this name?
In fact, the reason is quite simple. The team of Andrey Khalaim (also a researcher at the Zoological Institute of the Russian Academy of Sciences in Saint Petersburg, Russia) and Enrique Ruíz Cancino discovered the new-to-science species during the 2020 global quarantine period imposed due to the COVID-19 pandemic. Their findings are described in a newly published research article in the peer-reviewed, open-access scientific journal ZooKeys.
“We thought that it was a good idea to remember this extraordinary year through the name of one remarkable species of Darwin wasp found in seven Mexican States (including Tamaulipas, where the UAT campus is located) and also Guatemala,”
explain the scientists.
The new species, which goes by the official scientific name Stethantyx covida, belongs to the Darwin wasp family Ichneumonidae, one of the most species-rich insect families, which comprises more than 25,000 species worldwide.
“Darwin wasps are abundant and well-known almost everywhere in the world because of their beauty, gracility, and because they are used in biological control of insect pests in orchards and forests. Many Darwin wasp species attack the larvae or pupae of butterflies and moths. Yet, some species are particularly interesting, as their larvae feed on spider eggs and others, even more bizarre, develop on living spiders!”
further explain the authors of the new study.
Stethantyx covida is a small wasp that measures merely 3.5 mm in length. It is predominantly dark in colour, whereas parts of its body and legs are yellow or brown. It is highly polished and shining, and the ovipositor of the female is very long and slender.
Along with Stethantyx covida, the authors also described four other Mexican species of Darwin wasps from three different genera (Stethantyx, Meggoleus, Phradis), all belonging to the subfamily Tersilochinae. Some tersilochines are common on flowers in springtime. While the majority of them are parasitoids of larvae of various beetles, some Mexican species attack sawflies, inhabiting the forests.
***
Original source:
Khalaim AI, Ruíz-Cancino E (2020) Contribution to the taxonomy of Mexican Tersilochinae (Hymenoptera, Ichneumonidae), with descriptions of five new species. ZooKeys 974: 1-21. https://doi.org/10.3897/zookeys.974.54536
Our observations on the quite small-bodied Asian kukri snakes in Thailand have documented a feeding behaviour which differs from anything ever described in snakes.
Normally, snakes swallow their prey whole. However, this particular species, the Small-banded Kukri Snake (Oligodon fasciolatus), instead uses its enlarged posterior maxillary teeth to cut open the abdomen of large poisonous toads, then inserts its entire head into the cavity to pull out and eat the organs one by one, while the prey is still alive!
During those macabre attacks, which we managed to capture on camera three times, the toads struggled vigorously to escape and avoid being eviscerated alive but, on all occasions, this was in vain. The fights we saw lasted for up to a few hours, depending on the organs the snake pulled out first.
The toads observed belong to a quite common species, the Asian Black-spotted Toad (Duttaphrynus melanostictus), which is known to secrete a potent toxin from its prominent parotoid glands, located on the neck and all over the back. Could it be that the snakes have adopted this sophisticated and gory approach to avoid being poisoned?
In a fourth, and equally important, case, an adult kukri snake attacked a somewhat smaller individual of the same toad species. However, this time, the snake swallowed the entire toad. Why the snake swallowed the juvenile toad whole, we still don’t know. Perhaps smaller toads are less toxic than adults? Or could it be that kukri snakes are indeed resistant to the Asian Black-spotted Toad’s poison, yet the large size of the adult toads prevented the snakes from swallowing them in the three aforementioned cases?
At present, we cannot answer any of these questions, but we will continue to observe and report on these fascinating snakes in the hope that we will uncover further interesting aspects of their biology.
Perhaps you’d be pleased to know that kukri snakes are, thankfully, harmless to humans. However, I wouldn’t recommend being bitten by one of those. They can inflict large wounds that bleed for hours, because of the anticoagulant agent these snakes inject into the victim’s bloodstream. Their teeth are designed to inflict lacerations rather than punctures, so your finger would feel as if cut apart! This secretion, produced by two glands, called Duvernoy’s glands and located behind the eyes of the snakes, is likely beneficial while the snakes spend hours extracting toad organs.
***
Publication:
Bringsøe H, Suthanthangjai M, Suthanthangjai W, Nimnuam K (2020) Eviscerated alive: Novel and macabre feeding strategy in Oligodon fasciolatus (Günther, 1864) eating organs of Duttaphrynus melanostictus (Schneider, 1799) in Thailand. Herpetozoa 33: 157-163. https://doi.org/10.3897/herpetozoa.33.e57096
A mystery more than a century old has surrounded the Taiwanese butterfly fauna ever since the “father of zoogeography” Alfred Russel Wallace described a new species of butterfly, Lycaena nisa, whose identity was only re-examined in a recent project looking into the butterflies of Taiwan. Based on the original specimens, in addition to newly collected ones, Dr Yu-Feng Hsu of the National Taiwan Normal University resurrected the species name and added two new synonyms to it.
Described by the “father of zoogeography” and co-author of the theories of evolution and natural selection, the species hasn’t been reexamined since 1866
A mystery more than a century old has surrounded the Taiwanese butterfly fauna ever since the “father of zoogeography” Alfred Russel Wallace, in collaboration with Frederic Moore, authored a landmark paper in 1866: the first to study the lepidopterans of the island.
Back then, Moore dealt with the moths and Wallace investigated the butterflies. Together, they reported 139 species, comprising 93 nocturnal and 46 diurnal species. Of the latter, five species were described as new to science. Even though the correct placements of four of those five butterflies have been verified a number of times since 1866, one of them, Lycaena nisa, was not re-examined until very recently.
In a modern-day research project on Taiwanese butterflies, scientists retrieved the original type specimen from the Wallace collection at the Natural History Museum in London, UK. Having also examined historical specimens housed at the Taiwan Agricultural Research Institute, in addition to newly collected butterflies from Australia and Hong Kong, Dr Yu-Feng Hsu of the National Taiwan Normal University finally resolved the identity of Alfred Wallace’s mysterious butterfly: it now goes by the name Famegana nisa (comb. nov.), while two other species names (Lycaena alsulus and Zizeeria alsulus eggletoni) were proven to have been coined for the same butterfly after the original description by Wallace. Thereby, the latter two are both synonymised with Famegana nisa.
Despite having made entomologists scratch their heads for over a century, Wallace’s butterfly is good at standing out in the wild, as long as one knows what else lives in the surrounding open grassy habitats, of course. Commonly known as ‘Grass Blue’, ‘Small Grass Blue’ or ‘Black-spotted Grass Blue’, the butterfly can be easily distinguished amongst the other local species by the uniformly greyish white undersides of its wings, combined with obscure submarginal bands and a single prominent black spot on the hindwing.
However, the species demonstrates high seasonal variability, meaning that individuals reared in the dry season have a reduced black spot, darker ground colour on wing undersides and more distinct submarginal bands in comparison to specimens from the wet season. This is why Dr Yu-Feng Hsu notes that it’s perhaps unnecessary to split the species into subspecies even though there have been up to four already recognised.
***
Alfred Russel Wallace, a British naturalist, explorer, geographer, anthropologist, biologist and illustrator, was a contemporary of Charles Darwin, and also worked on the debates within evolutionary theory, including natural selection. He also authored the famed book Darwinism in 1889, which explained and defended natural selection.
While Darwin and Wallace did exchange ideas, often challenging each other’s conclusions, they worked out the idea of natural selection each on their own. For his part, Wallace insisted that there was indeed a strong reason why a certain species would evolve. Unlike Darwin, Wallace argued that rather than a random natural process, evolution was occurring to maintain a species’ fitness to the specificity of its environment. Wallace was also one of the first prominent scientists to voice concerns about the environmental impact of human activity.
***
Original source:
Hsu Y-F (2020) The identity of Alfred Wallace’s mysterious butterfly taxon Lycaena nisa solved: Famegana nisa comb. nov., a senior synonym of F. alsulus (Lepidoptera, Lycaenidae, Polyommatinae). ZooKeys 966: 153-162. https://doi.org/10.3897/zookeys.966.51921
Contact:
Dr Yu-Feng Hsu, National Taiwan Normal University Email: t43018@ntnu.edu.tw
An isolated population of the rarest Palaearctic butterfly species, the Arctic Apollo (Parnassius arcticus), turned out to be a new-to-science subspecies with distinct looks as well as DNA. Named Parnassius arcticus arbugaevi, the butterfly is described in a recent paper published in the peer-reviewed, open-access scientific journal Acta Biologica Sibirica.
“Thanks to the field studies of our colleague and friend Yuri Bakhaev, we obtained unique butterfly specimens from the Momsky Range in North-Eastern Yakutia. This mountain range, which is about 500 km long, has until now been a real ‘blank spot’ in terms of biodiversity research,”
“With the kind permission of Mikhail Ivanov, Director of the Momsky National Park, entomological collections were carried out in various parts of the park. Hard-to-reach areas were visited with the help of inspector Innokenty Fedorov,”
he adds.
Then, amongst the specimens, the scientists spotted butterflies that they at first thought to be the rarest species in the entire Palaearctic: the Arctic Apollo, a species endemic to North-Eastern Yakutia, Russia, which had only been known from the Suntar-Khayata and Verkhoyansk mountains.
Later, however, the team noticed that the curious specimens were larger on average, had more elongated wings compared to the Arctic Apollo, and were also missing the distinct dark spot on the wings. At that moment, they thought they were rather looking at a species currently unknown to science, and belonging to the Parnassius tenedius species group.
Eventually, following in-depth morphological and molecular genetic analyses, the scientists concluded that the population from the Momsky Range was in fact a new subspecies of the Arctic Apollo that can be distinguished by a number of external and DNA differences. They named the new subspecies Parnassius arcticus arbugaevi after German Arbugaev, Director of the ecological-ethnographic complex Chochur Muran, who provided comprehensive assistance to one of the co-authors of the study, Yu.I. Bakhaev, in his research in Yakutia.
The new subspecies inhabits dry scree slopes with poor vegetation at an elevation of 1,400 m. So far, it is only known from the type locality, Momsky Range, North-Eastern Yakutia, where butterflies can be seen from early June to July. The wingspan in males ranges between 39 and 45 mm.
“Thus, we obtained significant new data on the distribution and taxonomy of one of the rarest butterflies in the North Palaearctic,”
say the researchers in conclusion.
Original source:
Yakovlev RV, Shapoval NA, Bakhaev YI, Kuftina GN, Khramov BA (2020) A new subspecies of Parnassius arcticus (Eisner, 1968) (Lepidoptera, Papilionidae) from the Momsky Range (Yakutia, Russia). Acta Biologica Sibirica 6: 93-105. https://doi.org/10.3897/abs.6.e55925
Even though recognised in the Mediterranean Sea, the Macropodia czernjawskii spider crab was ignored by scientists (even by its namesake Vladimir Czernyavsky) in the regional faunal accounts of the Black Sea for more than a century. At the same time, although other species of the genus have been listed as Black Sea fauna, those listings are mostly wrong and occurred either due to historical circumstances or misidentifications. Now, scientists re-describe what is most likely the only species of the genus occurring in the Black Sea in the open-access journal Zoosystematics and Evolution.
The spider crab genus Macropodia was discovered in 1814 and currently includes 18 species, mostly occurring in the Atlantic and the Mediterranean. The marine fauna of the Black Sea is predominantly of Mediterranean origin and Macropodia czernjawskii was first discovered in the Black Sea in 1880, but afterwards, its presence there was largely ignored by scientists.
After the revision of available type specimens from all available collections in the Russian museums and the Senckenberg Museum in Frankfurt-on-Main, as well as newly collected material from the Black Sea and the North-East Atlantic, a research team led by Dr Vassily Spiridonov from the Shirshov Institute of Oceanology of the Russian Academy of Sciences re-described Macropodia czernjawskii, provided new data on its records and updated its ecological characteristics.
“The analysis of the molecular genetic barcode (COI) of the available material of Macropodia species indicated that M. czernjawskii is a very distinct species while M. parva should be synonymised with M. rostrata, and M. longipes is a synonym of M. tenuirostris”,
states Dr Spiridonov sharing the details of the genus analysis.
All Macropodia species have epibiosis and M. czernjawskii is no exception: almost all crabs examined in the 2008-2018 collections had significant epibiosis. It normally consists of algae and cyanobacteria. Notably, the non-indigenous red alga Bonnemaisonia hamifera, officially reported from the Caucasian coast of the Black Sea in 2015, was found in the epibiosis of M. czernjawskii four years earlier.
“It improves our understanding of its invasion history. Museum and monitoring collections of species with abundant epibiosis (in particular inachid crabs) can be used as an additional tool to record and monitor introduction and establishments of sessile non-indigenous species,”
suggests Dr Spiridonov.
***
Original source:
Spiridonov VA, Simakova UV, Anosov SE, Zalota AK, Timofeev VA (2020) Review of Macropodia in the Black Sea supported by molecular barcoding data; with the redescription of the type material, observations on ecology and epibiosis of Macropodia czernjawskii (Brandt, 1880) and notes on other Atlanto-Mediterranean species of Macropodia Leach, 1814 (Crustacea, Decapoda, Inachidae). Zoosystematics and Evolution 96(2): 609-635. https://doi.org/10.3897/zse.96.48342
Unsustainable trade of species is a major pathway for the introduction of invasive alien species at distant localities and at higher frequencies. It is also a major driver of over-exploitation of wild native populations. In a new study, published in the peer-reviewed open-access scholarly journal NeoBiota, scientists estimated the desire of Australians to own non-native and/or illegal alien pets and the major trends in this practice. In addition, the team suggests ways to improve biosecurity awareness in the country.
Over the last two decades, Australia has been experiencing an increased amount of non-native incursions from species prominent in the international pet trade, such as rose-ringed parakeets, corn snakes and red-eared sliders. On many occasions, these animals are smuggled into the country only to escape or be released in the wild.
In general, the Australian regulations on international pet trade are highly stringent, in order to minimise biosecurity and conservation risks. Some highly desirable species represent an ongoing conservation threat and biosecurity risk via the pet-release invasion pathway. However, the lack of consistent surveillance of alien pets held, legally or otherwise, in Australia remains the main challenge. Some species that may not be imported are nevertheless legal to trade domestically. Pet keepers have the capacity to legally or illegally acquire desired pets if they are not accessible through importation, and the number of such traders is unquantified.
Since keeping most of the alien pets in Australia is either illegal or not properly regulated, it is really difficult to quantify and assess the public demand for alien wildlife.
“We obtained records of anonymous public enquiries to the Australian Commonwealth Department of Agriculture, Water and the Environment relating to the legality of importation of various alien taxa. We aimed to investigate whether species desired in Australia were biased towards being threatened by extinction, as indicated by broader research on pet demand or towards being invasive species elsewhere, which would indicate trade-related biosecurity risks”,
According to the research team’s analysis, pets desired by Australians are significantly biased towards threatened species, invasive species and species prominent in the U.S. pet trade.
“This novel finding is of great concern for biosecurity agencies because it suggests that a filtering process is occurring where illegally smuggled animals may already be “pre-selected” to have the characteristics that are correlated with invasive species,”
warns Mr. Adam Toomes.
However, the bias towards species already traded within the U.S. suggests that there is potential to use this as a means of predicting future Australian desire, as well as the acquisition of pets driven by desire. Future research from the Invasion Science & Wildlife Ecology Group at The University of Adelaide will investigate whether Australian seizures of illegal pets can be predicted using U.S. trade data.
###
Original source:
Toomes A, Stringham OC, Mitchell L, Ross JV, Cassey P (2020) Australia’s wish list of exotic pets: biosecurity and conservation implications of desired alien and illegal pet species. NeoBiota 60: 43-59. https://doi.org/10.3897/neobiota.60.51431
Extensive surveys on wildlife markets and households in the Khammouane Province of Laos showed overlaps between the most traded species at wildlife markets and those of highest conservation importance.
It’s not a surprise to anyone that numerous vertebrate species are being sold at different wildlife markets, but at the moment there is still no comprehensive understanding of how extensively people are involved in this trade in Laos (Lao PDR), nor of what the impact on local wildlife populations really is.
The majority of Laotians live in rural areas and their income largely depends on wildlife. Since wildlife products are used as one of the major food sources, numerous species of terrestrial vertebrates are currently being offered at local markets.
Across the tropical regions, mammals and birds have been vanishing, with recent models estimating up to 83% decline by 2050. Furthermore, wild-caught reptiles have been reported from Southeast Asian wildlife markets for over 20 years, with Laos occupying the position of a very popular source.
Due to the large number of native endemic species, Lao PDR should assume the responsibility to introduce conservation measures to keep control over the predicted population declines. At the moment, the regulations on wildlife use and trade in Laos are mostly based on the Lao Wildlife and Aquatic Law, which, however, largely disregards international statuses of the species and other biological factors.
Stricter and reinforced legislation is needed in the fields related to wildlife trade and consumption, since such practices not only cause biodiversity loss, but are also thought to pose a great threat of wildlife-associated zoonotic parasites and pathogens emerging in humans. As an immediate example, the outbreak of the Coronavirus (COVID-19) is primarily considered to be a consequence of human consumption of wild animals.
An international group of students and scientists, led by Professor Dr. Thomas Ziegler at the University of Cologne and the Cologne Zoo (Germany), has conducted a number of extensive surveys on wildlife markets (66 observational surveys at 15 trade hubs) and households (63 households at 14 sites) in the Khammouane Province of Laos. The key question of the survey was: “Which species are traded and to what extent?” The results of the study are published in the open-access journal Nature Conservation.
The surveys showed overlaps between the most traded species at wildlife markets and those of highest conservation importance.
As for the households, approximately 90% of the surveyed respondents confirmed the use of wildlife. For the majority of the population, wildlife harvesting was found to be important for their livelihood and trapping activities were mostly aimed at self-consumption / subsistence. This could be explained by the prices of domesticated meat, which can be three times as high as those of wildlife products.
The demand for the species on the wildlife market remained the same over time, according to the opinions of 84.1% of respondents, while the availability of wild meats was reported to have decreased, due to increasing price.
“We recommend local authorities to assess the markets within the province capital Thakhek in particular, as they showed the highest quantity of wild meats. The markets at Namdik and Ban Kok turned out to be very active trade hubs for wildlife as well, regardless of the vertebrate group. The loss of certain species may cause a cascade of unforeseeable effects in the ecosystems. Therefore, the biodiversity of tropical Southeast Asian countries like Lao PDR must be protected,”
shares Dr. Thomas Ziegler.
To help the local population to avoid the crisis related to the change of activity and growing unemployment, scientists propose to introduce new activities in the region.
“Eco-tourism presents a great opportunity to combine conservation efforts and an alternative source of income. Former hunters with excellent knowledge of the forest and wildlife habitats can serve as professional wildlife tour guides or their involvement in the Village Forest Protection Group could help to protect natural resources in Laos”,
suggests Dr. Thomas Ziegler.
###
Original source:
Kasper K, Schweikhard J, Lehmann M, Ebert CL, Erbe P, Wayakone S, Nguyen TQ, Le MD, Ziegler T (2020) The extent of the illegal trade with terrestrial vertebrates in markets and households in Khammouane Province, Lao PDR. Nature Conservation 41: 25-45. https://doi.org/10.3897/natureconservation.41.51888
Metabarcoding allows scientists to extract DNA from the environment, in order to rapidly detect species inhabiting a particular habitat. While the method is a great tool that facilitates conservation activities, few studies have looked into its applicability in monitoring species’ populations and their genetic diversity, which could actually be critical to assess negative trends early on. The potential of the method is confirmed in a new study, published in the peer-reviewed scholarly journal Metabarcoding & Metagenomics.
In a new study, German scientists confirm that responses below species level can be inferred with DNA metabarcoding
Metabarcoding allows scientists to extract DNA from the environment (known as environmental DNA or eDNA), for example from river water, or from individuals in bulk samples. The latter approach was taken in the study by Vera Zizka, Dr Martina Weiss and Prof Florian Leese of the University of Duisburg-Essen (Essen, Germany), conducted within the German Barcode of Life project (GBOL II). Either way, scientists are able to detect which species inhabit a particular habitat.
However, while the method has already been known to be of great use in getting an approximate picture of local fauna, hence facilitating conservation prioritisation, few studies have looked into its applicability to infer responses below species level. That is, how the populations of a particular species fare in the environment of interest, also referred to as intraspecific diversity. Meanwhile, the latter could actually be a lot more efficient in ecosystem monitoring and, consequently, biodiversity loss mitigation.
The potential of the method is confirmed in a new study, published in the peer-reviewed scholarly journal Metabarcoding & Metagenomics. In the study, the researchers surveyed the populations of macroinvertebrate species (macrozoobenthos) in three German rivers: Emscher, Ennepe and Sieg, each of which is subject to a different level of ecological disturbance. They were looking specifically at species reported at all of the survey sites by studying the number of different haplotypes (a set of DNA variations usually inherited together from the maternal parent) in each sample. The researchers point out that macrozoobenthos play a key role in freshwater ecosystem functionality and include a wide range of taxonomic groups with often narrow and specific demands with respect to habitat conditions.
“As the most basal level of biodiversity, genetic diversity within species is typically the first to decrease, and the last to regenerate, after stressor’s impact. It consequently provides a proxy for environmental impacts on communities long before, or even if never visible on species diversity level,”
explain the scientists.
Emscher is an urban stream in the Ruhr Metropolitan Area that has been used as an open sewage channel for the past hundred years, and is considered to be a very disturbed environment. Ennepe – regarded as moderately stressed – runs through both rural and urban sites, including ones with sewage treatment plant inflow. Meanwhile, Sieg is considered as a stable, near-natural river system with a good ecological and chemical status.
Contrary to their original assumption that Sieg would support the greatest diversity within populations of species sensitive to organic pollution, such as mayflies, stoneflies and caddisflies, the scientists found no significant difference between Sieg and the moderately stressed river Ennepe. This was also true for overall biodiversity. On the other hand, the team discovered higher intraspecific diversity in the heavily disturbed Emscher for species resilient to ecological disturbance, such as small worms and specialised crustaceans. The latter phenomenon may be explained by low competition pressure for these species, their ability to use organic compounds as resources and, consequently, increased population growth.
“[T]his pioneer study shows that the extraction of intraspecific genetic variation, so-called ‘haplotypes’ from DNA metabarcoding datasets is a promising source of information to assess intraspecific diversity changes in response to environmental impacts for a whole metacommunity simultaneously,”
conclude the scientists.
However, the researchers also note limitations of their study, including the exclusion of specialist species that only occurred at single sites. They urge future studies to also carefully control for the number of individual specimens per species in order to quantify genetic diversity change specifically.
###
Original source:
Zizka VMA, Weiss M, Leese F (2020) Can metabarcoding resolve intraspecific genetic diversity changes to environmental stressors? A test case using river macrozoobenthos. Metabarcoding and Metagenomics 4: e51925. https://doi.org/10.3897/mbmg.4.51925
by Mariya Dimitrova, Jorrit Poelen, Georgi Zhelezov, Teodor Georgiev, Lyubomir Penev
Tables published in scholarly literature are a rich source of primary biodiversity data. They are often used for communicating, among other things, species occurrence data, morphological characteristics of specimens, links of species or specimens to particular genes, ecology data and biotic interactions between species. Tables provide a structured format for sharing numerous facts about biodiversity in a concise and clear way.
Inspired by the potential use of semantically-enhanced tables for text and data mining, Pensoft and Global Biotic Interactions (GloBI) developed a workflow for extracting and indexing biotic interactions from tables published in scholarly literature. GloBI is an open infrastructure enabling the discovery and sharing of species interaction data. GloBI ingests and accumulates individual datasets containing biotic interactions and standardises them by mapping them to community-accepted ontologies, vocabularies and taxonomies. Data integrated by GloBI is accessible through an application programming interface (API) and as archives in different formats (e.g. n-quads). GloBI has indexed millions of species interactions from hundreds of existing datasets spanning over a hundred thousand taxa.
The workflow
First, all tables extracted from Pensoft publications and stored in the OpenBiodiv triple store were automatically retrieved (Step 1 in Fig. 1). There were 6,993 tables from 21 different journals. To identify only the tables containing biotic interactions, we used an ontology annotator, currently being developed by Pensoft, together with terms from the OBO Relation Ontology (RO). The Pensoft Annotator analyses free text and finds words and phrases matching ontology term labels.
We used the RO to create a custom ontology, or list of terms, describing different biotic interactions (e.g. ‘host of’, ‘parasite of’, ‘pollinates’) (Step 2 in Fig. 1). We used all subproperties of the RO term labelled ‘biotically interacts with’ and expanded the list of terms with additional word spellings and variations (e.g. ‘hostof’, ‘host’), which were added to the custom ontology as synonyms of already existing terms using the property oboInOwl:hasExactSynonym.
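The idea of matching free text against term labels and their synonyms can be illustrated with a minimal sketch. It is not the Pensoft Annotator: the `TERMS` dictionary, the function names and all synonyms other than ‘hostof’ and ‘host’ are assumptions made for the example, and simple whole-word matching stands in for whatever matching the real annotator performs.

```python
import re

# Hypothetical miniature term list: canonical label -> synonyms.
# Only 'hostof' and 'host' come from the text; the other synonyms are illustrative.
TERMS = {
    "host of": ["hostof", "host"],
    "parasite of": ["parasiteof", "parasite"],
    "pollinates": ["pollinate"],
}

def build_lookup(terms):
    """Map every label and synonym (lower-cased) to its canonical label."""
    lookup = {}
    for label, synonyms in terms.items():
        for form in [label, *synonyms]:
            lookup[form.lower()] = label
    return lookup

def annotate(text, lookup):
    """Return the sorted canonical labels whose label or synonym occurs in text as whole words."""
    found = set()
    for form, label in lookup.items():
        pattern = r"\b" + re.escape(form) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(label)
    return sorted(found)

LOOKUP = build_lookup(TERMS)
print(annotate("Host plant records for the studied aphids", LOOKUP))
```

Mapping synonyms back to a canonical label means that a caption mentioning only ‘host’ is still reported under the term ‘host of’, while word boundaries prevent a match inside an unrelated word such as ‘ghost’.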
This custom ontology was used to perform annotation of all tables via the Pensoft Annotator (Step 3 in Fig. 1). Tables were split into rows and columns and accompanying table metadata (captions). Each of these elements was then processed through the Pensoft Annotator and if a match from the custom ontology was found, the resulting annotation was written to a MongoDB database, together with the article metadata. The original table in XML format, containing marked-up taxa, was also stored in the records.
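Splitting a table into the elements that get annotated might look roughly like the sketch below. The XML sample is a simplified, made-up stand-in for the publisher's markup, and `split_table` is a hypothetical helper, not part of the published scripts.

```python
import xml.etree.ElementTree as ET

# Simplified, invented table markup; real article XML is richer than this.
SAMPLE = """
<table-wrap>
  <caption><p>Host plants of the studied aphids</p></caption>
  <table>
    <tr><th>Aphid</th><th>Host</th></tr>
    <tr><td>Aphis fabae</td><td>Vicia faba</td></tr>
    <tr><td>Myzus persicae</td><td>Prunus persica</td></tr>
  </table>
</table-wrap>
"""

def cell_text(element):
    """Collect all text inside an element (including nested markup) with whitespace normalised."""
    return " ".join("".join(element.itertext()).split())

def split_table(xml_text):
    """Split a table into its caption, its rows and its columns."""
    root = ET.fromstring(xml_text.strip())
    caption_el = root.find("caption")
    caption = cell_text(caption_el) if caption_el is not None else ""
    rows = [
        [cell_text(cell) for cell in tr if cell.tag in ("th", "td")]
        for tr in root.iter("tr")
    ]
    width = max((len(row) for row in rows), default=0)
    columns = [[row[i] for row in rows if i < len(row)] for i in range(width)]
    return {"caption": caption, "rows": rows, "columns": columns}

parts = split_table(SAMPLE)
print(parts["caption"])
print(parts["columns"][1])
```

Each caption, row and column can then be passed to an annotator as a separate piece of text, so that a match can be traced back to the part of the table it came from.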
In this way, we detected 233 tables which contain biotic interactions, constituting about 3.3% of all examined tables. The scripts used for parsing the tables and annotating them, together with the custom ontology, are open source and available on GitHub. The database records were exported as JSON to a GitHub repository, from where they could be accessed by GloBI.
GloBI then processed the tables further, generating a table citation from the article metadata and extracting interactions between species from the table rows (Step 4 in Fig. 1). Table citations were generated by querying the OpenBiodiv database with the DOI of the article containing each table to obtain the author list, article title, journal name and publication year. Extracting the table contents was not straightforward, because tables do not follow a single schema and can contain both merged rows and merged columns (signified by the ‘rowspan’ and ‘colspan’ attributes in the XML). GloBI were able to index such tables by duplicating rows and columns where needed, so that the biotic interactions within them could be extracted. Taxonomic name markup allowed GloBI to identify the taxonomic names of species participating in the interactions. However, the specific type of interaction could not be established for each table without introducing false positives, because the complicated table structures do not specify the directionality of the interaction. Hence, for now, interactions are only of the type ‘biotically interacts with’ (Fig. 2), because it is bi-directional (e.g. ‘Species A interacts with Species B’ is equivalent to ‘Species B interacts with Species A’).
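One way to handle merged cells is to expand the table into a rectangular grid, copying each merged cell into every position it covers. The sketch below illustrates that idea only; it is not GloBI's code. The function names, the sample markup and the assumption that the first two columns hold the interacting taxa are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample table with one cell merged across two rows.
SAMPLE = """
<table>
  <tr><th>Parasitoid</th><th>Host</th></tr>
  <tr><td rowspan="2">Cotesia glomerata</td><td>Pieris brassicae</td></tr>
  <tr><td>Pieris rapae</td></tr>
</table>
"""

def expand_spans(table_xml):
    """Return a rectangular grid in which merged cells are repeated in every position they cover."""
    root = ET.fromstring(table_xml.strip())
    grid = {}  # (row, column) -> cell text
    for r, tr in enumerate(root.iter("tr")):
        c = 0
        for cell in tr:
            if cell.tag not in ("td", "th"):
                continue
            # Skip positions already occupied by a rowspan from an earlier row.
            while (r, c) in grid:
                c += 1
            text = " ".join("".join(cell.itertext()).split())
            rowspan = int(cell.get("rowspan", 1))
            colspan = int(cell.get("colspan", 1))
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]

def interactions(grid):
    """Pair the first two columns of each data row under the generic, symmetric interaction type."""
    return [
        (row[0], "biotically interacts with", row[1])
        for row in grid[1:]  # assumes the first row is a header
        if len(row) >= 2 and row[0] and row[1]
    ]

grid = expand_spans(SAMPLE)
for triple in interactions(grid):
    print(triple)
```

After expansion, every row carries both names, so the merged ‘Cotesia glomerata’ cell yields one interaction per host row. Because the relation is symmetric, no decision is needed about which column holds the subject and which the object.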
Examples of species interactions provided by OpenBiodiv and indexed by GloBI are available on GloBI’s website.
In the future, we plan to expand the capacity of the workflow to recognise interaction types in more detail. This could be implemented by applying part-of-speech tagging to establish the subject and object of an interaction.
In addition to being accessible via an API and as archives, biotic interactions indexed by GloBI are available as Linked Open Data and can be accessed via a SPARQL endpoint. Hence, we plan on creating a user-friendly service for federated querying of GloBI and OpenBiodiv biodiversity data.
This collaborative project is an example of the benefits of open and FAIR data, enabling the enhancement of biodiversity data through the integration between Pensoft and GloBI. Transformation of knowledge contained in existing scholarly works into giant, searchable knowledge graphs increases the visibility and attributed re-use of scientific publications.
References
Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005.
Additional Information
The work has been partially supported by the International Training Network (ITN) IGNITE funded by the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 764840.