How the names of organisms help to turn ‘small data’ into ‘Big Data’

Innovation in ‘Big Data’ helps address problems that were previously overwhelming. What we know about organisms is spread across hundreds of millions of pages published over 250 years. New software tools from the Global Names project find scientific names, index digital documents quickly, and correct and update names. These advances help in “making small data big” by linking together the content of many research efforts. The study was published in the open access journal Biodiversity Data Journal.

The ‘Big Data’ vision of science is one transformed by computing resources that capture, manage, and interrogate the deluge of information coming from new technologies, from infrastructural projects to digitise physical resources (such as the literature digitised by the Biodiversity Heritage Library), and from digital versions of specimens and specimen records produced by museums.

Increased bandwidth has made dialogue among distributed data centres feasible and this is how new insights into biology are arising. In the case of biodiversity sciences, data centres range in size from the large GenBank for molecular records and the Global Biodiversity Information Facility for records of occurrences of species, to a long tail of tens of thousands of smaller datasets and web-sites which carry information compiled by individuals, research projects, funding agencies, local, state, national and international governmental agencies.

The large biological repositories do not yet approach the scale of astronomy and nuclear physics, but the very large number of sources in the long tail of useful resources does present biodiversity informaticians with a major challenge – how to discover, index, organize and interconnect the information contained in a very large number of locations.

In this regard, biology is fortunate that, from the middle of the 18th century, the community has accepted the use of Latin binomials such as Homo sapiens or Ba humbugi for species. All names are listed by taxonomists. Name recognition tools can call on large expert compilations of names (Catalogue of Life, ZooBank, Index Fungorum, Global Names Index) to find matches in sources of digital information. This allows for the rapid indexing of content.

Even when we do not know a name, we can ‘discover’ it because scientific names have certain distinctive characteristics (they are written in italics and most often consist of two successive words in a latinised form, with the first one capitalised). These properties allow names not yet present in compilations of names to be discovered in digital data sources.
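The discovery idea described above can be illustrated with a rough sketch. It is only a heuristic under stated assumptions, not the Global Names project's own algorithm: the pattern, the `LATIN_ENDINGS` list and the function name `find_candidate_names` are hypothetical, and italics cannot be detected in plain text, so only capitalisation and latinised endings are used.

```python
import re

# Assumed heuristic: a capitalised genus-like word followed by a
# lowercase epithet-like word. This is not the Global Names algorithm.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+)\s+([a-z]{2,})\b")

# Hypothetical, incomplete list of latinised endings used to reduce
# false positives; ordinary English words can still slip through.
LATIN_ENDINGS = ("us", "a", "um", "is", "i", "ae", "ens", "e")


def find_candidate_names(text):
    """Return strings in `text` that look like Latin binomials."""
    candidates = []
    for genus, epithet in BINOMIAL.findall(text):
        if epithet.endswith(LATIN_ENDINGS):
            candidates.append(f"{genus} {epithet}")
    return candidates


sample = "Records of Homo sapiens and Ba humbugi were indexed."
print(find_candidate_names(sample))
```

A pattern this simple over-matches (a sentence-initial word followed by an English word ending in "e" or "a" would pass), which is why candidates found this way would still need to be checked against expert compilations or reviewed by hand.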

The idea of a names-based cyberinfrastructure is to use names to interconnect large and small sites of expert knowledge distributed across the Internet. This is the concept behind the Global Names project, which carried out the work described in this paper.

The effectiveness of such an infrastructure is compromised by the changes to names over time because of taxonomic and phylogenetic research. Names are often misspelled, or there might be errors in the way names are presented. Meanwhile, increasing numbers of species have no names, but are distinguished by their molecular characteristics.

To gauge the challenge that these problems may present to the realization of a names-based cyberinfrastructure, we compared names from GenBank and DRYAD (a digital data repository) with names from Catalogue of Life to assess how well they matched.

We found that fewer than 15% of the names in pair-wise comparisons of these data sources could be matched. However, a names parser can break scientific names into their component parts, so that the parts presenting the greatest number of problems can be removed to produce a simplified, or canonical, version of the name. With such tools, name-matching improved to almost 85%, and in some cases to 100%.
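To make the idea of a canonical form concrete, here is a minimal sketch. It is not the Global Names parser: the functions `canonical_form` and `match_rate` are hypothetical, and the rules (drop parenthesised parts, keep the genus and the lowercase epithets, stop at the first token that looks like an author or year) are a deliberate simplification that will mishandle cases such as lowercase author prefixes.

```python
import re


def canonical_form(name):
    """Reduce a name string to a simplified 'Genus epithet' form."""
    # Drop parenthesised parts (subgenus or bracketed authorship).
    name = re.sub(r"\([^)]*\)", " ", name)
    tokens = name.replace(",", " ").split()
    if not tokens:
        return ""
    kept = [tokens[0].capitalize()]  # first token is taken as the genus
    for token in tokens[1:]:
        # Epithets are plain lowercase words; anything else (author,
        # year, rank marker with a dot) ends the canonical part.
        if re.fullmatch(r"[a-z][a-z-]+", token):
            kept.append(token)
        else:
            break
    return " ".join(kept)


def match_rate(source_names, reference_names):
    """Fraction of source names whose canonical form is in the reference."""
    reference = {canonical_form(n) for n in reference_names}
    if not source_names:
        return 0.0
    hits = sum(1 for n in source_names if canonical_form(n) in reference)
    return hits / len(source_names)


source = ["Homo sapiens Linnaeus, 1758", "Molothrus bonariensis (Gmelin, 1789)"]
reference = ["Homo sapiens", "Molothrus bonariensis"]
print(sum(n in reference for n in source) / len(source))  # exact string matching
print(match_rate(source, reference))                      # canonical matching
```

In this toy input, exact string comparison matches none of the names because of the authorship and year, while comparison of canonical forms matches both. The figures reported in the study come from the authors' own tools and data, not from a sketch like this.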

The study confirms the potential for the use of names to link distributed data and to make small data big. Nonetheless, it is clear that we need to continue to invest in more and better names-management software specially designed to address the problems in the biodiversity sciences.

###

Original source:

Patterson D, Mozzherin D, Shorthouse D, Thessen A (2016) Challenges with using names to link digital biodiversity information. Biodiversity Data Journal, doi: 10.3897/BDJ.4.e8080.

Additional information:

The study was supported by the National Science Foundation.

One of 8 new endemic polyester bees from Chile bears the name of a draconic Pokémon

Among the eight new bee species that Spencer K. Monckton has discovered as part of his Biology Master’s degree at York University, there is one named after a popular draconic creature from the Japanese franchise Pokémon. Called the stem-nesting Charizard, the new insect belongs to a subgenus whose 17 species are apparently endemic to Chile, yet occupy a huge variety of habitats.

The young scientist, who is currently a PhD student at the University of Guelph, studying sawfly systematics and phylogeography, has his work published in the open access journal ZooKeys.

Known as polyester bees, the family to which the new species belong is characterized by the curious secretions these bees produce. Once applied to the walls of their nest cells, the secretion dries into a smooth, cellophane-like lining.

The new bee species are endemic to Chile, yet they occupy a huge variety of habitats ranging from the hyper-arid Atacama Desert in the north, to moist forests of monkey puzzle trees in the south, spanning elevations from the Pacific coast to more than 3200 metres above sea level. All of them are also solitary and nest in hollow plant stems.

Although the new bee species might lack the fiery breath of the dragon-like Pokémon, much like its namesake, it is normally found around mountains. Also, like the fictional species, the new bee has a distinctively long, snout-like face and broad hind legs, with antennae in place of horns.

However, the stem-nesting Charizard bee, as well as the other new species, are tiny creatures that measure between 4 and 7 mm in length. Unlike the predominantly orange colouration of the Pokémon, both males and females are mostly dark brown to black, patterned with variable yellow markings.

Yet, sometimes these yellow markings can turn orange when specimens are preserved, as was the case for the first specimen that Spencer Monckton observed of this species, which, he says, “cemented the comparison”.

In his research paper Spencer Monckton not only describes eight new endemic polyester bees, but he also provides thoroughly illustrated keys for identification of both the males and females of each of the species.

###

Original source:

Monckton SK (2016) A revision of Chilicola (Heteroediscelis), a subgenus of xeromelissine bees (Hymenoptera, Colletidae) endemic to Chile: taxonomy, phylogeny, and biogeography, with descriptions of eight new species. ZooKeys 591: 1-144. doi: 10.3897/zookeys.591.7731

Bee populations expanded during global warming after the last Ice Age

Populations of the Australian small carpenter bee appear to have flourished dramatically in the period of global warming following the last Ice Age some 18,000 years ago.

The bee species is found in sub-tropical, coastal and desert areas from the north-east to the south of Australia. Researchers Rebecca Dew and Michael Schwarz from the Flinders University of South Australia teamed up with Sandra Rehan of the University of New Hampshire, USA, to model its past responses to climate change with the help of DNA sequences. Their findings are published in the open access Journal of Hymenoptera Research.

“You see a rapid increase in population size from about 18,000 years ago, just as the climate began warming up after the last Ice Age,” says lead author Rebecca Dew. “This matches the findings from two previous studies on bees from North America and Fiji.”

“It is really interesting that you see very similar patterns in bees around the world,” adds Rebecca. “Different climate, different environment, but the bees have responded in the same way at around the same time.”

In the face of future global warming, these findings could be a good sign for some of our bees.

However, the news may not all be positive. There are other studies showing that some rare and ancient tropical bees require cool climate and, as a result, are already restricted to the highest mountain peaks of Fiji. For these species, climate warming could spell their eventual extinction.

“We now know that climate change impacts bees in major ways,” says Rebecca, “but the challenge will be to predict how those impacts play out. They are likely to be both positive and negative, and we need to know how this mix will unfold.”

Bees are major pollinators and are critical for many plants, ecosystems, and agricultural crops.

###

Original source:

Dew RM, Rehan SM, Schwarz MP (2016) Biogeography and demography of an Australian native bee Ceratina australensis (Hymenoptera, Apidae) since the last glacial maximum. Journal of Hymenoptera Research 49: 25-41. doi: 10.3897/JHR.49.8066

How to import occurrence records into manuscripts from GBIF, BOLD, iDigBio and PlutoF

On October 20, 2015, we published a blog post about the novel functionalities in ARPHA that allow streamlined import of specimen or occurrence records into taxonomic manuscripts.

Recently, this process was reflected in the “Tips and Tricks” section of the ARPHA authoring tool. Here, we’ll list the individual workflows:

Based on our earlier post, we will now go through our latest updates and highlight the new features that have been added since then.

Repositories and data indexing platforms, such as GBIF, BOLD systems, iDigBio, or PlutoF, hold, among other types of data, specimen or occurrence records. It is now possible to directly import specimen or occurrence records into ARPHA taxonomic manuscripts from these platforms [see Fig. 1]. We’ll refer to specimen or occurrence records as simply occurrence records for the rest of this post.

[Fig. 1] Workflow for directly importing occurrence records into a taxonomic manuscript.

Until now, when users of the ARPHA writing tool wanted to include occurrence records as materials in a manuscript, they had to either format the occurrences as an Excel sheet to be uploaded to the Biodiversity Data Journal or enter the data manually. While the “upload from Excel” approach significantly simplifies the process of importing materials, it still requires a transposition step: the data stored in a database need to be reformatted to the specific Excel format. With the introduction of the new import feature, occurrence data stored at GBIF, BOLD systems, iDigBio, or PlutoF can be directly inserted into the manuscript by simply entering a relevant record identifier.

The functionality shows up when one creates a new “Taxon treatment” in a taxonomic manuscript in the ARPHA Writing Tool. To import records, the author needs to:

  1. Locate an occurrence record or records in one of the supported data portals;
  2. Note the ID(s) of the records that ought to be imported into the manuscript (see Tips and Tricks for screenshots);
  3. Enter the ID(s) of the occurrence record(s) in the form found in the “Materials” section of the species treatment;
  4. Select a particular database from a list, and then simply click ‘Add’ to import the occurrence directly into the manuscript.

In the case of BOLD Systems, the author may also select a given Barcode Index Number (BIN; BINs are discussed below), which then pulls all occurrences in the corresponding BIN.

We will illustrate this workflow by creating a fictitious treatment of the red moss, Sphagnum capillifolium, in a test manuscript. We have started a taxonomic manuscript in ARPHA and know that the occurrence records belonging to S. capillifolium can be found on iDigBio. What we need to do is locate the ID of the occurrence record on the iDigBio web page. In the case of iDigBio, the ARPHA system supports import via a Universally Unique Identifier (UUID). We have already created a treatment for S. capillifolium and clicked on the pencil to edit materials [Fig. 2].

[Fig. 2] Edit materials

In this example, type or paste the UUID (b9ff7774-4a5d-47af-a2ea-bdf3ecc78885), select the iDigBio source and click ‘Add’. This will pull the occurrence record for S. capillifolium from iDigBio and insert it as a material in the current paper [Fig. 3].
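Before pasting an identifier into the form, an author may want to check that it is a well-formed UUID. The sketch below uses only Python's standard `uuid` module; the helper name `is_valid_uuid` is hypothetical, and the check says nothing about how ARPHA or iDigBio validate identifiers or whether the record actually exists.

```python
import uuid


def is_valid_uuid(value):
    """Return True if `value` is a UUID in the usual hyphenated form."""
    value = value.strip()
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts braces, a urn: prefix and missing hyphens;
    # comparing with the canonical string keeps only the hyphenated form.
    return str(parsed) == value.lower()


print(is_valid_uuid("b9ff7774-4a5d-47af-a2ea-bdf3ecc78885"))
print(is_valid_uuid("b9ff7774-4a5d-47af"))
```

The first call uses the identifier from the example above; the second is a truncated string and is rejected.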

[Fig. 3] Materials after they have been imported

This workflow can be used for a number of purposes. An interesting future application is the rapid re-description of species, but even more exciting is the description of new species from BINs. BINs (Barcode Index Numbers) delimit Operational Taxonomic Units (OTUs), created algorithmically at BOLD Systems. If a taxonomist decides that an OTU is indeed a new species, then they can import all the type information associated with that OTU for the purposes of describing it as a new species.

By not having to retype or copy and paste species occurrence records, authors save a lot of effort. Moreover, they automatically import the records in a structured Darwin Core format, which anyone who needs the data for reuse can easily download from the article text as structured data.
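As a rough illustration of what a structured Darwin Core record looks like, the sketch below builds one record and serialises it to CSV. The field names are terms from the Darwin Core standard, but the selection of fields, the helper `records_to_csv` and every value other than the identifier and species name from the example above are placeholder assumptions; this is not the export format ARPHA itself produces.

```python
import csv
import io

# Keys are Darwin Core terms. Values marked as placeholders are
# invented for illustration and do not describe the real record.
record = {
    "occurrenceID": "b9ff7774-4a5d-47af-a2ea-bdf3ecc78885",
    "scientificName": "Sphagnum capillifolium",
    "basisOfRecord": "PreservedSpecimen",  # placeholder
    "institutionCode": "EXAMPLE",          # placeholder
    "catalogNumber": "0000",               # placeholder
}


def records_to_csv(records):
    """Serialise a list of Darwin Core-style dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(records_to_csv([record]))
```

Because every value sits under a named term, a downstream user can load such a file into a database or analysis tool without re-keying the data.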

Another important aspect of the workflow is that it will serve as a platform for peer-review, publication and curation of raw data, that is, of unpublished individual data records coming from collections or observations stored at GBIF, BOLD, iDigBio and PlutoF. Taxonomists are used to publishing only records of specimens they or their co-authors have personally studied. In a sense, the workflow will serve as a “cleaning filter” for portions of data that are passed through the publishing process. Thereafter, the published records can be used to curate raw data at collections, e.g. to correct identifications, assign newly described species names to specimens belonging to the respective BIN, and so on.

Additional Information:

The work has been partially supported by the EC-FP7 EU BON project (ENV 308454, Building the European Biodiversity Observation Network) and the ITN Horizon 2020 project BIG4 (Biosystematics, informatics and genomics of the big 4 insect groups: training tomorrow’s researchers and entrepreneurs), under Marie Skłodowska-Curie grant agreement No. 642241.

Scientist collects 30 sawfly species not previously reported from Arkansas

Sawflies and wood wasps form a group of insects that feed mainly on plants when immature. Field work by Dr. Michael Skvarla, which was conducted during his Ph.D. research at the University of Arkansas, Fayetteville, USA, has uncovered 30 species of these plant-feeding wasps that were previously unknown in the state. The study is published in the open access journal Biodiversity Data Journal.

After collecting sawflies in tent-like Malaise traps or hanging funnel traps, Dr. Michael Skvarla sent the specimens to retired sawfly expert Dr. David Smith for identification.

In total, 47 species were collected, 30 of which had not been found in Arkansas before. While many of the species are widespread in eastern North America, eight species were known only from areas hundreds of kilometers away.

“I knew that many insect groups had not yet been surveyed in Arkansas, but I was surprised that 66% of the sawfly species we found were new to the state,” Skvarla says.

“In addition, over a quarter of the newly recorded species represent large range extensions of hundreds of miles; Monophadnoides conspiculatus, for instance, was previously known only from the Appalachian Mountains. This work highlights how much basic natural history is left to discover about insects.”

Sawflies and wood wasps comprise the wasp suborder Symphyta and derive their common names from the serrated or saw-shaped ovipositor many species use to lay eggs into plant tissue, and because some species bore into wood.

While some sawfly and woodwasp species can be pests on crops or ornamental plants, most do not pose an economic concern, and all are harmless to people.

###

Original source:

Skvarla M, Smith D, Fisher D, Dowling A (2016) Terrestrial arthropods of Steel Creek, Buffalo National River, Arkansas. II. Sawflies (Insecta: Hymenoptera: “Symphyta”). Biodiversity Data Journal 4: e8830. doi: 10.3897/BDJ.4.e8830

From a bulletin to a modern open access journal: Italian Botanist in Pensoft’s portfolio

Established back in 1888, the Italian Botanical Society has come a long way in publishing its achievements and research. Having originated as a bulletin within an Italian journal, its publication has grown ever since and now forms a new international journal in its own right. Covering both Italian and international research in botany and mycology, the online open access journal Italian Botanist, published by Pensoft, is now officially launched via its first papers.

Although what was later to become Italian Botanist published its first issue as an independent journal, called Informatore Botanico Italiano, in 1969, its publications were still rather bulletin-style. They consisted of a mixture of administrative and scientific proceedings of the Society, the yearbook of the members, as well as scientific notes.

Nevertheless, this major transition was set to change everything fundamentally. As the journal established its name, it started picking up, and it was not long before scientific contributions prevailed. Impressively, for the Society’s centenary the journal published a celebratory 331-page contribution.

Gradually, its scope was expanded to cover several scientific fields. It hosted several themed columns, including cytotaxonomic contributions on the Italian flora, relevant new floristic records for Italy, and conservation issues concerning the Italian flora and mycology.

However, the Directive Council of the Italian Botanical Society was not ready to give up on the journal’s evolution. Last year, the botanists decided that they needed to transform it into an online, open access journal written in English and called Italian Botanist, in order to boost the scientific value and international visibility of Informatore Botanico Italiano.

Under the name Italian Botanist, the journal has now joined Pensoft’s portfolio of peer-reviewed open access journals, all of which take advantage of the advanced technologies and innovations developed by the publisher.

The new journal’s scope ranges from molecular to ecosystem botany and mycology. The geographical coverage of Italian Botanist is specially focused on the Italian territory, but studies from other areas are also welcome.

Staying faithful to its spirit and philosophy, it keeps its column format, with each issue containing five columns, namely Chromosome numbers for the Italian flora, Global and Regional IUCN Red List Assessments, Notulae to the Italian flora of algae, bryophytes, fungi and lichens, Notulae to the Italian native vascular flora and Notulae to the Italian alien vascular flora.

“Our hope is that this renewed version of the journal will serve the Italian – and foreign – botanical community more efficiently and provide readers worldwide with an easier access to knowledge concerning the Italian flora,” says Italian Botanist’s Editor-in-Chief Lorenzo Peruzzi.

###

Original source:

Peruzzi L, Siniscalco C (2016) From Bullettino della Società Botanica Italiana to Italian Botanist, passing through Informatore Botanico Italiano. A 128 years-long story. Italian Botanist 1: 1-4. doi: 10.3897/italianbotanist.1.8646

The first long-horned beetle giving birth to live young discovered in Borneo

A remarkably high diversity of wingless long-horned beetles in the mountains of northern Borneo is reported by three Czech researchers from Palacký University, Olomouc, Czech Republic. Apart from the genera and species new to science, the entomologists report the first case of reproduction by live birth in this rarely collected group of beetles. The study was published in the open access journal ZooKeys.

Generally, insects are oviparous, which means that their females lay eggs and the embryonic development occurs outside the female’s body. On the other hand, ovoviviparous species retain their eggs in their genital tracts until the larvae are ready to hatch. Such a mode of reproduction is a relatively rare phenomenon in insects and even rarer within beetles, where it has been reported for a few unrelated families only.

The long-horned beetles are a family, called Cerambycidae, comprising about 35,000 known species and forming one of the largest beetle groups.

“We studied the diversity of the rarely collected wingless long-horned beetles from Borneo, which is one of the major biodiversity hotspots in the world,” says main author and PhD student Radim Gabriš. “The mountains of northern Borneo, in particular, host a large number of endemic organisms.”

The scientists focused on the group which nobody had studied in detail for more than 60 years. They found surprisingly high morphological diversity in this lineage, which resulted in the descriptions of three genera and four species new to science.

“During a dissection of female genitalia in specimens belonging to one of the newly described genera, named Borneostyrax, we found out that two females contained large larvae inside their bodies,” recalls Radim Gabriš. “This phenomenon has been known in a few lineages of the related leaf beetles, but this is the first case for the long-horned beetles.”

However, according to the authors, the modes of reproduction remain unknown for many beetle lineages besides Cerambycidae, so ovoviviparity might, in fact, be much more common. Further detailed studies are needed for a better understanding of the reproductive strategy in this group.

###

Original source:

Gabriš R, Kundrata R, Trnka F (2016) Review of Dolichostyrax Aurivillius (Cerambycidae, Lamiinae) in Borneo, with descriptions of three new genera and the first case of (ovo)viviparity in the long-horned beetles. ZooKeys 587: 49-75. doi: 10.3897/zookeys.587.7961

New immigrant: Shiny Cowbirds noted from a record altitude of 2,800 m in Ecuador

Two juveniles of Shiny Cowbird, a parasitic bird that lays its eggs in the nests of other birds, were spotted in the Andean city of Quito, Ecuador, for the first time. This finding represents an altitudinal expansion of approximately 500 m.

Breeding populations might have been prompted by forest fragmentation and/or climate change, suggests the research team, led by Dr Verónica Crespo-Pérez, professor at Pontificia Universidad Católica del Ecuador (PUCE). As a result, the ‘immigrants’ could be threatening native birds. The study is published in the open access Biodiversity Data Journal.

“The Shiny Cowbird is native to the lowlands of South America but within the last 100 years, it has been expanding its distribution to higher altitudes and latitudes,” says the lead author.

The bird had already been noted from high altitudes in Bolivia and Peru, and in some localities in the Ecuadorian Andes. Since 2000, Juan Manuel Carrión, co-author and director of the Zoo in Quito, recalls observing Shiny Cowbirds near his home in a valley near Quito at 2,300 m above sea level (asl). However, the species had never before been reported from an altitude as high as 2,800 m asl.

Moreover, the fact that the observed individuals were juveniles means that the species is already breeding in the city.

“Such a significant expansion of reproductive birds, of approximately 500 m, could be related to human disturbances, like forest fragmentation or climate change,” adds Crespo-Pérez.

The observations took place at the PUCE campus about a year ago. Two juvenile Shiny Cowbirds were seen parasitizing two different pairs of Rufous-collared Sparrow, one of the most common birds in Quito. The cowbirds displayed food-begging behaviors to adult sparrows, including chasing the sparrows on the ground and chanting intensely on bushes and tree branches.

“These observations mean that the birth mother of the cowbird laid her eggs in the nests of the sparrows, who inadvertently became the cowbird’s foster parents and incubated, fed and cared for it as if it were their own, even though the cowbird is almost twice as big,” says Miguel Pinto, co-author and professor at Escuela Politécnica Nacional, and former postdoctoral fellow at the Smithsonian Institution.

“The sparrows were not feeding fledglings of their own species, which suggests that the Cowbird could be having some negative effect on the Sparrow, at least on their ability to reproduce,” points out Tjitte de Vries, co-author and professor at PUCE.

There are several published reports of negative effects of Cowbirds on other birds, especially on species that are already endangered or have restricted distribution ranges. Therefore, this report of an expansion of the Shiny Cowbird towards higher altitudes may be of concern, mainly for native, endemic or endangered bird species.

###

Original source:

Crespo-Pérez V, Pinto C, Carrión J, Jarrín E R, Poveda C, de Vries T (2016) The Shiny Cowbird, Molothrus bonariensis (Gmelin, 1789) (Aves: Icteridae), at 2,800 m asl in Quito, Ecuador. Biodiversity Data Journal 4: e8184. doi: 10.3897/BDJ.4.e8184

Hollywood star Brad Pitt shares a name with a new wasp species from South Africa

Not only did an international research team discover two new endoparasitic wasp species in South Africa and India and significantly expand their genera’s distributional range, but they also gave a celebrity name to one of them.

While thinking of a name for the new wasp, Dr Buntika A. Butcher, Chulalongkorn University, Thailand, recalled her long hours of studying in her laboratory right under the poster of her favourite film actor. This is how a parasitic wasp from South Africa was named after Hollywood star Brad Pitt. The researchers have published their findings in the open access journal ZooKeys.

The new wasp species, called Conobregma bradpitti, belongs to a large worldwide group of wasps that parasitise moth and butterfly caterpillars. These wasps lay their eggs into a host, which, once parasitised, starts hardening. Thus, the wasp cocoon can safely develop and later emerge from the ‘mummified’ larva. Despite their macabre behaviour, many of these wasp species are considered valuable in agriculture because of their potential as biological control agents.

Brad Pitt’s flying namesake is a tiny creature measuring less than 2 mm. Its body is deep brown, nearly black in colour, while its head, antennae and legs are brown-yellow. The wings stand out with their much brighter shades.

Interestingly, the wasp with the celebrity name unites two genera whose separation had, until now, been doubtful. Being very similar, they had already been noted to have only four diagnostic features that set them apart. However, C. bradpitti shared two of those with each. Thus, the species prompted the solution of the taxonomic problem and, as a result, the two were synonymised.

In their paper, the authors from Chulalongkorn University, Thailand and the University of Calicut, India, also describe another new species of parasitic wasp. It is the first from its subtribe spotted in the whole of India, while its closest ‘relative’ lives in Nepal.

###

Original source:

Butcher BA, Quicke DLJ, Shreevihar S, Ranjith AP (2016) Major range extensions for two genera of the parasitoid subtribe Facitorina, with a new generic synonymy (Braconidae, Rogadinae, Yeliconini). ZooKeys 584: 109-120. doi: 10.3897/zookeys.584.7815