
How to import occurrence records into manuscripts from GBIF, BOLD, iDigBio and PlutoF

On October 20, 2015, we published a blog post about the novel functionality in ARPHA that allows streamlined import of specimen or occurrence records into taxonomic manuscripts.

This process has recently been documented in the “Tips and Tricks” section of the ARPHA authoring tool. Building on our earlier post, we now go through our latest updates and highlight the features that have been added since then.

Repositories and data indexing platforms such as GBIF, BOLD Systems, iDigBio, or PlutoF hold, among other types of data, specimen or occurrence records. It is now possible to import these records directly from those platforms into ARPHA taxonomic manuscripts [see Fig. 1]. For the rest of this post, we will refer to specimen or occurrence records simply as occurrence records.

Fig. 1: Workflow for directly importing occurrence records into a taxonomic manuscript.

Until now, when users of the ARPHA writing tool wanted to include occurrence records as materials in a manuscript, they had to either format the occurrences as an Excel sheet to be uploaded to the Biodiversity Data Journal or enter the data manually. While the “upload from Excel” approach significantly simplifies importing materials, it still requires a transposition step: data stored in a database must be reformatted into the specific Excel format. With the new import feature, occurrence data stored at GBIF, BOLD Systems, iDigBio, or PlutoF can be inserted directly into the manuscript simply by entering the relevant record identifier.

The functionality shows up when one creates a new “Taxon treatment” in a taxonomic manuscript in the ARPHA Writing Tool. To import records, the author needs to:

Locate one or more occurrence records in one of the supported data portals;
Note the ID(s) of the records to be imported into the manuscript (see Tips and Tricks for screenshots);
Enter the ID(s) of the occurrence record(s) in the form shown in the “Materials” section of the species treatment;
Select the relevant database from the list, and then simply click ‘Add’ to import the occurrence(s) directly into the manuscript.
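Because the form accepts one or several IDs at once, a pasted list has to be split into individual identifiers at some point. ARPHA handles this for you; the Python sketch below only illustrates how such a step might work, assuming IDs are separated by commas, semicolons, or whitespace and that repeated IDs should be imported once. The function `parse_record_ids` and the sample IDs are hypothetical.

```python
import re


def parse_record_ids(raw: str) -> list[str]:
    """Split a pasted block of record IDs into a de-duplicated list.

    Hypothetical helper: assumes IDs are separated by commas, semicolons,
    or any whitespace (including newlines). The first occurrence of a
    repeated ID is kept and the original order is preserved.
    """
    seen: set[str] = set()
    ids: list[str] = []
    for token in re.split(r"[,;\s]+", raw):
        # Leading/trailing separators yield empty strings; skip them.
        if token and token not in seen:
            seen.add(token)
            ids.append(token)
    return ids


if __name__ == "__main__":
    # Placeholder IDs, not real records.
    print(parse_record_ids("1001, 1002\n1001; XYZ12-15"))
```

Keeping the first occurrence and the original order means the materials would appear in the manuscript in the order the author pasted them.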

In the case of BOLD Systems, the author may also select a given Barcode Index Number (BIN; BINs are explained below), which then pulls all occurrences in the corresponding BIN.
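To catch typos before pasting a BIN, a quick format check can help. The Python sketch below assumes that BINs follow the pattern commonly seen on BOLD Systems: the prefix `BOLD:` followed by three letters and four digits (e.g. `BOLD:AAA0001`). This is our working assumption, not an official specification, and `normalize_bin` is a hypothetical helper that is not part of ARPHA or BOLD.

```python
import re

# Assumed BIN pattern: "BOLD:" + three uppercase letters + four digits.
# A working assumption for illustration, not an official specification.
BIN_PATTERN = re.compile(r"BOLD:[A-Z]{3}[0-9]{4}")


def normalize_bin(raw: str) -> str:
    """Return a BIN in canonical upper-case form, or raise ValueError."""
    candidate = raw.strip().upper()
    if not candidate.startswith("BOLD:"):
        # Also accept a bare code such as "AAA0001".
        candidate = "BOLD:" + candidate
    if not BIN_PATTERN.fullmatch(candidate):
        raise ValueError(f"not a recognizable BIN: {raw!r}")
    return candidate


if __name__ == "__main__":
    print(normalize_bin(" bold:aaa0001 "))
```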

We will illustrate this workflow by creating a fictitious treatment of the red moss, Sphagnum capillifolium, in a test manuscript. We have started a taxonomic manuscript in ARPHA and know that occurrence records of S. capillifolium can be found on iDigBio. All we need to do is locate the ID of an occurrence record on the iDigBio website. For iDigBio, ARPHA supports import via a Universally Unique Identifier (UUID). We have already created a treatment for S. capillifolium and clicked the pencil icon to edit its materials [Fig. 2].

In this example, type or paste the UUID (b9ff7774-4a5d-47af-a2ea-bdf3ecc78885), select the iDigBio source and click ‘Add’. This will pull the occurrence record for S. capillifolium from iDigBio and insert it as a material in the current paper [Fig. 3].
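A UUID copied from a web page sometimes picks up stray spaces or different letter case. The Python sketch below uses the standard `uuid` module to check that a pasted iDigBio identifier is well formed and to return its canonical lower-case, hyphenated form. The helper name `normalize_idigbio_uuid` is hypothetical; ARPHA's own validation is not documented here.

```python
import uuid


def normalize_idigbio_uuid(raw: str) -> str:
    """Check that a pasted identifier is a UUID and return its canonical form.

    Hypothetical helper: uuid.UUID accepts upper- or lower-case hex, and
    str() renders it as lower-case with hyphens.
    """
    try:
        return str(uuid.UUID(raw.strip()))
    except ValueError as exc:
        raise ValueError(f"not a valid UUID: {raw!r}") from exc


if __name__ == "__main__":
    # prints b9ff7774-4a5d-47af-a2ea-bdf3ecc78885
    print(normalize_idigbio_uuid(" B9FF7774-4A5D-47AF-A2EA-BDF3ECC78885 "))
```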

Fig. 3: Materials after they have been imported.

This workflow can be used for a number of purposes. One interesting future application is the rapid re-description of species, but even more exciting is the description of new species from BINs. BINs (Barcode Index Numbers) delimit Operational Taxonomic Units (OTUs), which are created algorithmically at BOLD Systems. If a taxonomist decides that an OTU is indeed a new species, he or she can import all the specimen data associated with that OTU in order to describe it as a new species.

Because they no longer have to retype or copy and paste occurrence records, authors save considerable effort. Moreover, the records are imported automatically in the structured Darwin Core format, so anyone who needs the data for reuse can easily download them from the article as structured data.
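To make "structured Darwin Core format" concrete, the Python sketch below writes occurrence records, held as dictionaries keyed by Darwin Core term names, to a CSV table. The chosen terms are only a small illustrative subset, the sample record is invented, and this is not the exact export format produced by ARPHA or the Biodiversity Data Journal.

```python
import csv
import io

# Illustrative subset of Darwin Core terms; real exports may carry many more.
DWC_TERMS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "country",
    "decimalLatitude",
    "decimalLongitude",
    "eventDate",
    "catalogNumber",
    "typeStatus",
]


def records_to_dwc_csv(records: list[dict]) -> str:
    """Serialize occurrence records as a Darwin Core CSV table.

    Terms missing from a record become empty cells; keys that are not
    in DWC_TERMS are ignored.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=DWC_TERMS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = {
        "occurrenceID": "urn:example:1",  # invented identifier
        "basisOfRecord": "PreservedSpecimen",
        "scientificName": "Sphagnum capillifolium",
    }
    print(records_to_dwc_csv([sample]))
```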

Another important aspect of the workflow is that it will serve as a platform for the peer review, publication, and curation of raw data, that is, of unpublished individual data records coming from collections or observations stored at GBIF, BOLD, iDigBio, and PlutoF. Taxonomists are used to publishing only records of specimens that they or their co-authors have personally studied. In a sense, the workflow will serve as a “cleaning filter” for the portions of data that pass through the publishing process. The published records can then be used to curate raw data at collections, e.g. to correct identifications, assign newly described species names to specimens belonging to the respective BIN, and so on.
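The curation step can be pictured as comparing the identifications in a published, peer-reviewed record set against the raw records held by a collection. The Python sketch below makes a simplifying assumption: records are matched on catalog number alone. The `Occurrence` type, the function name, and the sample data are all hypothetical. Differences are returned as candidates for curation rather than applied automatically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    catalog_number: str
    scientific_name: str


def find_identification_changes(
    published: list[Occurrence], raw: list[Occurrence]
) -> dict[str, tuple[str, str]]:
    """Return {catalog_number: (raw_name, published_name)} for mismatches.

    Matching on catalog number alone is a simplifying assumption; real
    collections may need institution and collection codes as well.
    """
    raw_by_number = {occ.catalog_number: occ.scientific_name for occ in raw}
    changes: dict[str, tuple[str, str]] = {}
    for occ in published:
        old_name = raw_by_number.get(occ.catalog_number)
        if old_name is not None and old_name != occ.scientific_name:
            changes[occ.catalog_number] = (old_name, occ.scientific_name)
    return changes


if __name__ == "__main__":
    raw = [Occurrence("MUS-1", "Genus sp."), Occurrence("MUS-2", "Genus alpha")]
    published = [Occurrence("MUS-1", "Genus novus"), Occurrence("MUS-2", "Genus alpha")]
    print(find_identification_changes(published, raw))
```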

Additional Information:

The work has been partially supported by the EC-FP7 EU BON project (ENV 308454, Building the European Biodiversity Observation Network) and the ITN Horizon 2020 project BIG4 (Biosystematics, informatics and genomics of the big 4 insect groups: training tomorrow’s researchers and entrepreneurs), under Marie Skłodowska-Curie grant agreement No. 642241.

Streamlining the Import of Specimen or Occurrence Records Into Taxonomic Manuscripts

Repositories and data indexing platforms such as GBIF, BOLD Systems, or iDigBio hold documented specimen or occurrence records along with their record IDs. To streamline the authoring process, save taxonomists’ time, and provide a workflow for peer review and quality checks of raw occurrence data, the ARPHA team has introduced an innovative feature that makes it easy to import specimen occurrence records into a taxonomic manuscript (see Fig. 1).

For the remainder of this post we will refer to specimen data as occurrence records, since an occurrence can be either an observation in the wild or a museum specimen.

Fig. 1: Workflow for directly importing occurrence records into a taxonomic manuscript.

Until now, when users of the ARPHA writing tool wanted to include occurrence records as materials in a manuscript, they had to either format the occurrences as an Excel sheet to be uploaded to the Biodiversity Data Journal or enter the data manually. While the “upload from Excel” approach significantly simplifies importing materials, it still requires a transposition step: data stored in a database must be reformatted into the specific Excel format. With the new import feature, occurrence data stored at GBIF, BOLD Systems, or iDigBio can be inserted directly into the manuscript simply by entering the relevant record identifier.

The functionality shows up when one creates a new “Taxon treatment” in a taxonomic manuscript prepared in the ARPHA Writing Tool. The import functions as follows:

the author locates an occurrence record or records in one of the supported data portals;
the author notes the ID(s) of the records that ought to be imported into the manuscript (see Fig. 2, 3, and 4 for examples);
the author enters the ID(s) of the occurrence records in the form shown in the materials section of the species treatment, selects the relevant database from a list, and then simply clicks ‘Add’ to import the occurrence(s) directly into the manuscript.

In the case of BOLD Systems, the author may also select a given Barcode Index Number (BIN; BINs are explained below), which then pulls all occurrences in the corresponding BIN (see Fig. 5).

Fig. 2: (Left) An occurrence record in iDigBio. The UUID is highlighted. Fig. 3: (Right) An occurrence record in GBIF. The GBIF ID and the occurrence ID are highlighted.

Fig. 4: (Left) An occurrence record in BOLD Systems. The record ID is highlighted. Fig. 5: (Right) All occurrence records corresponding to an OTU. The BIN is highlighted.

We will illustrate this workflow by creating a fictitious treatment of the red moss, Sphagnum capillifolium, in a test manuscript. Let’s assume we have started a taxonomic manuscript in ARPHA and know that occurrence records of S. capillifolium can be found in iDigBio. All we need to do is locate the ID of an occurrence record on the iDigBio website. For iDigBio, ARPHA supports import via a Universally Unique Identifier (UUID). We have already created a treatment for S. capillifolium and clicked the pencil icon to edit its materials (Fig. 6). When we scroll all the way down in the pop-up window, we see the form displayed in the middle of Fig. 1.

Fig. 6: Edit materials.

From here, the following actions are possible:

insert (an) occurrence record(s) from iDigBio by specifying their UUIDs (universally unique identifiers) (Fig. 2);
insert (an) occurrence record(s) from GBIF by entering their GBIF IDs (Fig. 3);
insert (an) occurrence record(s) from GBIF by entering their occurrence IDs (Fig. 3; note that, unfortunately, not all GBIF records have an occurrence ID, which is meant to serve as a kind of universal identifier);
insert (an) occurrence record(s) from BOLD by entering their record IDs (Fig. 4);
insert a set of occurrence records from BOLD belonging to a BIN (Barcode Index Number) (Fig. 5).
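For readers who want to look up the same records outside ARPHA, the five options above roughly correspond to requests against each portal's public web API. The Python sketch below only builds the request URLs and does not fetch anything. The endpoint paths reflect our understanding of the GBIF, iDigBio, and BOLD (v4) public APIs and should be checked against current documentation; BOLD in particular has been migrating to a new platform. The source labels such as `gbif_id` and the function `lookup_url` are hypothetical, and this is not a description of how ARPHA itself queries the portals.

```python
from urllib.parse import urlencode

# Assumed public API base URLs; verify against each portal's documentation.
GBIF_API = "https://api.gbif.org/v1"
IDIGBIO_API = "https://search.idigbio.org/v2"
BOLD_API = "https://www.boldsystems.org/index.php/API_Public"


def lookup_url(source: str, identifier: str) -> str:
    """Build a lookup URL for one of the five import options (hypothetical labels)."""
    if source == "idigbio_uuid":
        return f"{IDIGBIO_API}/view/records/{identifier}"
    if source == "gbif_id":
        return f"{GBIF_API}/occurrence/{identifier}"
    if source == "gbif_occurrence_id":
        # Occurrence IDs are often URNs or URLs, so they must be URL-encoded.
        return f"{GBIF_API}/occurrence/search?" + urlencode({"occurrenceID": identifier})
    if source == "bold_record_id":
        return f"{BOLD_API}/specimen?" + urlencode({"ids": identifier, "format": "tsv"})
    if source == "bold_bin":
        return f"{BOLD_API}/specimen?" + urlencode({"bin": identifier, "format": "tsv"})
    raise ValueError(f"unknown source: {source!r}")


if __name__ == "__main__":
    print(lookup_url("idigbio_uuid", "b9ff7774-4a5d-47af-a2ea-bdf3ecc78885"))
```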

In this example, select iDigBio as the source, type or paste the UUID b9ff7774-4a5d-47af-a2ea-bdf3ecc78885, and click ‘Add’. This will pull the occurrence record for S. capillifolium from iDigBio and insert it as a material in the current paper (Fig. 7). The same workflow also applies to the GBIF and BOLD portals mentioned above.

Fig. 7: Materials after they have been imported.

This workflow can be used for a number of purposes, but one of its most exciting future applications is the rapid re-description of Linnaean species, or new morphological descriptions of species together with DNA barcode sequences (a barcode is a short, standardized gene region that varies enough between species to allow statistical classification to take place), using the Barcode Index Numbers (BINs) that delimit Operational Taxonomic Units (OTUs). If a taxonomist is convinced that the species hypothesis corresponding to an OTU defined algorithmically at BOLD Systems represents a new species, he or she can import all specimen records associated with that OTU by entering the OTU’s BIN in the respective field.

Having imported the specimen occurrence records, the author needs to designate one specimen as the holotype of the new species, others as paratypes, and so on. The author can also edit the imported records in the ARPHA tool, delete some, or add new ones.
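In Darwin Core, type designations are recorded in the `typeStatus` term. The Python sketch below shows the simplest version of the step described above: one imported record becomes the holotype and all others become paratypes. The function name and sample IDs are hypothetical. Real designations are a taxonomic decision, may exclude some records entirely, and usually carry fuller `typeStatus` values than the bare words used here.

```python
def designate_types(records: list[dict], holotype_id: str) -> list[dict]:
    """Return copies of imported records with Darwin Core typeStatus filled in.

    The record whose occurrenceID equals holotype_id becomes the holotype;
    every other record becomes a paratype. The input records are not modified.
    """
    if not any(r.get("occurrenceID") == holotype_id for r in records):
        raise ValueError(f"holotype {holotype_id!r} is not among the imported records")
    designated = []
    for record in records:
        status = "holotype" if record.get("occurrenceID") == holotype_id else "paratype"
        designated.append({**record, "typeStatus": status})
    return designated


if __name__ == "__main__":
    imported = [{"occurrenceID": "A"}, {"occurrenceID": "B"}, {"occurrenceID": "C"}]
    for rec in designate_types(imported, "B"):
        print(rec)
```

Returning copies rather than editing in place keeps the originally imported records available if the author changes the designation later.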

Another important aspect of the workflow is that it will serve as a platform for the peer review, publication, and curation of raw data, that is, of unpublished individual data records coming from collections or observations stored at GBIF, BOLD, and iDigBio. Taxonomists are used to publishing only records of specimens that they or their co-authors have personally studied. In a sense, the workflow will serve as a “cleaning filter” for the portions of data that pass through the publishing process. The published records can then be used to curate raw data at collections, e.g. to correct identifications, assign newly described species names to specimens belonging to the respective BIN, and so on.

Additional Information:

Novel cybercatalog of flower-loving flies suggests the digital future of taxonomy

Charting Earth’s biodiversity is the goal of taxonomy, and to achieve it scientists need to create an extensive citation network based on several hundred million pages of scientific literature. By providing a novel taxonomic ‘cybercatalog’ of southern African flower-loving (apiocerid) flies, Drs. Torsten Dikow and Donat Agosti demonstrate how the network of taxonomic knowledge can be made available through links to online data providers. Their work is available in the open-access Biodiversity Data Journal.

The present research shows that this information can be made available not only to readers who follow the links, but also to machines that use the growing number of digital online resources linked through persistent identifiers.

Primary data providers for taxonomic information such as species names (ZooBank), specimen images (Morphbank), species descriptions (Plazi), and digitized literature (BHL, Biodiversity Heritage Library; BioStor; and BLR, Biodiversity Literature Repository) play an important role in making data on species available in electronic form. Aggregators such as the Global Biodiversity Information Facility (GBIF) and the Encyclopedia of Life (EoL) gather this information automatically to distribute it even further to audiences beyond the reach of the life sciences.

In contrast to previous species catalogs, cybercatalogs provide access to information through links to open-access online data repositories such as those listed above. Taxonomists and other users can now access this literature, species descriptions, and specimen records immediately, without searching a natural history library or collection. The cybercatalog takes advantage of a new publishing platform within the Biodiversity Data Journal that makes it easy to upload species information and links to data about these species through a CheckList template. Furthermore, the Biodiversity Data Journal now allows future updates and re-publications of the cybercatalog, each with a new unique persistent identifier (DOI, Digital Object Identifier), whenever a new species is described or other taxonomic changes take place.

The authors argue that cybercatalogs are indeed the future of taxonomic catalogs since the online data in them are easily accessible to anyone.

“It is a taxonomist’s dream to have online access to all previously published information on a species and through this step the discipline of taxonomy can (re-)position itself as a central resource within the life sciences and beyond to the public and society at large,” add the authors. “Online access will also help to narrow the gap between the South and the North as a fantastic example of unhindered access to our knowledge of the global biological diversity, which is increasingly under pressure from human populations.”

###

For the realization of this project, Plazi and Pensoft were partially supported by the EC-FP7 EU BON project (ENV 308454, Building the European Biodiversity Observation Network).

###

Original source:

Dikow T, Agosti D (2015) Utilizing online resources for taxonomy: a cybercatalog of Afrotropical apiocerid flies (Insecta: Diptera: Apioceridae). Biodiversity Data Journal 3: e5707. doi: 10.3897/BDJ.3.e5707

The four-letter code: How DNA barcoding can accelerate biodiversity inventories

With unprecedented biodiversity loss occurring, we must determine how many species we share the planet with. This can start in our backyards, but speed is critical. A new study shows how biodiversity inventories can be accelerated with DNA barcoding and rapid publishing techniques, making it possible to survey a nature reserve in just four months. The final inventory of 3,500 species was written, released and published in the Biodiversity Data Journal in under one week.

To assess how quickly and effectively DNA barcoding could aid in quantifying biodiversity on a massive scale, the Biodiversity Institute of Ontario partnered with the rare Charitable Research Reserve, a 365+ hectare land reserve located in Ontario, Canada, in an attempt to expand the reserve’s existing species inventory list. To match this speed in surveying, the two partners also used cutting-edge tools and venues for data release and publishing to disseminate the results rapidly.

Surveys of different habitats on the reserve were conducted over four months and culminated in a bioblitz, at which point delegates of the 6th International Barcode of Life Conference joined the effort. “These experts possess invaluable skills that enabled us to identify so many species,” Angela Telfer, University of Guelph, comments in hindsight. “It was a great chance to marry barcoding data with taxonomic data and further our efforts to build a DNA barcode reference library.”

The use of DNA barcoding to conduct this inventory greatly improved the speed at which the results were made available to the public. For the 3,502 specimens barcoded from the bioblitz, the data were generated on an impressive time scale: samples went through lysis, DNA extraction and PCR, sequencing, and validation within 72 hours of their collection. Using the BOLD barcode reference library, taxonomy was applied and the results were uploaded to the Global Biodiversity Information Facility (GBIF) via Canadensys within 96 hours of collection.

Even the choice of journal for publication contributed to the rapid process. The manuscript preparation and submission took considerably less time due to the online writing platform and pre-submission peer-review offered by the Biodiversity Data Journal, used for the first time in this survey. This allowed the 100+ co-authors of this study to all provide input, and reviewers were able to discuss and comment on the paper during the authoring process. All data are now publicly accessible, through the journal article and the various repositories above, and all specimens have been deposited in the Biodiversity Institute of Ontario’s natural history collection and herbarium.

Over the span of four months, the two-stage survey produced a total of 28,916 specimens barcoded or observed across 14 phyla, 29 classes, 117 orders, and 531 families of animals, plants, fungi and lichens. A total of 1,102 species were recorded for the first time at the nature reserve, expanding its existing inventory by 49%.

The results from this mass data collection uncovered abundant biodiversity in taxa that were previously understudied. For example, there were no previous records of spiders at the reserve, but the team’s efforts added an impressive 181 species to the inventory list, three of which were new to the province.

“The survey at rare Charitable Research Reserve differs from other studies in that within four months – plus a single day of a concentrated bioblitz – more than 25,000 specimens and 3,500 species were recovered, often by non-experts,” explains Connor Warne, a co-author on the paper and specialist in ants. “This model of assessment has the potential to revolutionize the way we uncover diversity in our world. With a coordinated effort, we could implement this model in parks, conservation areas and reserves across the world and take a much needed step in filling in the blank pages of the story of life on Earth.”

###

Original source:

Telfer A, Young M, Quinn J, Perez K, Sobel C, Sones J, Levesque-Beaudin V, Derbyshire R, Fernandez-Triana J, Rougerie R, Thevanayagam A, Boskovic A, Borisenko A, Cadel A, Brown A, Pages A, Castillo A, Nicolai A, Glenn Mockford B, Bukowski B, Wilson B, Trojahn B, Lacroix C, Brimblecombe C, Hay C, Ho C, Steinke C, Warne C, Garrido Cortes C, Engelking D, Wright D, Lijtmaer D, Gascoigne D, Hernandez Martich D, Morningstar D, Neumann D, Steinke D, Marco DeBruin D, Dobias D, Sears E, Richard E, Damstra E, Zakharov E, Laberge F, Collins G, Blagoev G, Grainge G, Ansell G, Meredith G, Hogg I, McKeown J, Topan J, Bracey J, Guenther J, Sills-Gilligan J, Addesi J, Persi J, Layton K, D’Souza K, Dorji K, Grundy K, Nghidinwa K, Ronnenberg K, Lee K, Xie L, Lu L, Penev L, Gonzalez M, Rosati M, Kekkonen M, Kuzmina M, Iskandar M, Mutanen M, Fatahi M, Pentinsaari M, Bauman M, Nikolova N, Ivanova N, Jones N, Weerasuriya N, Monkhouse N, Lavinia P, Jannetta P, Hanisch P, McMullin R, Ojeda Flores R, Mouttet R, Vender R, Labbee R, Forsyth R, Lauder R, Dickson R, Kroft R, Miller S, MacDonald S, Panthi S, Pedersen S, Sobek-Swant S, Naik S, Lipinskaya T, Eagalle T, Decaëns T, Kosuth T, Braukmann T, Woodcock T, Roslin T, Zammit T, Campbell V, Dinca V, Peneva V, Hebert P, deWaard J (2015) Biodiversity inventories in high gear: DNA barcoding facilitates a rapid biotic survey of a temperate nature reserve. Biodiversity Data Journal 3: e6313. doi: 10.3897/BDJ.3.e6313