In the midst of this year’s Peer Review Week, themed “Recognition for Review”, we would like to explain how and why we are so proud to take part in it and in Publons’ Sentinels of Science initiative, which recognizes the true guardians of quality science: the peer reviewers.
A high-tech, modern publishing solution developed by Pensoft with the conviction that adapting to the future means innovating, ARPHA set out from day one to move the rather stagnant practice of peer review forward.
This is why we provide a range of peer review options to every author submitting their work to any journal published on the ARPHA platform. For example, here are the four stages of the open peer review process operating in our flagship journal, Research Ideas and Outcomes (RIO):
Author-organised, pre-submission review is available to all journals that use our ARPHA Writing Tool. It is our way of taking the common get-a-friend-to-proofread-your-work practice to a whole new, transparent and technologically facilitated level. The review happens in real time, with the author and the reviewers working together in the ARPHA online environment. It is not mandatory, but we strongly encourage it. All pre-submission reviews provided at the author’s request in RIO can be published along with the article, each bearing a DOI and citation details.
Pre-submission technical and editorial checks are another benefit, provided by the journal’s editorial office to those using the ARPHA Writing Tool. If necessary, they can take several rounds, until the manuscript is improved to a level appropriate for direct submission to the journal.
Community-sourced, post-publication, open peer review is the next review stage, available to all articles published in RIO and all other ARPHA journals.
In addition, RIO also provides journal-organised, post-publication open peer review at the author’s request. In all other ARPHA journals, this review stage is mandatory and takes place before publication.
To facilitate peer review in any journal published on the platform, ARPHA automatically consolidates all reviews into a single online file, which makes it possible for reviewers to comment in real time, even during the authoring process. Once posted, the whole peer review history is archived along with the associated files.
To recognize peer review even further, ARPHA automatically registers each of our peer reviewers, along with their reviews, on Publons, thanks to the integration of all Pensoft journals with the platform created to credit reviewers for their contributions.
With this vision of peer review, we simply could not pass up the ambitious Sentinels of Science initiative started by Publons. It only made sense for us to step in, which is how the ARPHA logo came to appear in the Gold star sponsors list.
On Friday, 23rd September, Publons will announce the recipients of the inaugural Sentinels of Science Award – the top reviewers and editors for the past year. So, tune in this Friday at 4:00 P.M. (BST) and do not forget to join the Twitter conversation via hashtags #PeerRevWk16 and #RecognizeReview.
We want to stress at this point that the import functionality itself is agnostic of the data source: any metadata file in EML 2.1.1 or 2.1.0 can be imported. We have listed these three most likely sources of metadata simply to illustrate the workflow.
In the remainder of the post, we will go through the original post from October 13, 2015 and highlight the latest updates.
At the time of writing of the original post, the Biodiversity Information Standards conference, TDWG 2015, was taking place in Kenya. Data sharing, data re-use and data discovery were brought up in almost every talk. We might have entered the age of Big Data twenty years ago, but it is only now that scientists face the real challenge: storing and searching through the deluge of data to find what they need.
As the rate at which we generate data grows exponentially and exceeds the rate at which data storage technologies improve, the field of data management faces a serious challenge. Worse, this means that the more new data is generated, the more of the older data will be lost. In order to know what to keep and what to delete, we need to describe the data as thoroughly as possible and judge the importance of datasets. This post is about a novel way to automatically generate scientific papers describing a dataset, referred to as data papers.
The common characteristics of the records, i.e. descriptions of the object of study, the measurement apparatus and the statistical summaries used to quantify the records, the personal notes of the researcher, and so on, are called metadata. Major web portals such as DataONE, the Global Biodiversity Information Facility (GBIF), or the Long Term Ecological Research Network store metadata in conjunction with a given dataset as one or more text files, usually structured in special formats that enable the metadata to be parsed by algorithms.
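Because these formats are machine-readable, the metadata can be parsed programmatically. The following is a minimal, hypothetical sketch (Python standard library only) of how a title, the creators and an abstract could be extracted from an EML 2.1.1 document; the embedded snippet is a toy example, not a real metadata record.

```python
# Minimal sketch: structured EML metadata lends itself to machine parsing.
# The snippet below is a toy example loosely following EML 2.1.1, not a real dataset.
import xml.etree.ElementTree as ET

eml_snippet = """
<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1">
  <dataset>
    <title>The Effects of Edge Proximity on Butterfly Biodiversity</title>
    <creator>
      <individualName>
        <givenName>Jane</givenName>
        <surName>Doe</surName>
      </individualName>
    </creator>
    <abstract>
      <para>Butterfly counts along forest-edge transects.</para>
    </abstract>
  </dataset>
</eml:eml>
"""

root = ET.fromstring(eml_snippet)
dataset = root.find("dataset")

title = dataset.findtext("title")
creators = [c.findtext("individualName/surName") for c in dataset.findall("creator")]
abstract = dataset.findtext("abstract/para")

print(title)     # The Effects of Edge Proximity on Butterfly Biodiversity
print(creators)  # ['Doe']
print(abstract)  # Butterfly counts along forest-edge transects.
```

A data paper generator works along these lines: the parsed title, creators, abstract and other elements are mapped onto the corresponding sections of a manuscript.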
To make the metadata and the corresponding datasets discoverable and citable, the concept of the data paper was introduced in the early 2000s by the Ecological Society of America. It was brought to the attention of the biodiversity community by Chavan and Penev (2011), who introduced a new data paper concept based on a metadata standard, such as the Ecological Metadata Language, and derived from metadata content stored at large data platforms, in this case GBIF. You can read this article for an in-depth discussion of the topic.
Therefore, in the remainder of this post we will explain how to use an automated approach to publish a data paper describing an online dataset in the Biodiversity Data Journal. The ARPHA system reads in the metadata describing your dataset and converts it into a manuscript for you. We will illustrate the workflow with the previously mentioned DataONE and GBIF.
The Data Observation Network for Earth (DataONE) is a distributed cyberinfrastructure funded by the U.S. National Science Foundation. It links together over twenty-five nodes, primarily in the U.S., hosting biodiversity and biodiversity-related data, and provides an interface for searching data across all of them. (Note: in the meantime, DataONE has updated its search interface.)
Since butterflies are neat, let’s search for datasets about butterflies on DataONE! Type “Lepidoptera” in the search field and scroll down to the dataset describing “The Effects of Edge Proximity on Butterfly Biodiversity.” You should see something like this:
As you can see, this resource has two objects associated with it: the metadata, which has been highlighted, and the dataset itself. Let’s download the metadata from the cloud! The resulting text file, “Blandy.235.1.xml”, or whatever you want to call it, can be read by humans, but is somewhat cryptic because of all the XML tags. Now you can import this file into the ARPHA writing platform, and the information stored in it will be used to create a data paper! Go to the ARPHA website, click on “Start a manuscript”, then scroll all the way down and click on “Import manuscript”.
Upload the “Blandy” file and you will see an “Authors’ page”, where you can select which of the authors mentioned in the metadata should be included as authors of the data paper itself. Note that the ARPHA user uploading the metadata is added to the list of authors even if they are not included in the metadata. Once the selection is done, the system creates a scholarly article with the information from the metadata already placed in the respective sections:
Now the authors can add a description, edit out errors, tell a story, cite someone, all without leaving ARPHA; in other words, do whatever it takes to produce a high-quality scholarly text. When they are done, they can submit their article for peer review, and it could be published in a matter of hours. Voilà!
Let’s look at GBIF. Go to “Data -> Explore by country” and select “Saint Vincent and the Grenadines”, an English-speaking Caribbean island nation. There are, at the time of writing of this post, 166 occurrence datasets containing data about the islands. Select the dataset from the Museum of Comparative Zoology at Harvard. If you scroll down, you will see the GBIF-annotated EML. Download it as a separate text file (in Chrome, you can view the page source and then copy and paste it). Follow the same steps as before: go to “Import manuscript” in ARPHA and upload the EML file. The result should be something like this, ready to finalize:
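If you prefer scripting to view-source and copy-paste, the same EML document can, to the best of our knowledge, also be retrieved through GBIF’s public registry API. The sketch below is a hedged example: the dataset key is a placeholder (not the actual Museum of Comparative Zoology dataset key), and the script simply saves the document to a file that can then be uploaded via “Import manuscript”.

```python
# Hedged sketch: fetch a dataset's EML from GBIF's public registry API
# (endpoint as we understand it) instead of copying it from the page source.
import urllib.request

dataset_key = "<DATASET-KEY>"  # placeholder: copy the UUID from the GBIF dataset page
url = f"https://api.gbif.org/v1/dataset/{dataset_key}/document"

with urllib.request.urlopen(url) as response:
    eml_xml = response.read().decode("utf-8")

# Save the EML so it can be uploaded through ARPHA's "Import manuscript" dialog.
with open("gbif_dataset_eml.xml", "w", encoding="utf-8") as f:
    f.write(eml_xml)
```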
To finish up, we want to leave you with some caveats and topics for further discussion. To date, useful and descriptive metadata has not always been available. There are two challenges: metadata completeness and metadata standards. The invention of the EML standard was one of the first efforts to standardize how metadata should be stored in the fields of ecology and biodiversity science.
Currently, our import system supports the last two versions of the EML standard, 2.1.1 and 2.1.0, but we hope to develop this functionality further. In an upcoming version of its search interface, DataONE will provide infographics on the prevalence of metadata standards on its site (as illustrated below), so there is still work to be done; but if there is positive feedback from the community, we will definitely keep improving this feature.
Regarding metadata completeness, our hope is that by enabling scientists to create scholarly papers from their metadata with a single-step process, they will be incentivized to produce high-quality metadata.
Now, allow us to give a disclaimer: the authors of this blog post have nothing to do with the two datasets. They have not contributed to either of them, nor do they know the authors. The datasets were chosen more or less randomly, since we wanted to demonstrate the functionality with a real-world example. You should only publish data papers if you are the author of the dataset itself or know the authors. During the actual review process of the paper, the included authors will receive an email from the journal.
Additional information:
This project has received funding from the European Union’s FP7 project EU BON (Building the European Biodiversity Observation Network), grant agreement No 308454, and the Horizon 2020 research and innovation project BIG4 (Biosystematics, informatics and genomics of the big 4 insect groups: training tomorrow’s researchers and entrepreneurs) under the Marie Skłodowska-Curie grant agreement No. 642241, for a PhD project titled Technological Implications of the Open Biodiversity Knowledge Management System.
Repositories and data-indexing platforms, such as GBIF, BOLD Systems, or iDigBio, hold documented specimen or occurrence records along with their record IDs. In order to streamline the authoring process, save taxonomists’ time, and provide a workflow for peer review and quality checks of raw occurrence data, the ARPHA team has introduced an innovative feature that makes it possible to easily import specimen occurrence records into a taxonomic manuscript (see Fig. 1).
For the remainder of this post we will refer to specimen data as occurrence records, since an occurrence can be either an observation in the wild or a museum specimen.
Fig. 1: Workflow for directly importing occurrence records into a taxonomic manuscript.
Until now, when users of the ARPHA Writing Tool wanted to include occurrence records as materials in a manuscript, they had to format the occurrences as an Excel sheet uploaded to the Biodiversity Data Journal, or enter the data manually. While the “upload from Excel” approach significantly simplifies the process of importing materials, it still requires a transposition step: the data stored in a database needs to be reformatted into the specific Excel format. With the introduction of the new import feature, occurrence data stored at GBIF, BOLD Systems, or iDigBio can be inserted directly into the manuscript by simply entering a relevant record identifier.
The functionality appears when one creates a new “Taxon treatment” in a taxonomic manuscript prepared in the ARPHA Writing Tool. The import works as follows:
the author locates an occurrence record or records in one of the supported data portals;
the author notes the ID(s) of the records that ought to be imported into the manuscript (see Fig. 2, 3, and 4 for examples);
the author enters the ID(s) of the occurrence records in a form found in the materials section of the species treatment, selects the relevant database from a list, and then simply clicks ‘Add’ to import the occurrence directly into the manuscript.
In the case of BOLD Systems, the author may also enter a Barcode Index Number (BIN; BINs are discussed further below), which pulls in all occurrences in the corresponding BIN (see Fig. 5). A sketch of the underlying lookup follows the figure captions below.
Fig. 2: (Left) An occurrence record in iDigBio. The UUID is highlighted; Fig. 3: (Right) An occurrence record in GBIF. The GBIF ID and the occurrence ID are highlighted. (Click on the images to enlarge.)
Fig. 4: (Left) An occurrence record in BOLD Systems. The record ID is highlighted; Fig. 5: (Right) All occurrence records corresponding to an OTU. The BIN is highlighted. (Click on the images to enlarge.)
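Before the worked example, here is a hedged sketch of what the “select a database, enter an ID, click ‘Add’” step does conceptually: the ID is resolved against the chosen portal’s public read API and the returned record supplies the material’s fields. The endpoints shown are GBIF’s and iDigBio’s public APIs as we understand them (BOLD’s public API is omitted here), the usage ID is a placeholder, and ARPHA’s internal implementation may of course differ.

```python
# Hedged sketch: resolve a record identifier against the selected portal's
# public API and return the occurrence record as JSON.
import json
import urllib.request

# Public read endpoints, keyed by the database the author selects in the form.
PORTAL_URLS = {
    "GBIF": "https://api.gbif.org/v1/occurrence/{id}",             # by GBIF ID
    "iDigBio": "https://search.idigbio.org/v2/view/records/{id}",  # by UUID
}

def fetch_occurrence(database: str, record_id: str) -> dict:
    """Fetch one occurrence record as JSON from the selected portal."""
    url = PORTAL_URLS[database].format(id=record_id)
    with urllib.request.urlopen(url) as response:
        return json.load(response)

# Hypothetical usage with a placeholder GBIF ID (not a record referenced in this post):
# record = fetch_occurrence("GBIF", "123456789")
# print(record.get("scientificName"))
```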
We will illustrate this workflow by creating a fictitious treatment of the red moss, Sphagnum capillifolium, in a test manuscript. Let’s assume we have started a taxonomic manuscript in ARPHA and know that occurrence records belonging to S. capillifolium can be found in iDigBio. What we need to do is locate the ID of the occurrence record on the iDigBio web page; in the case of iDigBio, the ARPHA system supports import via a Universally Unique Identifier (UUID). We have already created a treatment for S. capillifolium and clicked on the pencil to edit materials (Fig. 6). When we scroll all the way down in the pop-up window, we see the form displayed in the middle of Fig. 1.
Fig. 6: Edit materials.
From here, the following actions are possible:
insert occurrence record(s) from iDigBio by specifying their UUIDs (Fig. 2);
insert occurrence record(s) from GBIF by entering their GBIF IDs (Fig. 3);
insert occurrence record(s) from GBIF by entering their occurrence IDs (note that, unfortunately, not all GBIF records have an occurrence ID, which is intended as a universal identifier) (Fig. 3);
insert occurrence record(s) from BOLD by entering their record IDs (Fig. 4);
insert a set of occurrence records from BOLD belonging to a BIN (Barcode Index Number) (Fig. 5).
In this example, select iDigBio from the database list, type or paste the UUID b9ff7774-4a5d-47af-a2ea-bdf3ecc78885, and click ‘Add’. This pulls the occurrence record for S. capillifolium from iDigBio and inserts it as a material in the current paper (Fig. 7). The same workflow also applies to the aforementioned GBIF and BOLD portals.
Fig. 7: Materials after they have been imported.
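For the curious, here is a hedged sketch of the equivalent manual lookup for the UUID used above, against what we understand to be iDigBio’s public record API; the response layout (verbatim Darwin Core fields under a "data" key) is our assumption, and ARPHA’s own import code may differ.

```python
# Hedged sketch: look up the same iDigBio record by its UUID.
import json
import urllib.request

uuid = "b9ff7774-4a5d-47af-a2ea-bdf3ecc78885"  # the S. capillifolium record used above
url = f"https://search.idigbio.org/v2/view/records/{uuid}"

with urllib.request.urlopen(url) as response:
    record = json.load(response)

# The provider's verbatim Darwin Core fields are expected under the "data" key.
darwin_core = record.get("data", {})
print(darwin_core.get("dwc:scientificName"))
print(darwin_core.get("dwc:country"))
```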
This workflow can be used for a number of purposes, but one of its most exciting future applications is the rapid re-description of Linnaean species, or new morphological descriptions of species together with DNA barcode sequences (a barcode is a gene region that is highly conserved within a taxon yet provides enough inter-species variation for statistical classification), using the Barcode Index Numbers (BINs) underlying Operational Taxonomic Units (OTUs). If a taxonomist is convinced that a species hypothesis corresponding to an OTU defined algorithmically at BOLD Systems clearly represents a new species, they can import all specimen records associated with that OTU by entering the OTU’s BIN in the respective field.
Having imported the specimen occurrence records, the author needs to designate one specimen as the holotype of the new species, others as paratypes, and so on. The author can also edit the records in the ARPHA tool, delete some, or add new ones.
By not having to retype or copy and paste species occurrence records, authors save a lot of effort. Moreover, the records are automatically imported in a structured Darwin Core format, so anyone who needs the data for reuse can easily download them from the article text as structured data.
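To make “structured Darwin Core format” concrete, the purely illustrative sketch below writes a single made-up material as one row of a Darwin Core CSV; the term names are standard Darwin Core, but the values are invented for the example and are not taken from the record imported above.

```python
# Illustrative sketch only: a material expressed with standard Darwin Core terms.
import csv

material = {
    "occurrenceID": "urn:example:specimen:0001",  # hypothetical identifier
    "scientificName": "Sphagnum capillifolium",
    "country": "Sweden",                          # made-up locality data
    "decimalLatitude": "59.85",
    "decimalLongitude": "17.63",
}

# Anyone reusing the published materials could work with them as plain tabular data.
with open("materials_dwc.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=list(material))
    writer.writeheader()
    writer.writerow(material)
```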
Another important aspect of the workflow is that it will serve as a platform for peer review, publication and curation of raw data, that is, unpublished individual data records coming from collections or observations stored at GBIF, BOLD and iDigBio. Taxonomists are used to publishing only records of specimens that they or their co-authors have personally studied. In a sense, the workflow will serve as a “cleaning filter” for the portions of data that pass through the publishing process. Thereafter, the published records can be used to curate the raw data in collections, e.g. to correct identifications, assign newly described species names to specimens belonging to the respective BIN, and so on.
Additional Information:
The work has been partially supported by the EC FP7 project EU BON (ENV 308454, Building the European Biodiversity Observation Network) and the Horizon 2020 ITN project BIG4 (Biosystematics, informatics and genomics of the big 4 insect groups: training tomorrow’s researchers and entrepreneurs), under the Marie Skłodowska-Curie grant agreement No. 642241.
The former Pensoft Writing Tool (PWT) appears under a new name with exciting functionalities customized to your needs
It’s been almost two full years since we first launched the Pensoft Writing Tool (PWT) as the first ever workflow to support the full life cycle of a manuscript, from authoring to peer review, publishing and dissemination. Now it is time to move a step forward with an updated tool that incorporates all our accumulated experience and your invaluable feedback. PWT is now transforming into the ARPHA Writing Tool (AWT), a rebrand that means much more than a change of name and design.
So, what is so cool about the new ARPHA Writing Tool? Here it is:
A new, modern look and user-friendly design
All editing happens in the manuscript preview mode
Plug-in for mathematical formulas
Pre-submission technical validation, by automated tool and humans
Pre-submission external peer-review
Importing manuscripts through Application Programming Interface (API)
Those of you who have been using the PWT will remember its two writing modes, Preview and Editing. Over the past two years, we have learned that switching between them could sometimes be tricky. With the AWT, there is no more flipping between modes: the tool now contains a single editing mode, which means rich editing functions and direct visualisation of your changes and comments right in the article preview.
Beyond biodiversity data publishing, the AWT will also provide a large set of predefined yet flexible article templates to allow the publication of most types of research outcomes. As the scope broadens, we also strive to simplify and improve the user experience.
The AWT is all about user-friendliness. With its new intuitive design and more comprehensible functions, the system is quick to navigate and easy to get used to, and its functions are straightforward to discover.
The AWT makes collaborative work on a manuscript with co-authors or peers easier than ever. Mentors, pre-submission reviewers, and linguistic or copy editors can now contribute to the manuscript side by side. The collaborative peer review process supports easy communication thanks to a track-changes function, comments and replies, as well as automated but customisable email and social network notifications.
The tool also provides authors with a two-step technical validation: the manuscript is first checked for consistency automatically by the system, followed by a second check by our staff ahead of publication. After an article is published, the AWT also makes it easy to republish updated versions of the article.
Perhaps the most innovative feature of the AWT, however, is the new option to invite reviewers while the manuscript is still being authored. This function remains globally unique, as it allows authors to discuss manuscripts with their peers before submission and, consequently, to submit the reviews together with the manuscript. If the editor approves the manuscript for publication based on the pre-submission review(s), it can be published just a few days after submission.
Research Ideas and Outcomes (RIO), a new open access journal, is formally announced. The new journal represents a paradigm shift in academic publishing: for the first time, RIO will publish research from all stages of the research cycle, across a broad suite of disciplines, from the humanities to science.
Traditional journals accept only articles produced at the end of the research continuum, long after the core work has been completed. RIO will publish ideas and outputs from all stages of the research cycle: proposals, experimental designs, data, software, research articles, project reports, policy briefs, project management plans and more.
The journal takes another step ahead with a collaborative platform that allows all ideas and outputs to be labelled with Impact Categories based upon the UN Millennium Development Goals (MDGs) and EU Societal Challenges. These categories provide social-impact-based labelling to help funders, journalists and the wider public discover and finance relevant research, as well as to foster interdisciplinary collaboration around societal challenges.
These game-changing ideas come packed with technical innovation and unique features. The journal is published through ARPHA, the first publishing platform ever to support the full life cycle of a manuscript: from authoring to submission, public peer review, publication and dissemination, within a single, fully-integrated online collaborative environment. The new platform will also allow for RIO to offer one of the most transparent, open and public peer review processes, thus building trust in the reviewed outcomes.
These features come à la carte: RIO will offer flexible pricing where authors can choose exactly which publishing services fit their needs and budget. All its contents – including reviews and comments, data and code – will receive a persistent unique identifier, will be permanently archived and made available under open licenses without any access embargo.
“RIO is not just about different kinds of submissions, though that is a crucial feature and certainly unique for publishing ongoing or even proposed research: it is also about linking those submissions together across the research cycle, about reducing the time from submission to publication, about collaborative authoring and reviewing, about mapping to societal challenges, about technical innovation, about enabling reuse and about giving authors more choice in what features they actually want from the journal.” said Dr. Daniel Mietchen, a founding editor of RIO.
“I’m proud to pioneer the first journal which can publish research from all stages of the research process,” said Prof. Lyubomir Penev, Co-Founder of RIO and Pensoft. “For the first time, researchers can get formal publication credit for previously ‘hidden’ parts of their work like written research proposals. We can publish all outputs in one journal; the same journal – RIO.”
RIO is scheduled to start accepting manuscripts in November 2015.