Using GitHub as a repository for machine-readable scholarly articles

Inspired by eLife, Pensoft now deposits article XMLs on GitHub

A core principle at Pensoft Publishers is openness – all of our journals are Open Access and available in multiple formats (both human-readable, e.g. PDF and semantically enhanced HTML, and machine-readable, such as XML) to simplify re-use of information and advance scientific research.

In accordance with this philosophy, starting Monday, 15 July 2013, the XML of all articles in ZooKeys, PhytoKeys and MycoKeys is available on GitHub for anyone to view, comment on, suggest changes to, and more.

At Pensoft, a guiding principle is technological excellence – we use the best tools available (and build new ones when existing ones are lacking) to provide a service that is advanced, yet easy and accessible. GitHub can be described as a social platform and network used mostly by software developers for writing code, discussing it, changing it, and keeping track of those changes. A major benefit of re-use is that it acts as an additional check for quality. As programmers say, “given enough eyeballs, all bugs are shallow”.

“We have been impressed by the innovative approach of the eLife journal to use GitHub as a repository for their article XMLs and quickly followed it on the advice of Prof. Roderic Page from the University of Glasgow. In a blog post, he pointed out how small errors in the markup of citations could prevent linking those citations to their online versions. Certainly the way to go to turn academic publishing into a more socially based enterprise!” said Prof. Lyubomir Penev, managing director of Pensoft.

Posting the article XMLs on GitHub allows them to be corrected, with corrections and comments submitted through the repository's feedback mechanism, the so-called “pull requests”. Moreover, all changes to the original version can be tracked in public. A good use case was not long in coming: Prof. Page wrote a script that identifies literature references lacking DOIs, then automatically queries CrossRef and returns the missing DOIs. It will be straightforward to insert the missing DOIs into the article XMLs and to expose the version history.
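
To give a feel for what such a DOI check involves, here is a minimal Python sketch. It is not Prof. Page's script, only an illustration under stated assumptions: that references are marked up JATS-style as `<ref>` elements carrying a `<pub-id pub-id-type="doi">` when a DOI is known, and that the public CrossRef REST API (`https://api.crossref.org/works` with the `query.bibliographic` parameter) is used for the lookup. The function names, the sample XML and the DOI in it are made up for the example.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Assumption: the public CrossRef REST API endpoint for works.
CROSSREF_WORKS = "https://api.crossref.org/works"


def refs_missing_doi(xml_text):
    """Return (ref id, citation text) for every <ref> that has no DOI pub-id.

    Assumes JATS-style markup; real article XML may tag DOIs differently.
    """
    root = ET.fromstring(xml_text)
    missing = []
    for ref in root.iter("ref"):
        has_doi = any(p.get("pub-id-type") == "doi" for p in ref.iter("pub-id"))
        if not has_doi:
            # Flatten the citation to plain text and normalise whitespace.
            text = " ".join("".join(ref.itertext()).split())
            missing.append((ref.get("id"), text))
    return missing


def crossref_lookup(citation):
    """Ask CrossRef for the best-scoring match; return its DOI or None."""
    params = urllib.parse.urlencode({"query.bibliographic": citation, "rows": 1})
    with urllib.request.urlopen(f"{CROSSREF_WORKS}?{params}", timeout=30) as resp:
        items = json.load(resp)["message"]["items"]
    return items[0]["DOI"] if items else None


def find_missing_dois(xml_text, lookup=crossref_lookup):
    """Map each DOI-less reference id to the DOI candidate found by `lookup`."""
    return {ref_id: lookup(text) for ref_id, text in refs_missing_doi(xml_text)}


# Hypothetical two-reference sample; only B2 lacks a DOI.
SAMPLE = """<article><back><ref-list>
<ref id="B1"><mixed-citation>Smith J (2010) A title. Journal 1: 1-10. <pub-id pub-id-type="doi">10.1234/example</pub-id></mixed-citation></ref>
<ref id="B2"><mixed-citation>Doe A (2011) Another title. Journal 2: 11-20.</mixed-citation></ref>
</ref-list></back></article>"""

if __name__ == "__main__":
    for ref_id, text in refs_missing_doi(SAMPLE):
        print(ref_id, "lacks a DOI:", text)
```

Calling `find_missing_dois(article_xml)` would query CrossRef once per DOI-less reference. The top hit is only a candidate, so in practice each suggested DOI would be reviewed before being proposed as a correction, for instance through a pull request.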
