An audit of more than 9000 species occurrence records in two online databases has uncovered a large number of errors. The study also highlighted that the publishers of online databases currently take no responsibility for the content of their databases, and do not collaborate with their data providers in checking and correcting the online data. The audit results and the associated data files have been published in the open access journal ZooKeys.
The records checked were for native Australian millipede species and were published online by the Global Biodiversity Information Facility (GBIF) and the Atlas of Living Australia (ALA). GBIF and ALA obtain most of their records from cooperating museums, but disclaim any responsibility for errors in museum databases, instead warning users that the data may not be accurate or fit for purpose.
The auditing was done voluntarily by Dr Bob Mesibov, who is a millipede specialist and a research associate at the Queen Victoria Museum and Art Gallery in Launceston, Tasmania.
The audit found duplicated records and other bookkeeping problems, as well as errors in scientific nomenclature and in locations and dates for specimen collections. Location errors were particularly common, with 15% of a ‘best data’ subset of the records at least 5 km from the correct locality.
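The two kinds of checks described above, finding duplicated records and finding records placed at least 5 km from the correct locality, can be illustrated with a minimal Python sketch. This is not the procedure used in the audit. The record fields (`id`, `species`, `locality`, `date`, `lat`, `lon`), the `reference` table of verified coordinates, and the sample values are all hypothetical and serve only as an illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, adequate for a rough check


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def audit(records, reference, threshold_km=5.0):
    """Return (duplicate_ids, misplaced_ids) for a list of record dicts.

    records:   dicts with the hypothetical keys id, species, locality,
               date, lat, lon
    reference: dict mapping locality name -> verified (lat, lon)
    """
    seen = set()
    duplicates, misplaced = [], []
    for rec in records:
        # Records that agree on every field except the id count as duplicates.
        key = (rec["species"], rec["locality"], rec["date"],
               rec["lat"], rec["lon"])
        if key in seen:
            duplicates.append(rec["id"])
            continue
        seen.add(key)
        # Flag records at least threshold_km from the verified locality.
        ref = reference.get(rec["locality"])
        if ref is not None and haversine_km(rec["lat"], rec["lon"], *ref) >= threshold_km:
            misplaced.append(rec["id"])
    return duplicates, misplaced


# Made-up sample data, for illustration only.
reference = {"Site 1": (-41.40, 147.10)}
records = [
    {"id": "r1", "species": "Species A", "locality": "Site 1",
     "date": "1990-01-01", "lat": -41.40, "lon": 147.10},
    {"id": "r2", "species": "Species A", "locality": "Site 1",
     "date": "1990-01-01", "lat": -41.40, "lon": 147.10},
    {"id": "r3", "species": "Species A", "locality": "Site 1",
     "date": "1991-02-02", "lat": -41.50, "lon": 147.10},
]
duplicates, misplaced = audit(records, reference)
```

In this made-up sample, `r2` repeats `r1` field for field, and `r3` lies about 11 km from the verified coordinates for its locality. A check like this depends on having independently verified coordinates to compare against, which in the audit came from a specialist's knowledge of the records.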
"The data quality problem is not trivial," said Dr Mesibov. "On the one hand, the data aggregators like GBIF and ALA are telling the world that they offer one-stop shops for data that can, for example, greatly assist decision-making in conservation and land management. On the other hand, the aggregators are not working to ensure that the data they publish are correct. And bad data aren’t very useful."
Dr Mesibov contacted museums directly to alert them to errors he found and to query inconsistencies in the occurrence records. The museums concerned have edited their records and will pass corrections on to GBIF and ALA when updating their contributions to the online databases. Error-correcting at the level of GBIF and ALA is slow and piecemeal, says Dr Mesibov, and should not have to rely on interested outsiders like himself.
"Data cleaning isn’t rocket science," said Dr Mesibov. "The aggregators could do much more checking and could collaborate with their providers in sorting out inconsistencies and fixing at least some of the errors. At the moment, that doesn’t seem to be happening, so GBIF and ALA users need to take the aggregators’ warnings about data quality very seriously."
Mesibov R (2013) A specialist’s audit of aggregated occurrence records. ZooKeys 293: 1, doi: 10.3897/zookeys.293.5111