All biological processes fit into a broad S-shaped envelope extending from the briefest events to the most enduring. For the first time, we have a simple model that depicts the scope and scale of biology.
As biology progresses into a digital age, new opportunities for discovery are emerging.
Increasingly, information from investigations into aspects of biology, from ecology to molecular biology, is available in digital form. Older ‘legacy’ information is being digitized. This digital information accumulates in databases, from which it can be harvested and examined with a growing array of algorithmic and visualization tools.
That information must also make its way into trustworthy repositories that guarantee permanent access to the data in a polished state, fully suited for re-use.
The first layer in the infrastructure is the one that gathers all old and new information, whether it be about the migrations of ocean mammals, the sequence of bases in ribosomal RNA, or the known locations of particular species of ciliated protozoa.
This is achieved by compiling information about the processes conducted by all living organisms. The processes occur at all levels of organization, from sub-molecular transactions, such as those that underpin nervous impulses, to those within and among plants, animals, fungi, protists and prokaryotes. They also include the actions and reactions of individuals and communities, the sum of the interactions that make up an ecosystem and, finally, the consequences of the biosphere acting as a whole system.
In Nature’s Envelope, information on the sizes of participants and the durations of processes from all levels of organization is plotted on a grid. The grid uses a logarithmic (base 10) scale spanning about 21 orders of magnitude in size and 35 orders of magnitude in time. Information on processes ranging from the subatomic, through molecular, cellular, tissue, organismic, species and community levels, to ecosystems is assigned to the appropriate decadal blocks.
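Assigning a process to a decadal block amounts to taking the base-10 logarithm of its size and its duration. This is a minimal sketch under stated assumptions: the article does not say how blocks are indexed, so the hypothetical helpers `decade` and `assign_block` label each block by the power of ten at its lower edge, with sizes in metres and durations in seconds. The 3 nm pigment width is an assumed stand-in for "several nanometres".

```python
import math

def decade(value: float) -> int:
    """Power of ten at the lower edge of the decadal block holding `value` (hypothetical convention)."""
    if value <= 0:
        raise ValueError("sizes and durations must be positive")
    # The small epsilon absorbs floating-point error for exact powers of ten,
    # e.g. log10(1e-17) may come out slightly above -17.
    return math.floor(math.log10(value) + 1e-9)

def assign_block(size_m: float, duration_s: float) -> tuple[int, int]:
    """Map a process to its (size, time) decadal block on the log-log grid."""
    return decade(size_m), decade(duration_s)

# Grid dimensions quoted in the text: about 21 decades of size by 35 decades of time.
SIZE_DECADES, TIME_DECADES = 21, 35
TOTAL_BLOCKS = SIZE_DECADES * TIME_DECADES

# Photon-to-pigment energy transfer: ~3 nm pigment (assumed), ~1e-17 s (from the text).
print(assign_block(3e-9, 1e-17))  # (-9, -17)
print(TOTAL_BLOCKS)               # 735
```

With 735 blocks in the grid, "about half" of them corresponds to roughly 370 blocks inside the envelope.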
The extremes of life processes are determined by the smallest and largest entities to participate, and the briefest and most enduring processes.
The briefest event to be included is the transfer of energy from a photon to a photosynthetic pigment as the photon passes through a chlorophyll molecule several nanometres in width at a speed of 300,000 km per second. That transfer takes about 10⁻¹⁷ seconds. As it involves the smallest subatomic particles, it defines the lower left corner of the grid.
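The 10⁻¹⁷ s figure can be checked as a simple transit time. Taking "several nanometres" as about 3 nm (an assumption) and the photon speed of 300,000 km per second, i.e. 3 × 10⁸ m/s:

```latex
t \approx \frac{d}{c}
  = \frac{3 \times 10^{-9}\ \mathrm{m}}{3 \times 10^{8}\ \mathrm{m\,s^{-1}}}
  = 10^{-17}\ \mathrm{s}
```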
The most enduring is evolution, which has been progressing for almost 4 billion years. Evolution has created the biosphere (the largest living object) and affects the gas content of the atmosphere. This process establishes the upper right extreme of the grid.
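The two extremes can be checked against the 35 orders of magnitude of time quoted for the grid. A minimal Python sketch, assuming a Julian year of 365.25 days and rounding "almost 4 billion years" up to 4 × 10⁹ years; `decade` is a hypothetical helper that labels each decadal block by the power of ten at its lower edge.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, 31,557,600 s (assumed convention)

briefest_s = 1e-17                    # photon-to-pigment energy transfer, from the text
longest_s = 4e9 * SECONDS_PER_YEAR    # ~4 billion years of evolution, from the text

def decade(value: float) -> int:
    """Power of ten at the lower edge of the decadal block holding `value`."""
    # The epsilon absorbs floating-point error for exact powers of ten.
    return math.floor(math.log10(value) + 1e-9)

span = math.log10(longest_s) - math.log10(briefest_s)  # orders of magnitude apart
blocks = decade(longest_s) - decade(briefest_s) + 1    # decadal blocks covered, inclusive

print(f"{longest_s:.3e} s")  # 1.262e+17 s
print(f"{span:.1f}")         # 34.1
print(blocks)                # 35
```

The extremes lie about 34 orders of magnitude apart and fall into 35 distinct decadal blocks, consistent with the grid's time axis.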
All biological processes fit into a broad S-shaped envelope that includes about half of the decadal blocks in the grid. The envelope drawn round the initial examples is Nature’s Envelope.
Legumes are a group of plants that include soybeans, peas, chickpeas, peanuts and lentils. They are a significant source of protein, fibre, carbohydrates, and minerals in our diet and some, like the cowpea, are resistant to drought.
The project’s outcomes were published in a data paper in the Biodiversity Data Journal. Within the project, the digitisation team aimed to collectively digitise non-type herbarium material from the legume family. This includes rosewood trees (Dalbergia), padauk trees (Pterocarpus) and the Phaseolinae subtribe that contains many of the beans cultivated for human and animal food.
Guinea, Ethiopia, Sudan, Kenya, Uganda, Tanzania, Mozambique, Malawi and Madagascar
Bangladesh, Myanmar, Nepal, New Guinea and India
Southern and Central America
Guatemala, Honduras, El Salvador, Nicaragua, Bolivia, Argentina and Brazil
The legume groups Dalbergia, Pterocarpus and Phaseolinae were chosen for digitisation to support the development of dry beans as a sustainable and resilient crop, and to aid the conservation and sustainable use of rosewood and padauk trees. Some of these beans, especially cowpea and pigeon pea, are sustainable and resilient crops, as they can be grown in poor-quality soils and are resistant to drought stress. This makes them particularly suitable for agricultural production where other crops would be difficult to grow.
While there have been collaborative efforts between herbaria in the past, these have tended to prioritise digitisation of type specimens: the reference specimens on which a species name is based.
Searching for beans
This collection was digitised by creating an inventory record for each specimen, attaching images of each herbarium sheet, and then transcribing further label data and georeferencing the specimens, so that each record states accurately where and when the specimen was collected.
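The four steps (inventory record, images, transcription, georeferencing) can be pictured as progressively filling in one record per specimen. The sketch below is hypothetical: the class and method names are invented for illustration, and the field names are only loosely modelled on Darwin Core terms such as `catalogNumber`, `recordedBy`, `eventDate`, `country`, `decimalLatitude` and `decimalLongitude`. It is not the schema the herbaria actually used.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SpecimenRecord:
    """Hypothetical specimen record; field names loosely follow Darwin Core terms."""
    catalog_number: str                                  # ~dwc:catalogNumber (inventory step)
    image_urls: list[str] = field(default_factory=list)  # ~dwc:associatedMedia (imaging step)
    recorded_by: Optional[str] = None                    # ~dwc:recordedBy (transcription step)
    event_date: Optional[str] = None                     # ~dwc:eventDate, ISO 8601 text
    country: Optional[str] = None                        # ~dwc:country (transcription step)
    decimal_latitude: Optional[float] = None             # ~dwc:decimalLatitude (georeferencing)
    decimal_longitude: Optional[float] = None            # ~dwc:decimalLongitude (georeferencing)

    def attach_image(self, url: str) -> None:
        """Link an image of the herbarium sheet to the record."""
        self.image_urls.append(url)

    def georeference(self, lat: float, lon: float) -> None:
        """Store decimal-degree coordinates after checking they are in range."""
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            raise ValueError(f"invalid coordinates: {lat}, {lon}")
        self.decimal_latitude, self.decimal_longitude = lat, lon

    @property
    def is_georeferenced(self) -> bool:
        return self.decimal_latitude is not None and self.decimal_longitude is not None
```

For example, the oldest specimen in the collection, BM013713473, collected by Mark Catesby in the Bahamas in 1726, could start as `SpecimenRecord("BM013713473", recorded_by="Mark Catesby", event_date="1726", country="Bahamas")`, with `is_georeferenced` remaining `False` until coordinates are added.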
“We originally had four months and three members of staff to digitise over 11,000 specimens. The Covid-19 lockdown was ironically rather lucky for this project as it enabled us to have more time to transcribe and georeference all of the records,” say the researchers behind the digitisation project.
“We were able to assign country-level data to 10,857 out of the total number of 11,222 records. We were also able to transcribe the collectors’ names from the majority of our specimen labels (10,879 out of 11,222). Only 770 out of the 2,226 individuals identified during this project collected their specimens in ODA-listed countries. The highest contributors were: Richard Beddome (130 specimens), Charles Clarke (110), Hans Schlieben (98) and Nathaniel Wallich (79). The breakdown of records by ODA country can be seen in the chart below.”
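The proportions behind these counts are easy to check. A short Python sketch using only the figures quoted above (`pct` is a hypothetical helper name):

```python
TOTAL_RECORDS = 11_222

# (part, whole) pairs taken directly from the researchers' figures
counts = {
    "records with country-level data": (10_857, TOTAL_RECORDS),
    "records with transcribed collector names": (10_879, TOTAL_RECORDS),
    "collectors active in ODA-listed countries": (770, 2_226),
}

def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole

for label, (part, whole) in counts.items():
    print(f"{label}: {part:,}/{whole:,} = {pct(part, whole):.1f}%")
```

Roughly 97% of records received country-level data and transcribed collector names, while about 35% of the identified collectors worked in ODA-listed countries.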
From our data, we can see the peak decade of collection was the 1930s, with almost half (4,583 specimens or 49.43%) collected between 1900 and 1950 (Fig. 10).
This peak can be attributed to three of our most prolific collectors: Arthur Kerr, John Gossweiler and Georges Le Testu, all of whom were most active in the 1930s. The oldest specimen (BM013713473) was collected by Mark Catesby (1683-1749) in the Bahamas in 1726.
Both the Pterocarpus and Dalbergia genera include species valued for expensive, high-quality timber, which makes them prone to illegal logging. Many species, such as Pterocarpus tinctorius, are also listed on the International Union for Conservation of Nature (IUCN) Red List of Threatened Species. By releasing this new resource of information on all these plants from three of the biggest herbaria in the world, we can share these data with the people who are taking care of biodiversity in these countries. The data can be used to identify hotspots where the trees grow naturally, so that these areas can be protected. These data would also allow much closer attention to be paid to areas that could be targets for illegal logging activity.
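One simple way to look for hotspots in georeferenced records is to count specimens per grid cell. The sketch below assumes a plain 1° latitude/longitude grid and uses invented coordinates; `grid_cell` and `hotspots` are hypothetical names. It illustrates the idea only, is not the method used by the project, and real distribution or hotspot analyses typically rely on more careful spatial statistics.

```python
import math
from collections import Counter
from typing import Iterable

def grid_cell(lat: float, lon: float, cell_deg: float = 1.0) -> tuple[int, int]:
    """Index of the cell_deg x cell_deg grid cell containing a point."""
    return math.floor(lat / cell_deg), math.floor(lon / cell_deg)

def hotspots(points: Iterable[tuple[float, float]], cell_deg: float = 1.0, top: int = 3):
    """Return the `top` grid cells holding the most specimen records, with their counts."""
    counts = Counter(grid_cell(lat, lon, cell_deg) for lat, lon in points)
    return counts.most_common(top)

# Invented coordinates, for illustration only.
records = [(5.2, 10.1), (5.7, 10.9), (5.1, 10.4), (-3.5, 36.2)]
print(hotspots(records, top=2))  # [((5, 10), 3), ((-4, 36), 1)]
```

Coarser cells (a larger `cell_deg`) merge nearby records into broader hotspots; finer cells localise them but need more records per cell to be meaningful.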
In a world first, the Natural History Museum, London, has collaborated with economic consultants, Frontier Economics Ltd, to explore the economic and societal value of digitising natural history collections and concluded that digitisation has the potential to see a seven to tenfold return on investment. Whilst significant progress is already being made at the Museum, additional investment is needed in order to unlock the full potential of the Museum’s vast collections, which hold more than 80 million objects. The project’s report is published in the open-science journal Research Ideas and Outcomes (RIO Journal).
The societal benefits of digitising natural history collections extend to global advancements in food security, biodiversity conservation, medicine discovery, minerals exploration, and beyond. A new, rigorous economic report predicts that investing in digitising natural history museum collections could also result in a tenfold return. The Natural History Museum, London, has so far made over 4.9 million digitised specimens freely available online, and over 28 billion records have been downloaded in over 429,000 download events over the past six years.
Digitisation at the Natural History Museum, London
Digitisation is the process of creating and sharing the data associated with Museum specimens. To digitise a specimen, all its related information is added to an online database. This typically includes where and when it was collected and who found it, and can include photographs, scans and molecular data if available. Natural history collections are a unique record of biodiversity dating back hundreds of years, and geodiversity dating back millennia. Creating and sharing data this way enables science that would otherwise have been impossible, and accelerates the rate at which important discoveries are made from our collections.
The Natural History Museum’s collection of 80 million items is one of the largest and most historically and geographically diverse in the world. By unlocking the collection online, the Museum provides free and open access for global researchers, scientists, artists and more. Since 2015, the Museum has made 4.9 million specimens available on the Museum’s Data Portal, which have seen more than 28 billion downloads over 427,000 download events.
This means the Museum has digitised about 6% of its collections to date. Because digitisation is expensive, costing tens of millions of pounds, it is difficult to make a case for further investment without better understanding the value of this digitisation and its benefits.
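The 6% figure follows directly from the numbers given above. A quick Python check using the Data Portal figures from the previous paragraph (the totals are approximate, since the text says "more than" 28 billion):

```python
specimens_online = 4.9e6     # "4.9 million specimens" on the Data Portal
collection_size = 80e6       # "80 million items" in the collection
records_downloaded = 28e9    # "more than 28 billion downloads"
download_events = 427_000    # "427,000 download events"

share_digitised = specimens_online / collection_size
records_per_event = records_downloaded / download_events

print(f"{share_digitised:.1%}")     # 6.1%
print(f"{records_per_event:,.0f}")  # 65,574
```

About 6.1% of the collection is online, and each download event has on average involved roughly 65,600 records.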
In 2021, the Museum decided to explore the economic impacts of collections data in more depth and commissioned Frontier Economics to undertake modelling. The resulting project report, now publicly available in the open-science journal Research Ideas and Outcomes (RIO Journal), confirms benefits in excess of £2 billion over 30 years. While the methods in this report are relevant to collections globally, this modelling focuses on benefits to the UK. It is intended to support the Museum’s own digitisation work, as well as a current scoping study, funded by the Arts & Humanities Research Council, on the case for digitising all UK natural science collections as a research infrastructure.
How does digitisation impact scientific research?
The data from museum collections accelerates scientific research, which in turn creates benefits for society and the economy across a wide range of sectors. Frontier Economics Ltd have looked at the impact of collections data in five of these sectors: biodiversity conservation, invasive species, medicines discovery, agricultural research and development and mineral exploration.
The new analyses attempt to estimate the economic value of these benefits using a range of approaches, with the results in broad agreement that the benefits of digitisation are at least ten times greater than the costs. This represents a compelling case for investment in museum digital infrastructure, without which these benefits will not be realised.
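The report's own models are not reproduced here. As a generic illustration of what "benefits at least ten times greater than the costs" means, a benefit-cost ratio compares the present value of a stream of benefits with that of a stream of costs. In this minimal sketch the function names, the yearly cash flows and the 3.5% discount rate are all placeholders, not figures from the report.

```python
from typing import Sequence

def present_value(flows: Sequence[float], rate: float) -> float:
    """Discount yearly cash flows (year 0 first) back to today's value."""
    return sum(flow / (1 + rate) ** year for year, flow in enumerate(flows))

def benefit_cost_ratio(benefits: Sequence[float], costs: Sequence[float],
                       rate: float = 0.035) -> float:
    """Ratio of discounted benefits to discounted costs; above 1 means benefits exceed costs."""
    pv_costs = present_value(costs, rate)
    if pv_costs <= 0:
        raise ValueError("costs must have a positive present value")
    return present_value(benefits, rate) / pv_costs

# Placeholder example: a one-off cost of 10 now, then benefits of 5 a year for 30 years.
ratio = benefit_cost_ratio(benefits=[0] + [5] * 30, costs=[10])
print(round(ratio, 1))
```

Discounting matters because most benefits arrive years after the money is spent, so the same undiscounted totals can yield quite different ratios at different discount rates.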
Other benefits could include improvements to the resilience of agricultural crops by better understanding their wild relatives, research into invasive species which can cause significant damage to ecosystems and crops, and improving the accuracy of mining.
Finally, such work could also change how science itself is conducted. The very act of digitising specimens means that researchers anywhere on the planet can access these collections, saving the time and money that would otherwise be spent travelling to see specific objects.
Popov D, Roychoudhury P, Hardy H, Livermore L, Norris K (2021) The Value of Digitising Natural History Collections. Research Ideas and Outcomes 7: e78844. https://doi.org/10.3897/rio.7.e78844
Deep learning techniques manage to differentiate between similar plant families with up to 99 percent accuracy, Smithsonian researchers reveal
Millions, if not billions, of specimens reside in the world’s natural history collections, but most of these have not been carefully studied, or even looked at, in decades. While containing critical data for many scientific endeavors, most objects are quietly sitting in their own little cabinets of curiosity.
Thus, mass digitization of natural history collections has become a major goal at museums around the world. Having brought together numerous biologists, curators, volunteers and citizen scientists, such initiatives have already generated large datasets from these collections and provided unprecedented insight.
Now, a study, recently published in the open access Biodiversity Data Journal, suggests that the latest advances in both digitization and machine learning might together be able to assist museum curators in their efforts to care for and learn from this incredible global resource.
The study is among the first to describe the use of deep learning methods to enhance our understanding of digitized collection samples. It is also the first to demonstrate that a deep convolutional neural network (a computing system loosely modelled on the activity of neurons in animal brains, which learns patterns from examples rather than following hand-written rules) can effectively differentiate between similar plants with an accuracy of up to 99%.
In the paper, the scientists describe two different neural networks that they trained to perform tasks on the digitized portion (currently 1.2 million specimens) of the United States National Herbarium.
The team first trained a net to automatically recognize herbarium sheets that had been stained with mercury crystals, since mercury was commonly used by some early collectors to protect the plant collections from insect damage. The second net was trained to discriminate between two families of plants that share a strikingly similar superficial appearance.
The trained neural nets performed with 90% and 96% accuracy respectively (or 94% and 99% if the most challenging specimens were discarded), confirming that deep learning is a useful and important technology for the future analysis of digitized museum collections.
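The press release does not say how the "most challenging specimens" were identified. One common way to report such figures is selective accuracy: drop the predictions a classifier is least confident about and score the rest. The sketch below assumes a binary classifier that outputs a probability for one class (for example, one of the two plant families, or "mercury-stained") and treats specimens whose probability lies within a chosen margin of 0.5 as too hard; the function name and the toy numbers are hypothetical.

```python
from typing import Sequence

def selective_accuracy(probs: Sequence[float], labels: Sequence[int],
                       margin: float = 0.0) -> tuple[float, float]:
    """Accuracy of a binary classifier after discarding low-confidence cases.

    probs:  predicted probability of class 1 for each specimen
    labels: true class (0 or 1) for each specimen
    margin: specimens with |p - 0.5| < margin are treated as too hard and dropped
    Returns (accuracy on kept specimens, fraction of specimens kept).
    """
    kept = [(p, y) for p, y in zip(probs, labels) if abs(p - 0.5) >= margin]
    if not kept:
        raise ValueError("margin discards every specimen")
    correct = sum((p >= 0.5) == bool(y) for p, y in kept)
    return correct / len(kept), len(kept) / len(labels)

# Toy predictions for six specimens (invented numbers).
probs = [0.95, 0.9, 0.55, 0.45, 0.1, 0.2]
labels = [1, 1, 0, 1, 0, 0]
for m in (0.0, 0.2):
    acc, kept = selective_accuracy(probs, labels, margin=m)
    print(f"margin {m}: {acc:.0%} accuracy on {kept:.0%} of specimens")
```

Raising the margin trades coverage for accuracy, which mirrors the kind of trade-off reported above when the hardest specimens are set aside.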
“The results can be leveraged both to improve curation and unlock new avenues of research,” conclude the scientists.
“This research paper is a wonderful proof of concept. We now know that we can apply machine learning to digitized natural history specimens to solve curatorial and identification problems. The future will be using these tools combined with large shared data sets to test fundamental hypotheses about the evolution and distribution of plants and animals,” says Dr. Laurence J. Dorr, Chair of the Smithsonian Department of Botany.
Schuettpelz E, Frandsen P, Dikow R, Brown A, Orli S, Peters M, Metallo A, Funk V, Dorr L (2017) Applications of deep convolutional neural networks to digitized natural history collections. Biodiversity Data Journal 5: e21139. https://doi.org/10.3897/BDJ.5.e21139