Scientists make plea for sharing of geographic and evolutionary datasets

Sept. 23, 2013

Two multi-centre groups of scientists – one led by Bryan Drew from the University of Florida and one by CHANS-Net members Joel Hartter from the University of New Hampshire and Sadie Ryan from SUNY College of Environmental Science and Forestry – are publishing pleas for scientists to ensure that two highly valuable but hitherto neglected types of data are made openly available in perpetuity. The pleas appear as Perspective articles in PLOS Biology.

For two decades it has been common practice for DNA sequences to be deposited by scientists into robust, openly accessible online databases (such as GenBank) so that this valuable information can be captured and protected for scientists of the future. This is now accepted to be essential, both as an adjunct to the published record, and as a valuable resource in its own right for future studies.

This move has been increasingly reinforced by scientific journals, most of which now insist on deposition of these data as a condition of publication. Many other types of biological data – protein structures, gene regulation profiles, etc. – have followed suit, and there are now more generic repositories such as Dryad that will take almost any type of scientific data.

The problem is that not all fields have recognised this need, whether for cultural or historical reasons, and the concern is that without urgent changes in practice, hard-won and potentially irreplaceable data may be lost for ever. The two groups of scientists now address the issues that have affected their respective fields and suggest remedies.

Hartter, Ryan and colleagues, in “Spatially Explicit Data: Stewardship and Ethical Challenges in Science”, look at the broader issue of data management, stewardship and sharing across many types of scientific data. They note cultural differences between fields, especially regarding the tendency either to share or to guard data.

Drew and colleagues, in their Perspective “Lost Branches in the Tree of Life”, tackle the specific problem of the “family trees” that show the evolutionary relationship between different species. Constructing trees of life (“phylogenetic” trees) involves the collection of data from organisms that are often rare or exotically located. While making a tree of life of nearly two million species, the authors noted that of the 7,500 papers studied, data had been deposited for only one-sixth and were available on request from the original authors in a further one-sixth, leaving two-thirds of trees only available in the original paper. The authors call this a “massive failure”, identify the problem as largely cultural, and propose ways forward, asking journals and funding agencies to insist on deposition of phylogenetic tree data.

Hartter, Ryan and colleagues cover spatially explicit geographical and sociological datasets, and raise the ethical problems that can arise from the conflict between openness and confidentiality in these fields, and from the potentially socially intrusive nature of crowd-sourced and geospatial data. The authors propose a series of measures intended to foster openness while protecting the diverse interests at stake.