The University of Cambridge and the University of Glasgow have a joint data management project named “Incremental”. See their blog entry “Scoping study and implementation plan released”.
The issues they are looking to address are much the same as those we are facing at the Australian Synchrotron (AS) and ANSTO with the MeCAT project, including:
- Procedures for creating and organising data
- Data storage and access
- Data back-up
- Data sharing and re-use
To these I would add one more:
- Accurate and complete capture of metadata
While AS and ANSTO face all of the issues listed in the Incremental report to a greater or lesser degree, our project is focussed on the last of their issues listed above, data sharing and re-use. The Incremental report articulates the problem very clearly:
While many researchers are positive about sharing data in principle, they are almost universally reluctant in practice. They have invested in collecting or processing data, and using these data to publish results before anyone else is the primary way of gaining prestige in nearly all disciplines. In addition, researchers complain that data must be carefully prepared, annotated, and contextualised before they can make it public, which is all very time-consuming and funding is rarely set aside for this.
The report goes into more detail, giving examples of why researchers are reluctant to publish data and the conditions under which they are more likely to share it.
At the moment we’re taking a three-pronged approach to this problem:
- Deferring the problem by providing a suitably flexible access control system that allows data to be private initially and published at a later date.
- Encouraging researchers, at first, to make just the existence of the data public, with access granted only on an individual basis after discussion with the researcher.
- Focusing on data that can be made public immediately, e.g. reference spectral data sets.
In the long term, though, cultural change will be required.
The report also notes that “resources must be simple, engaging and easy to access”. Given the difficulties we have had with metadata capture, I would emphasise the need for these systems to be engaging.