Metadata strategy

October 6, 2010

The University of Southampton data management project has proposed a three-level metadata strategy; see their blog entry “Metadata strategy”:

  1. Project
  2. Discipline
  3. Core

Tardis is based on the Core Scientific Metadata model (CSMD) developed within the Science & Technology Facilities Council (STFC).  One metadata hierarchy they’ve adopted is (turned upside down to match Southampton’s):

  1. Science Specific
  2. Instrument Specific
  3. Core

(This reminds me of Robert Pirsig’s Intellectual Scalpel.)

We’re extending Tardis for use within the Australian Synchrotron and ANSTO, where the STFC model is more appropriate.  However, institutional use of Tardis may also be project based.

Tardis supports configurable schemas (parameter sets) at the experiment, dataset and datafile levels.  Appropriate use of the configurable schemas should allow us to handle either model, or a combined model.
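To make the idea concrete, here is a minimal sketch of three-level parameter sets.  This is an illustration only, not the actual Tardis API: the class and field names, and the schema URIs, are assumptions for the example.

```python
# Illustrative sketch (not the actual Tardis data model): schema-tagged
# parameter sets attached at each of the three levels Tardis supports.
from dataclasses import dataclass, field

@dataclass
class ParameterSet:
    schema: str                          # namespace URI identifying the schema
    parameters: dict = field(default_factory=dict)

@dataclass
class DataFile:
    filename: str
    parameter_sets: list = field(default_factory=list)

@dataclass
class Dataset:
    description: str
    datafiles: list = field(default_factory=list)
    parameter_sets: list = field(default_factory=list)

@dataclass
class Experiment:
    title: str
    datasets: list = field(default_factory=list)
    parameter_sets: list = field(default_factory=list)

# A combined model: core metadata at the experiment level,
# instrument-specific metadata at the dataset level.
exp = Experiment(
    title="Powder diffraction run",
    parameter_sets=[ParameterSet("http://example.org/core/1.0",
                                 {"investigator": "A. Researcher"})],
)
ds = Dataset(
    description="Sample 42",
    parameter_sets=[ParameterSet("http://example.org/instrument/pd/1.0",
                                 {"wavelength_nm": 0.0826})],
)
exp.datasets.append(ds)
```

Because each parameter set carries its own schema namespace, project-based, discipline-based and instrument-based metadata can coexist on the same record, which is what lets one deployment serve both the Southampton and the STFC hierarchies.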


Clarion Project

July 19, 2010

Thanks to Lesley from the Incremental project for pointing me to the Clarion Project blog.

Clarion’s “Principal Investigators’ opinions on Open Data” entry provides some great questions to ask scientists when trying to get agreement on publishing their data.

I also like their Design Principles and am looking forward to hearing more on the success of their electronic logbook project.

Incremental Project

July 15, 2010

The University of Cambridge and University of Glasgow have a joint project on data management named “Incremental”.  See their blog entry Scoping study and implementation plan released.

The issues they are looking to address are much the same as we are facing at the Australian Synchrotron and ANSTO with the MeCAT project, including:

  • Procedures for creating and organising data
  • Data storage and access
  • Data back-up
  • Preservation
  • Data sharing and re-use

One more issue comes immediately to mind:

  • Accurate and complete capture of metadata

While AS and ANSTO face all of the issues listed in the Incremental report to a greater or lesser degree, our project is focussed on the last issue listed above.  The Incremental report articulates the problem very clearly:

While many researchers are positive about sharing data in principle, they are almost universally reluctant in practice.  They have invested in collecting or processing data, and using these data to publish results before anyone else is the primary way of gaining prestige in nearly all disciplines.  In addition, researchers complain that data must be carefully prepared, annotated, and contextualised before they can make it public, which is all very time-consuming and funding is rarely set aside for this.

The report goes into more detail, providing examples of why researchers are reluctant to publish data, and under what conditions they are more likely to share data.

At the moment we’re taking a three-pronged approach to this problem:

  1. Defer the problem by providing a suitably flexible access control system that allows data to be private initially and published at a later date.
  2. Initially encourage researchers to make just the existence of the data public, with access granted only on an individual basis after discussion with the researcher.
  3. Focus on data that can be made public immediately, e.g. reference spectral data sets.
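The staged-publication idea above can be sketched as a small state model.  The names here are assumptions for illustration, not the actual Tardis access-control API:

```python
# Hypothetical sketch of staged publication: private -> existence-only
# public (with per-user grants) -> fully public.
from enum import Enum

class Visibility(Enum):
    PRIVATE = 1          # data and metadata hidden
    METADATA_ONLY = 2    # existence is public; data access granted per user
    PUBLIC = 3           # data openly available

class ExperimentAccess:
    def __init__(self):
        self.visibility = Visibility.PRIVATE
        self.granted_users = set()

    def publish_metadata(self):
        """Prong 2: advertise only that the data exists."""
        self.visibility = Visibility.METADATA_ONLY

    def grant(self, user):
        """Individual access, after discussion with the researcher."""
        self.granted_users.add(user)

    def publish(self):
        """Prong 1, deferred: make the data fully public later."""
        self.visibility = Visibility.PUBLIC

    def can_download(self, user):
        return (self.visibility is Visibility.PUBLIC
                or user in self.granted_users)

acc = ExperimentAccess()
acc.publish_metadata()
acc.grant("alice")
print(acc.can_download("alice"))  # True: individually granted
print(acc.can_download("bob"))    # False: existence public, data private
```

The point of the sketch is that “publish” need not be a single switch: existence, per-user access, and open access are separate transitions, which is what lets reluctant researchers start sharing incrementally.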

Cultural change will be required in the long term.

The report also notes that “resources must be simple, engaging and easy to access”.  Given our issues with metadata capture, I would emphasize the need for the systems to be engaging.