Incremental Project

The University of Cambridge and the University of Glasgow have a joint project on data management named “Incremental”.  See their blog entry “Scoping study and implementation plan released”.

The issues they are looking to address are much the same as those we are facing at the Australian Synchrotron and ANSTO with the MeCAT project, including:

  • Procedures for creating and organising data
  • Data storage and access
  • Data back-up
  • Preservation
  • Data sharing and re-use

One more issue comes immediately to mind:

  • Accurate and complete capture of metadata

While AS and ANSTO face all of the issues listed in the Incremental report to a greater or lesser degree, our project is focussed on the last issue in their list, data sharing and re-use.  The Incremental report articulates the problem very clearly:

While many researchers are positive about sharing data in principle, they are almost universally reluctant in practice.  They have invested in collecting or processing data, and using these data to publish results before anyone else is the primary way of gaining prestige in nearly all disciplines.  In addition, researchers complain that data must be carefully prepared, annotated, and contextualised before they can make it public, which is all very time-consuming and funding is rarely set aside for this.

The report goes into more detail, providing examples of why researchers are reluctant to publish data and the conditions under which they are more likely to share it.

At the moment we’re taking a three-pronged approach to this problem:

  1. Defer the problem by providing a suitably flexible access control system that allows data to be initially private and then published at a later date.
  2. Initially encourage researchers to make just the existence of the data public, with access granted only on an individual basis after discussion with the researcher.
  3. Focus on data that can be made public immediately, e.g. reference spectral data sets.
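
As a rough illustration of the first two prongs, here is a minimal Python sketch of how such an access policy might be modelled. It is not the MeCAT implementation: the `Dataset` class, the `Visibility` levels and every field name are hypothetical, chosen only to show the idea of an embargo date combined with per-user grants.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional, Set


class Visibility(Enum):
    """Hypothetical visibility levels, not taken from any real system."""
    PRIVATE = "private"      # existence and data both hidden
    ANNOUNCED = "announced"  # existence is public, data only by individual grant
    PUBLIC = "public"        # data openly available


@dataclass
class Dataset:
    name: str
    owner: str
    visibility: Visibility = Visibility.PRIVATE
    release_date: Optional[date] = None  # assumed embargo end; None means no scheduled release
    granted_users: Set[str] = field(default_factory=set)

    def effective_visibility(self, today: date) -> Visibility:
        # Prong 1: once the embargo date has passed, the data set becomes public.
        if self.release_date is not None and today >= self.release_date:
            return Visibility.PUBLIC
        return self.visibility

    def _is_trusted(self, user: str) -> bool:
        # The owner and anyone individually granted access are always trusted.
        return user == self.owner or user in self.granted_users

    def can_see_existence(self, user: str, today: date) -> bool:
        # Prong 2: an announced data set is discoverable by everyone.
        return self._is_trusted(user) or (
            self.effective_visibility(today) is not Visibility.PRIVATE
        )

    def can_read_data(self, user: str, today: date) -> bool:
        # Reading the data requires a grant until the data set is public.
        return self._is_trusted(user) or (
            self.effective_visibility(today) is Visibility.PUBLIC
        )
```

In this sketch a researcher would create a data set as `ANNOUNCED` with a `release_date` some time in the future, add collaborators to `granted_users` after discussing access with them, and let the embargo date open the data to everyone else automatically.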

Cultural change will be required in the long term.

The report also notes that “resources must be simple, engaging and easy to access”.  Given our issues with metadata capture, I would emphasize the need for the systems to be engaging.

2 Responses to Incremental Project

  1. Thanks for linking to us, and thanks for these insights! It’s very exciting to see that our early results are helping others think through similar issues.

    We’ve also noticed that metadata is an enormous hurdle to both long-term preservation/use and sharing. Even the term ‘metadata’ makes some people shudder or look at us askance. So, for some disciplines and for generic resources, we’re focusing on the idea of ‘documentation’ or ‘annotation’ more than ‘metadata’. Some of the researchers with whom we spoke pointed out that, in theory, the published papers can serve as metadata for data sets, but of course this doesn’t work in all disciplines, and even when it does, it still usually means that researchers must take the time and care to format and label their data (which, again, many don’t want to do in the first place, because sharing means they can get scooped). Complicated! Like you, we have found that this process only works well if disciplines (or academia in general) see cultural change, which will be a long process.

    There is currently a project called CLARION in the University of Cambridge’s chemistry department, which might interest you if you haven’t encountered them already. It’s centred around metadata capture, controlled embargo, and open archiving/sharing. They blog here: http://clarionproject.wordpress.com/. CLARION is developing an electronic lab notebook as part of the project, which aids metadata capture in real time.

    As far as cross-disciplinary sharing and archiving goes, we’re encouraging repositories and archives to allow for longer (or more customisable) embargoes for data than they do for dissertations or papers. As more types of repositories start taking on research data, there will probably need to be some cultural changes on both sides. The culture of repositories is a very ‘open data, now’ one. As researchers get used to the idea that sharing data can help them more than it hurts them, repositories may have to get used to the idea that researchers need some control over embargoes if they’re going to be willing to share.

    Lots to think about!

    — Lesley Freiman, Incremental

  2. mecatproj says:

    Hi Lesley,

    Thanks very much for taking the time to reply and for the link to CLARION. I’ll be looking into it further.

    You alluded to the fuzzy definition of metadata and the need for more customisable embargoes. I’m planning to write something on our access control plans in the next couple of weeks (famous last words, I’m sure :-)).

    If you haven’t already seen it, the Consultative Committee for Space Data Systems – Reference Model for an Open Archival Information System (OAIS) uses the term Representation Information to cover the additional information required for interpreting data.

    Thanks again and I look forward to following Incremental’s progress.

    Cheers,
    Alistair
