Access Controls

July 26, 2010

As highlighted in previous posts, most research data publication projects run into the same problem: researchers believe in publishing and sharing data in principle, but have many reasons not to do it in practice.  This is a much larger problem than one project can address, so we’ve decided to work around it as far as possible by providing access controls within MeCAT that support publishing data immediately, restricting access either indefinitely or until criteria are met, or sharing on an individual basis.

The use cases that form the basis of the Access Control design are listed below.

The Data Owner can grant and remove access privileges to the data they own.  The Data Owner will typically be the Principal Investigator or a representative of the Institution.

  • Publicly Accessible
    The data is made publicly available immediately, e.g. data that will become part of a reference database.
  • Accessible by the Data Owner and assigned team members
    Team members may be assigned individually or as a group.
  • Access granted by the Data Owner
    E.g. as a result of direct contact by another researcher.
  • Accessible by anyone at a given physical location, typically the instrument
  • Publicly Accessible after an embargo period, e.g. 3 years
  • Publicly Accessible after a trigger, e.g. paper is published
  • Accessible by facility scientists
    Facility scientists typically have access to all data from the instrument they are responsible for.
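
To make the use cases above concrete, here is a minimal Python sketch of how a single access check might combine them.  This is not the MeCAT design (that is still to be proposed); every class, field and function name here (User, Dataset, can_access, grant, revoke) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class User:
    # All fields are illustrative assumptions, not part of MeCAT.
    name: str
    groups: Set[str] = field(default_factory=set)
    location: Optional[str] = None  # e.g. the instrument the user is working at
    facility_instruments: Set[str] = field(default_factory=set)  # instruments a facility scientist is responsible for


@dataclass
class Dataset:
    owner: str
    instrument: str
    public: bool = False                      # publicly accessible immediately
    team_users: Set[str] = field(default_factory=set)
    team_groups: Set[str] = field(default_factory=set)
    granted_users: Set[str] = field(default_factory=set)   # access granted by the Data Owner
    open_locations: Set[str] = field(default_factory=set)  # anyone at these locations may read
    embargo_until: Optional[date] = None      # public from this date onwards
    trigger_fired: bool = False               # e.g. set once the paper is published

    def grant(self, acting_user: str, user_name: str) -> None:
        """Only the Data Owner may grant access."""
        if acting_user != self.owner:
            raise PermissionError("only the Data Owner can grant access")
        self.granted_users.add(user_name)

    def revoke(self, acting_user: str, user_name: str) -> None:
        """Only the Data Owner may remove access."""
        if acting_user != self.owner:
            raise PermissionError("only the Data Owner can remove access")
        self.granted_users.discard(user_name)


def can_access(user: Optional[User], ds: Dataset, today: date) -> bool:
    """Return True if any one of the use cases allows access."""
    # Rules that make the data public to everyone, including anonymous users.
    if ds.public:
        return True
    if ds.embargo_until is not None and today >= ds.embargo_until:
        return True
    if ds.trigger_fired:
        return True
    if user is None:
        return False
    # Rules that depend on who the user is or where they are.
    return (
        user.name == ds.owner
        or user.name in ds.team_users
        or bool(user.groups & ds.team_groups)
        or user.name in ds.granted_users
        or (user.location is not None and user.location in ds.open_locations)
        or ds.instrument in user.facility_instruments
    )


# Example: data under a three-year embargo, shared early with one researcher.
ds = Dataset(owner="alice", instrument="beamline-1", embargo_until=date(2013, 7, 26))
bob = User("bob")
print(can_access(bob, ds, date(2010, 7, 26)))  # False: still under embargo
ds.grant("alice", "bob")
print(can_access(bob, ds, date(2010, 7, 26)))  # True: granted by the Data Owner
```

In this sketch the rules are purely additive: access is allowed as soon as any one rule matches, and nothing explicitly denies it.  Whether a real design also needs explicit deny rules is left open here.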

I’ll cover the design we’re proposing to support these use cases in a subsequent entry, and am interested in any feedback on these use cases.


Using a Core Scientific Metadata Model in Large-Scale Facilities

July 22, 2010

Thanks to the UKOLN News Feed for pointing to the International Journal of Digital Curation, Vol. 5, No. 1.  It contains a paper titled Using a Core Scientific Metadata Model in Large-Scale Facilities.  The paper provides a good overview of the CSMD schema, which is “a model for the representation of scientific study metadata developed within the Science & Technology Facilities Council (STFC) to represent the data generated from scientific facilities”.

Clarion Project

July 19, 2010

Thanks to Lesley from the Incremental project for pointing me to the Clarion Project blog.

Clarion provides some great questions to ask scientists when trying to get agreement on publishing data in their Principal Investigators’ opinions on Open Data entry.

I also like their Design Principles and am looking forward to hearing more on the success of their electronic logbook project.

Incremental Project

July 15, 2010

The University of Cambridge and University of Glasgow have a joint project on data management named “Incremental”.  See their blog entry Scoping study and implementation plan released.

The issues they are looking to address are much the same as those we are facing at the Australian Synchrotron and ANSTO with the MeCAT project, including:

  • Procedures for creating and organising data
  • Data storage and access
  • Data back-up
  • Preservation
  • Data sharing and re-use

One more issue comes immediately to mind:

  • Accurate and complete capture of metadata

While AS and ANSTO face all of the issues listed in the Incremental report to a greater or lesser degree, our project is focussed on the last issue in their list, data sharing and re-use.  The Incremental report articulates the problem very clearly:

While many researchers are positive about sharing data in principle, they are almost universally reluctant in practice.  They have invested in collecting or processing data, and using these data to publish results before anyone else is the primary way of gaining prestige in nearly all disciplines.  In addition, researchers complain that data must be carefully prepared, annotated, and contextualised before they can make it public, which is all very time-consuming and funding is rarely set aside for this.

The report goes into more detail, providing examples of why researchers are reluctant to publish data, and under what conditions they are more likely to share it.

At the moment we’re taking a three-pronged approach to this problem:

  1. Defer the problem by providing a suitably flexible access control system that allows data to be initially private and then published at a later date.
  2. Initially encourage researchers to just make the existence of the data public, with access only granted on an individual basis after discussion with the researcher.
  3. Focus on data that can be made public immediately, e.g. reference spectral data sets.

Cultural change will be required in the long term.

The report also notes that “resources must be simple, engaging and easy to access”.  Given our issues with metadata capture, I would emphasize the need for the systems to be engaging.

Welcome to the MeCAT blog

July 14, 2010

MeCAT is a joint project between the Australian Nuclear Science and Technology Organisation (ANSTO) and the Australian Synchrotron (AS), funded by the Australian National Data Service (ANDS), to improve the management and publication of research data and metadata at the two facilities.

This blog will be used to publish updates about the project, discuss issues the project is facing and hopefully to connect with other similar projects being run around the world.