Access Controls – implementation

April 27, 2011

I wrote about our Access Control requirements back in July last year.  Since then we’ve completed the design and basic implementation of the authentication and authorisation engines.

The major elements of the access controls are:

  • Access Controls are managed at the Experiment level, i.e. you either have access to the entire experiment or no access.
  • An Experiment can be flagged as public, allowing anonymous access to the experiment.
  • Supported Authentication methods currently include: Internal (Django), LDAP and VBL (Australian Synchrotron proprietary).  Additional methods can be added using an API.
  • Authorisation to experiments may be assigned to individual Users, (internal) Groups and External Groups (see below for an explanation).  Privileges currently include: Read Access, Write (Edit) and Data Owner (see below for a description).


MyTARDIS is built on Django, which has its own built-in account management.  MyTARDIS supports the Django internal accounts, and has an API allowing additional authentication methods to be defined.  Users can have credentials from multiple user stores linked to a single Django account.  We currently support internal accounts, LDAP (Active Directory) and VBL (Australian Synchrotron proprietary) authentication.
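
The multi-backend idea can be sketched as follows.  This is purely illustrative (the class and method names are made up, not the actual MyTARDIS API): each backend authenticates against its own user store, and a link table maps external credentials back to a single internal account.

```python
# Hypothetical sketch of a pluggable authentication registry: each backend
# resolves credentials from its own user store, and successful logins are
# mapped back to a single internal (Django-style) account.

class BackendRegistry:
    def __init__(self):
        self._backends = {}   # method name -> backend object
        self._links = {}      # (method, external_id) -> internal user id

    def register(self, name, backend):
        self._backends[name] = backend

    def link(self, method, external_id, internal_user):
        """Associate credentials from an external store with one account."""
        self._links[(method, external_id)] = internal_user

    def authenticate(self, method, username, password):
        backend = self._backends.get(method)
        if backend is None:
            return None
        external_id = backend.authenticate(username, password)
        if external_id is None:
            return None
        # Fall back to the external id itself if no link has been made.
        return self._links.get((method, external_id), external_id)


class DictBackend:
    """Stand-in for an internal/LDAP/VBL backend: a plain credential table."""
    def __init__(self, users):
        self._users = users

    def authenticate(self, username, password):
        return username if self._users.get(username) == password else None
```

With this shape, logging in via LDAP and via the internal store can resolve to the same account once a link has been recorded.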


We’ve modified the authorisation mechanism in Django to use ExperimentACLs.  The ExperimentACL defines access to an individual Experiment.

The three components of the ExperimentACL are:

  1. The ExperimentACL type
  2. The rights being granted
  3. The User, Group or External Group receiving the rights

The ExperimentACL may either be a User ACL or System ACL. User ACLs are owned and managed by the Data Owner, while System ACLs are owned and managed by the system administrator.  This allows us to meet the use case of automatically granting read access to all experiments from a particular beamline to the beamline scientists (without the users being able to revoke that access).

The rights include:

  • Read: the right to view the experiment
  • Write: the right to edit the metadata and to add and remove datafiles
  • Delete: the right to delete datasets or the entire experiment
  • Data Owner: the right to manage access for other users (User ExperimentACLs)
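
The structure described above can be sketched in plain Python.  This is an illustration only, not the actual MyTARDIS schema: an ExperimentACL records who gets which rights on a single Experiment and whether the entry is owner-managed or system-managed.

```python
# Illustrative sketch (not the real schema): an ExperimentACL grants rights
# on one Experiment to one principal (user, group or external group).
from dataclasses import dataclass

USER_ACL, SYSTEM_ACL = "user", "system"

@dataclass
class ExperimentACL:
    experiment_id: int
    principal: str            # user, group or external-group identifier
    acl_type: str             # USER_ACL (owner-managed) or SYSTEM_ACL
    can_read: bool = False
    can_write: bool = False
    can_delete: bool = False
    is_owner: bool = False    # may manage User ACLs for this experiment

def has_right(acls, experiment_id, principals, right):
    """True if any of the caller's principals grants `right` on the experiment."""
    return any(
        acl.experiment_id == experiment_id
        and acl.principal in principals
        and getattr(acl, right)
        for acl in acls
    )
```

A System ACL granting read access to a beamline-scientists group sits alongside the User ACLs, but only the system administrator can revoke it.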

As mentioned above, ExperimentACLs can assign rights to individual Users and internal Groups in the usual fashion.

External Groups provide an extensible authorisation mechanism, e.g. granting access based on location (IP address) or membership of a group in an LDAP repository.  A user’s external group membership is evaluated when they log on to the system.
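
One possible external-group provider, evaluating membership from the client's IP address at logon, might look like this (the group name and address range below are made-up examples):

```python
# Hedged sketch of an external-group provider: membership is computed at
# login time from the client's IP address rather than stored in a table.
import ipaddress

class IPRangeGroupProvider:
    def __init__(self, group_ranges):
        # e.g. {"beamline-network": "192.168.1.0/24"} -- example values only
        self._groups = {
            name: ipaddress.ip_network(cidr)
            for name, cidr in group_ranges.items()
        }

    def groups_for(self, client_ip):
        """Called on logon; the resulting set applies for the session."""
        addr = ipaddress.ip_address(client_ip)
        return {name for name, net in self._groups.items() if addr in net}
```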

One additional use case that hasn’t been addressed and which we are considering is temporary read access to an experiment.  This is useful when making a paper available for peer review prior to publication.  During the review period the researchers may want to make the data available to the reviewers as well.  Our current thinking is to provide a button that generates a time limited URL that has read access to the experiment, i.e. the URL embeds a time limited key.  Anyone who has access to the URL then has access to the experiment for a short period of time (which will be configurable).
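
The time-limited key could be implemented statelessly with an HMAC over the experiment id and an expiry timestamp, so the server can verify the link without storing anything per link.  The sketch below uses only the Python standard library; the names and token format are assumptions for illustration, not a committed design.

```python
# Sketch of a time-limited URL key: the token embeds an expiry timestamp
# and an HMAC over it, so no per-link state is needed on the server.
import hashlib
import hmac
import time

SECRET = b"server-side-secret"  # would come from site configuration

def make_token(experiment_id, lifetime_seconds):
    expires = int(time.time()) + lifetime_seconds
    msg = f"{experiment_id}:{expires}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{experiment_id}:{expires}:{sig}"

def check_token(token, now=None):
    """Return the experiment id if the token is genuine and unexpired."""
    experiment_id, expires, sig = token.rsplit(":", 2)
    msg = f"{experiment_id}:{expires}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    if (now or time.time()) > int(expires):
        return None
    return experiment_id
```

The trade-off of the stateless design is that individual links can't be revoked before they expire, which is why the lifetime should be configurable and short.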


Data Licensing

April 27, 2011

Data Licensing has slowly been growing as a topic within the MeCAT project.  From my perspective, adopting an existing license framework is the most attractive option as:

  • Writing something from scratch will be expensive and time consuming.
  • The existing framework should be well tested, thus more likely to meet expectations of data protection.
  • An existing framework should be well known, i.e. users of the data will (hopefully) recognise the license and know what it means without further (time consuming) research, thus promoting re-use of data that might otherwise simply be too much trouble.

ANDS (our project funder) has a few pages on licensing, starting at Data Re-use and Licensing Frameworks.  Since they are paying the bills, it is an obvious place to start and a default first choice.  My understanding of where they are currently heading is GILF (AusGOAL), which is basically the Creative Commons licenses, a Restrictive License framework and a framework on how to apply it.

The UK Digital Curation Centre (DCC) How to License Research Data provides a good summary of the various license frameworks.  Creative Commons certainly seems to be the leader based on my criteria above.

Creative Commons has been used for data and databases over the years, but data hasn’t been a focus for the organisation.  That appears to be changing, based on the Creative Commons blog entry CC and data[bases]: huge in 2011, what you can do:  CC licenses are being encouraged and adapted for scientific data

… with the important caveat that CC 3.0 license conditions do not extend to “protect” a database that is otherwise uncopyrightable.

As others have pointed out (link needed), GILF is aimed at Government departments, and so while the basic framework seems appropriate, some of the GILF Policy wording is potentially problematic in its reference to minimum requirements for agencies, employee requirements, etc.  ANDS is working with GILF to improve the wording.

So where does that leave MeCAT?  In the short term, I think we’ll use the CC 0 – 6 licenses as a starting point and review the decision as ANDS updates its position.  The Restrictive License would be useful as a starting point for researchers wanting to share data (only) with other researchers, however it isn’t an urgent consideration for us at the moment.

Attribution wording, liability, etc. are all still to be worked out.

Brian Kelly on Mobile Technologies

April 5, 2011

Brian Kelly recently posted on Mobile Technologies: Why Library Staff Should be Interested.  It’s not directly related to MeCAT, but given the number of discussions and conferences on data management, it is of interest.

My observation about the use of Twitter at conferences is that a large number of tweets nominate a topic that has been covered, but require the reader to follow up, i.e. search later for the relevant information, if they really want to understand what was being discussed.  That follow-up requires a significant investment of time, which is always in short supply.  Having tools to tweet within a specific context, e.g. the conference, or better still the particular session, would be useful.

Brian also reminded me I should get a webcam compatible with Ubuntu (which I’ve recently switched to).

Experimental Equipment

January 14, 2011

Part of the information we want to store about an experiment is the equipment used to conduct the experiment.

Note that this is only used to identify and provide static information about the equipment; any information which is dynamic, such as configuration settings, will be stored as parameters against the relevant datafiles or datasets.

The benefits of explicitly identifying the equipment used in the experiment include:

  • Being able to take any quirks of the equipment into account when analysing the data.
  • Being able to find all data collected by a piece of equipment to facilitate analysis of the equipment performance.
  • Notifying users of any normalisation that needs to be performed on raw data, e.g. if part of the detector is accidentally burnt out.

We considered a number of approaches to the design:

  1. Adding an Equipment table to TARDIS core schema
  2. Extending ParameterSets to allow them to be shared between projects and creating an Equipment schema
  3. Leaving Equipment out of TARDIS core functionality, implementing an Equipment register as a separate Django application and referencing the entry from a parameter

We are currently working on the third approach, using a separate Equipment register, for the following reasons:

  • A facility may already have an equipment register, or web pages describing the equipment, in which case the functionality is not required.
  • Maintaining information on equipment is not core functionality for TARDIS, which is about experimental metadata.
  • How the equipment information should be federated isn’t clear.  Using a separate register allows the link to be valid from any location while avoiding replication problems.
  • Django’s architecture makes it easy to implement and deploy multiple small applications (such as an Equipment Register).
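
Under the third approach, the experiment side needs almost nothing: an experiment parameter simply stores a key or link into the external register.  A minimal sketch (the URL pattern and parameter name are invented for the example):

```python
# Illustrative sketch of the separate-register approach: equipment details
# live elsewhere, and an experiment parameter just stores a reference.
EQUIPMENT_REGISTER_URL = "https://example.org/equipment/{key}/"

def equipment_link(parameters):
    """Resolve a hypothetical 'equipment' parameter to its register entry."""
    key = parameters.get("equipment")
    return EQUIPMENT_REGISTER_URL.format(key=key) if key else None
```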

Press Release

October 27, 2010

MeCAT has an official press release: ANSTO and Australian Synchrotron choose Aussie software

IT Blog Awards 2010: Individual IT Professional Male

October 25, 2010

Brian Kelly’s UK Web Focus Blog has been nominated for the IT Blog Awards 2010: Individual IT Professional Male.  In his post, Brian quotes and makes reference to his Blog Policies.  My purpose in starting this blog was similar to Brian’s with an obvious focus on my current project, MeCAT, so I’ve taken the liberty of adapting some of Brian’s points:

  • The contents of the blog will primarily address issues related to the MeCAT project and its work on Tardis, including issues around data and metadata management in the research sector.
  • The blog will also provide a test bed for experiments and for testing new services and provide access to discussions about the experiment.
  • The blog will provide an opportunity for me to ‘think out loud’: i.e. describe speculative ideas, thoughts which may occur to me, etc. which may be of interest to others or for which I would welcome feedback.
  • The blog will seek to both disseminate information and encourage discussion and debate.
  • The blog will be used as an open notebook, so that ideas, thoughts and opinions can be shared with others.

Thanks Brian for articulating what had been a rather fuzzy set of ideas!

Modeling Experimental Data – basic description

October 22, 2010

The schema used in Tardis is based on the Core Scientific Metadata Model (CSMD) developed for the ICAT project.

At the simplest level, the experimental data is simply a collection of files (Datafiles), which are grouped into Datasets, which are grouped into Experiments:

Tardis High-level data model

(Please note that the schema is only partially shown in the diagram above)

At the top level, Tardis stores a flat list of Experiments.   Each Experiment contains one or more Datasets, and each Dataset contains one or more Datafiles.

At each level, Experiment, Dataset and Datafile, user defined parameters may be added, grouped into Parameter Sets.

Tardis doesn’t impose any interpretation on what is considered an Experiment or Dataset.   Examples of how datasets may be grouped are: by sample, by instrument settings, or as a time sequence, e.g. artificially aging a material and investigating the effects.
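
The hierarchy above can be sketched in a few lines of Python.  This is a minimal illustration of the containment structure only; the real CSMD-based schema has many more fields.

```python
# Minimal sketch of the hierarchy: Experiments contain Datasets, Datasets
# contain Datafiles, and every level can carry named Parameter Sets.
from dataclasses import dataclass, field

@dataclass
class ParameterSet:
    schema: str                       # e.g. an instrument or science schema
    parameters: dict = field(default_factory=dict)

@dataclass
class Datafile:
    filename: str
    parameter_sets: list = field(default_factory=list)

@dataclass
class Dataset:
    description: str
    datafiles: list = field(default_factory=list)
    parameter_sets: list = field(default_factory=list)

@dataclass
class Experiment:
    title: str
    datasets: list = field(default_factory=list)
    parameter_sets: list = field(default_factory=list)
```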

In the last post I listed two metadata hierarchies: 1) the Core, Discipline and Project hierarchy from the University of Southampton, and 2) the Core, Instrument and Science hierarchy from STFC.  The core metadata schema is hard-coded in Tardis.  The Instrument, Science and Project schemas can all be implemented using Parameter Sets.