Access Controls – implementation

April 27, 2011

I wrote about our Access Control requirements back in July last year.  Since then we’ve completed the design and basic implementation of the authentication and authorisation engines.

The major elements of the access controls are:

  • Access Controls are managed at the Experiment level, i.e. you either have access to the entire experiment or no access.
  • An Experiment can be flagged as public, allowing anonymous access to the experiment.
  • Supported Authentication methods currently include: Internal (Django), LDAP and VBL (Australian Synchrotron proprietary).  Additional methods can be added using an API.
  • Authorisation to experiments may be assigned to individual Users, (internal) Groups and External Groups (see below for an explanation).  Privileges currently include: Read Access, Write (Edit) and Data Owner (see below for a description).


MyTARDIS is built on Django, which has its own built-in account management.  MyTARDIS supports the Django internal accounts, and has an API allowing additional authentication methods to be defined.  Users can have credentials from multiple user stores linked to a single Django account.  We currently support internal accounts, LDAP (Active Directory) and VBL (Australian Synchrotron proprietary) authentication.
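The pluggable authentication API might be sketched roughly as follows.  This is a hypothetical illustration of the idea, not the actual MyTARDIS interface; in practice it would be wired into Django's authentication backends.

```python
from abc import ABC, abstractmethod


class AuthProvider(ABC):
    """A single credential store (internal, LDAP, VBL, ...)."""
    name = "base"

    @abstractmethod
    def authenticate(self, username, password):
        """Return a user identifier on success, None on failure."""


_PROVIDERS = {}


def register(provider):
    """Make an authentication method available by name."""
    _PROVIDERS[provider.name] = provider


def authenticate(method, username, password):
    """Dispatch to the named provider; unknown methods fail closed."""
    provider = _PROVIDERS.get(method)
    return provider.authenticate(username, password) if provider else None


class InternalAuth(AuthProvider):
    """Stand-in for the Django internal accounts."""
    name = "internal"

    def __init__(self, users):
        self._users = users  # {username: password}, for the sketch only

    def authenticate(self, username, password):
        return username if self._users.get(username) == password else None


register(InternalAuth({"alice": "secret"}))
```

Each credential store would be one provider; linking credentials from several stores to a single Django account is then a mapping from (provider, username) pairs to that account.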


We’ve modified the authorisation mechanism in Django to use ExperimentACLs.  The ExperimentACL defines access to an individual Experiment.

The three components of the ExperimentACL are:

  1. The ExperimentACL type
  2. The rights being granted
  3. The User, Group or External Group receiving the rights

The ExperimentACL may either be a User ACL or System ACL. User ACLs are owned and managed by the Data Owner, while System ACLs are owned and managed by the system administrator.  This allows us to meet the use case of automatically granting read access to all experiments from a particular beamline to the beamline scientists (without the users being able to revoke that access).
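Putting the three components together, an ExperimentACL record might look roughly like this.  The field names are illustrative, not the actual MyTARDIS schema:

```python
from dataclasses import dataclass
from enum import Enum


class ACLType(Enum):
    USER = "user"      # owned and managed by the Data Owner
    SYSTEM = "system"  # owned and managed by the system administrator


@dataclass
class ExperimentACL:
    """One access-control entry for one Experiment (illustrative fields)."""
    experiment_id: int
    grantee: str            # a user, internal group or external group id
    acl_type: ACLType
    can_read: bool = False
    can_write: bool = False
    can_delete: bool = False
    is_owner: bool = False  # may manage User ACLs for this experiment
```

A System ACL granting the beamline scientists read access would simply be an entry with `acl_type=ACLType.SYSTEM`, which ordinary users cannot edit or revoke.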

The rights include:

  • Read: the right to view the experiment
  • Write: the right to edit the metadata and to add and remove datafiles
  • Delete: the right to delete datasets or the entire experiment
  • Data Owner: the right to manage access for other users (User ExperimentACLs)

As mentioned above, ExperimentACLs can assign rights to individual Users and internal Groups in the usual fashion.
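Checking a right then amounts to scanning the experiment's ACLs for one that names any of the requesting user's identities (their user id, internal groups, or external groups).  A minimal illustration, using hypothetical data structures rather than the real implementation:

```python
def has_right(acls, user, groups, right):
    """True if any ACL grants `right` to the user directly or via a group.

    acls:  list of dicts like {"grantee": "alice", "rights": {"read"}}
    right: one of "read", "write", "delete", "owner"
    """
    identities = {user} | set(groups)
    return any(acl["grantee"] in identities and right in acl["rights"]
               for acl in acls)
```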

External Groups provide an extensible authorisation mechanism, e.g. granting access based on location (IP address) or membership of a group in an LDAP repository.  A user’s external group membership is evaluated when they log on to the system.
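For example, a location-based external group provider might map the client's IP address to group names at login.  The group names and subnets below are purely illustrative:

```python
import ipaddress


def external_groups_for_ip(client_ip, subnet_groups):
    """Return the external groups implied by the client's location.

    subnet_groups maps group name -> CIDR subnet, e.g. a beamline's
    internal network (the values are illustrative).
    """
    ip = ipaddress.ip_address(client_ip)
    return {group for group, subnet in subnet_groups.items()
            if ip in ipaddress.ip_network(subnet)}
```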

One additional use case that hasn’t been addressed and which we are considering is temporary read access to an experiment.  This is useful when making a paper available for peer review prior to publication.  During the review period the researchers may want to make the data available to the reviewers as well.  Our current thinking is to provide a button that generates a time-limited URL with read access to the experiment, i.e. the URL embeds a time-limited key.  Anyone who has the URL then has access to the experiment for a short, configurable period of time.
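One common way to implement such a time-limited key is an HMAC-signed token embedding the experiment id and an expiry timestamp, so the server can verify a URL without storing per-link state.  A hedged sketch only; the secret, token format and TTL are illustrative, not a committed design:

```python
import hashlib
import hmac
import time

SECRET = b"server-side secret"  # illustrative; would be a real config value


def make_token(experiment_id, ttl_seconds=7 * 24 * 3600):
    """Build 'id:expiry:signature' for embedding in a shareable URL."""
    expires = int(time.time()) + ttl_seconds
    msg = f"{experiment_id}:{expires}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{experiment_id}:{expires}:{sig}"


def check_token(token):
    """Return the experiment id if the token is genuine and unexpired."""
    experiment_id, expires, sig = token.rsplit(":", 2)
    msg = f"{experiment_id}:{expires}".encode()
    expected = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    if hmac.compare_digest(sig, expected) and int(expires) > time.time():
        return experiment_id
    return None
```

Because the expiry is inside the signed message, a reviewer cannot extend their own access by editing the URL.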


Data Licensing

April 27, 2011

Data Licensing has slowly been growing as a topic within the MeCAT project.  From my perspective, adopting an existing license framework is the most attractive option as:

  • Writing something from scratch will be expensive and time consuming.
  • The existing framework should be well tested, thus more likely to meet expectations of data protection.
  • An existing framework should be well known, i.e. users of the data will (hopefully) recognise the license and know what it means without further (time-consuming) research, thus promoting re-use of data that might otherwise simply be too much trouble.

ANDS (our project funder) has a few pages on licensing, starting at Data Re-use and Licensing Frameworks.  Since they are paying the bills, it is an obvious place to start and a default first choice.  My understanding of where they are currently heading is GILF (AusGOAL), which is basically the Creative Commons licenses, a Restrictive License framework and a framework on how to apply it.

The UK Digital Curation Centre (DCC) How to License Research Data provides a good summary of the various license frameworks.  Creative Commons certainly seems to be the leader based on my criteria above.

Creative Commons has been used for data and databases over the years, but data hasn’t been a focus for the organisation.  That appears to be changing, based on the Creative Commons blog entry CC and data[bases]: huge in 2011, what you can do:  CC licenses are being encouraged and adapted for scientific data

… with the important caveat that CC 3.0 license conditions do not extend to “protect” a database that is otherwise uncopyrightable.

As others have pointed out (link needed), GILF is aimed at Government departments, and so while the basic framework seems appropriate, some of the GILF Policy wording is potentially problematic in its reference to minimum requirements for agencies, employee requirements, etc.  ANDS is working with GILF to improve the wording.

So where does that leave MeCAT?  In the short term, I think we’ll use the CC 0 – 6 licenses as a starting point and review the decision as ANDS updates its position.  The Restrictive License would be useful as a starting point for researchers wanting to share data (only) with other researchers, however it isn’t an urgent consideration for us at the moment.

Attribution wording, liability, etc. are all still to be worked out.

Brian Kelly on Mobile Technologies

April 5, 2011

Brian Kelly recently posted on Mobile Technologies: Why Library Staff Should be Interested.  It’s not directly related to MeCAT, but given the number of discussions and conferences on data management, it is of interest.

My observation about the use of Twitter at conferences is that a large number of the tweets nominate a topic that has been covered, but require the reader to follow up later and search for the relevant information if they really want to understand what is being discussed.  That is a significant investment of time, which is always in short supply.  Having tools to tweet within a specific context, e.g. the conference, or better still the particular session, would be useful.

Brian also reminded me I should get a webcam compatible with Ubuntu (which I’ve recently switched to).