Access Controls – implementation

April 27, 2011

I wrote about our Access Control requirements back in July last year.  Since then we’ve completed the design and basic implementation of the authentication and authorisation engines.

The major elements of the access controls are:

  • Access Controls are managed at the Experiment level, i.e. you either have access to the entire experiment or no access.
  • An Experiment can be flagged as public, allowing anonymous access to the experiment.
  • Supported Authentication methods currently include: Internal (Django), LDAP and VBL (Australian Synchrotron proprietary).  Additional methods can be added using an API.
  • Authorisation to experiments may be assigned to individual Users, (internal) Groups and External Groups (see below for an explanation).  Privileges currently include: Read Access, Write (Edit) and Data Owner (see below for a description).

Authentication

MyTARDIS is built on Django, which has its own built-in account management.  MyTARDIS supports the Django internal accounts, and has an API allowing additional authentication methods to be defined.  Users can have credentials from multiple user stores linked to a single Django account.  We currently support internal accounts, LDAP (Active Directory) and VBL (Australian Synchrotron proprietary) authentication.
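
To give a feel for the API, here is a minimal sketch of what a pluggable authentication method can look like in plain Django terms.  The VBL check below is a hypothetical placeholder, and this is a simplified illustration rather than the actual MyTARDIS interface; in stock Django a backend like this is registered via the AUTHENTICATION_BACKENDS setting.

    from django.contrib.auth.models import User

    def vbl_credentials_valid(username, password):
        # Placeholder for the call to the (proprietary) VBL web service.
        raise NotImplementedError

    class VBLBackend(object):
        def authenticate(self, username=None, password=None):
            if not vbl_credentials_valid(username, password):
                return None
            # Link the external credential to a single Django account,
            # creating the account on first login.
            user, created = User.objects.get_or_create(username=username)
            return user

        def get_user(self, user_id):
            try:
                return User.objects.get(pk=user_id)
            except User.DoesNotExist:
                return None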

Authorisation

We’ve modified the authorisation mechanism in Django to use ExperimentACLs.  The ExperimentACL defines access to an individual Experiment.

The three components of the ExperimentACL are:

  1. The ExperimentACL type
  2. The rights being granted
  3. The User, Group or External Group receiving the rights

The ExperimentACL may either be a User ACL or System ACL. User ACLs are owned and managed by the Data Owner, while System ACLs are owned and managed by the system administrator.  This allows us to meet the use case of automatically granting read access to all experiments from a particular beamline to the beamline scientists (without the users being able to revoke that access).

The rights include:

  • Read: the right to view the experiment
  • Write: the right to edit the metadata and to add and remove datafiles
  • Delete: the right to delete datasets or the entire experiment
  • Data Owner: the right to manage access for other users (User ExperimentACLs)
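
To make the three components concrete, here is a rough sketch of how an ExperimentACL along these lines might be modelled in Django.  The field names are illustrative only, not the actual MyTARDIS schema:

    from django.db import models
    from django.contrib.auth.models import User, Group

    class ExperimentACL(models.Model):
        # 1. The ACL type: User ACLs are managed by the Data Owner,
        #    System ACLs by the system administrator.
        ACL_TYPES = (('U', 'User'), ('S', 'System'))
        acl_type = models.CharField(max_length=1, choices=ACL_TYPES)

        experiment = models.ForeignKey('Experiment', on_delete=models.CASCADE)

        # 2. The rights being granted.
        can_read = models.BooleanField(default=False)
        can_write = models.BooleanField(default=False)
        can_delete = models.BooleanField(default=False)
        is_owner = models.BooleanField(default=False)  # may manage User ACLs

        # 3. Exactly one of these identifies who receives the rights.
        user = models.ForeignKey(User, null=True, blank=True,
                                 on_delete=models.CASCADE)
        group = models.ForeignKey(Group, null=True, blank=True,
                                  on_delete=models.CASCADE)
        external_group = models.CharField(max_length=100, blank=True)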

As mentioned above, ExperimentACLs can assign rights to individual Users and internal Groups in the usual fashion.

External Groups provide an extensible authorisation mechanism, e.g. granting access based on location (IP address) or membership of a group in an LDAP repository.  A user’s external group membership is evaluated when they log on to the system.
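
As an illustration, a location-based external group provider could be as simple as the sketch below.  The getGroups hook, group name and subnet are assumptions made for the example, not the actual plugin interface:

    import ipaddress

    class IPGroupProvider(object):
        # Treat anyone logging in from the beamline subnet (example
        # address range) as a member of the 'beamline-local' group.
        BEAMLINE_NET = ipaddress.ip_network('192.168.10.0/24')

        def getGroups(self, request):
            # Evaluated at logon; returns this user's external groups.
            addr = ipaddress.ip_address(request.META.get('REMOTE_ADDR', '0.0.0.0'))
            if addr in self.BEAMLINE_NET:
                return ['beamline-local']
            return []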

One additional use case that hasn’t been addressed, and which we are considering, is temporary read access to an experiment.  This is useful when making a paper available for peer review prior to publication.  During the review period the researchers may want to make the data available to the reviewers as well.  Our current thinking is to provide a button that generates a time-limited URL granting read access to the experiment, i.e. the URL embeds a time-limited key.  Anyone who has access to the URL then has access to the experiment for a short period of time (which will be configurable).
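
Django’s signing framework suggests one plausible way to implement such a key; here is a sketch under that assumption (the URL layout and expiry period are illustrative choices):

    from django.core import signing

    TOKEN_MAX_AGE = 7 * 24 * 3600  # one week; would be configurable

    def make_share_url(experiment_id):
        # The token embeds the experiment id plus a signed timestamp.
        token = signing.dumps({'experiment': experiment_id})
        return '/experiment/shared/%s/' % token

    def experiment_id_from_token(token):
        try:
            data = signing.loads(token, max_age=TOKEN_MAX_AGE)
        except signing.BadSignature:  # also covers SignatureExpired
            return None
        return data['experiment']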


Experimental Equipment

January 14, 2011

Part of the information we want to store about an experiment is the equipment used to conduct the experiment.

Note that this is only used to identify and provide static information about the equipment; any information that is dynamic, such as configuration settings, will be stored as parameters against the relevant datafiles or datasets.

The benefits of explicitly identifying the equipment used in the experiment include:

  • Being able to take any quirks of the equipment into account when analysing the data.
  • Being able to find all data collected by a piece of equipment to facilitate analysis of the equipment performance.
  • Notifying users of any normalisation that needs to be performed on raw data, e.g. if part of the detector is accidentally burnt out.

We considered a number of approaches to the design:

  1. Adding an Equipment table to TARDIS core schema
  2. Extending ParameterSets to allow them to be shared between projects and creating an Equipment schema
  3. Leaving Equipment out of TARDIS core functionality, implementing an Equipment register as a separate Django application and referencing the entry from a parameter

We are currently working on the third approach, using a separate Equipment register, for the following reasons:

  • A facility may already have an equipment register, or web pages describing the equipment, in which case the functionality is not required.
  • Maintaining information on equipment is not core functionality for TARDIS, which is about experimental metadata.
  • How the equipment information should be federated isn’t clear.  Using a separate register allows the link to be valid from any location while avoiding replication problems.
  • Django’s architecture makes it easy to implement and deploy multiple small applications (such as an Equipment Register).
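
For illustration, a standalone Equipment register application might contain little more than the model sketched below (the field names are ours, not an existing schema); an experiment parameter then simply stores the register entry’s URL or key:

    from django.db import models

    class Equipment(models.Model):
        # Static, descriptive information only; anything dynamic (e.g.
        # configuration settings) is stored as parameters against the
        # relevant datasets or datafiles.
        key = models.SlugField(unique=True)   # stable id referenced from a parameter
        name = models.CharField(max_length=100)
        description = models.TextField(blank=True)
        url = models.URLField(blank=True)     # e.g. the facility's page for the instrument

        def __str__(self):
            return self.name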

Modeling Experimental Data – basic description

October 22, 2010

The schema used in Tardis is based on the Core Scientific Metadata Model (CSMD) developed for the ICAT project.

At the simplest level, the experimental data is a collection of files (Datafiles), which are grouped into Datasets, which are in turn grouped into Experiments:

[Diagram: Tardis high-level data model]

(Please note that the schema is only partially shown in the diagram above)

At the top level, Tardis stores a flat list of Experiments.  Each Experiment contains one or more Datasets, and each Dataset contains one or more Datafiles.

At each level, Experiment, Dataset and Datafile, user-defined parameters may be added, grouped into Parameter Sets.
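
In Django terms, the containment hierarchy and per-level parameter sets look roughly like the simplified sketch below (a fraction of the real schema, with invented field names):

    from django.db import models

    class Experiment(models.Model):
        title = models.CharField(max_length=400)

    class Dataset(models.Model):
        experiment = models.ForeignKey(Experiment, on_delete=models.CASCADE)
        description = models.TextField()

    class Datafile(models.Model):
        dataset = models.ForeignKey(Dataset, on_delete=models.CASCADE)
        filename = models.CharField(max_length=400)

    class Schema(models.Model):
        # A namespace identifies one user-defined parameter set definition.
        namespace = models.URLField(unique=True)

    class DatafileParameterSet(models.Model):
        # Analogous parameter set models exist at the Experiment
        # and Dataset levels.
        schema = models.ForeignKey(Schema, on_delete=models.CASCADE)
        datafile = models.ForeignKey(Datafile, on_delete=models.CASCADE)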

Tardis doesn’t impose any interpretation on what is considered an Experiment or Dataset.  Examples of how datasets may be grouped are: by sample, by instrument settings, or as a time sequence, e.g. artificially aging a material and investigating the effects.

In the last post I listed two metadata hierarchies: 1) the Core, Discipline and Project hierarchy from the University of Southampton, and 2) the Core, Instrument and Science hierarchy from STFC.  The core metadata schema is hard-coded in Tardis.  The Instrument, Science and Project schemas can all be implemented using Parameter Sets.


Metadata strategy

October 6, 2010

The University of Southampton data management project has proposed a three-level metadata strategy; see their blog entry “Metadata strategy”:

  1. Project
  2. Discipline
  3. Core

Tardis is based on the Core Scientific Metadata model (CSMD) developed within the Science & Technology Facilities Council (STFC).  One metadata hierarchy they’ve adopted is (turned upside down to match Southampton’s):

  1. Science Specific
  2. Instrument Specific
  3. Core

(This reminds me of Robert Pirsig’s Intellectual Scalpel)

We’re extending Tardis for use within the Australian Synchrotron and ANSTO, where the STFC model is more appropriate.  However, institutional use of Tardis may also be project based.

Tardis supports configurable schemas (parameter sets) at the experiment, dataset and datafile level.  Appropriate use of the configurable schema should allow us to handle both models, or a combined model.
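
For example, both hierarchies could co-exist as parameter set schemas distinguished by namespace.  Reusing the hypothetical Schema model from the earlier sketch, with invented namespace URIs:

    # Instrument- and science-level schemas (STFC style) alongside a
    # project-level schema (Southampton style) on the same installation.
    Schema.objects.create(namespace='http://synchrotron.example/schemas/instrument/saxs/1.0')
    Schema.objects.create(namespace='http://synchrotron.example/schemas/science/crystallography/1.0')
    Schema.objects.create(namespace='http://university.example/schemas/project/mecat/1.0')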


ANDS Data Capture Briefing

September 13, 2010

ANDS held a Data Capture Briefing in Melbourne on 2 Sep 2010.  It was great to see some of the other projects in progress, and that Tardis is a potential platform for some of those projects.

There were also updates on RIF-CS, with version 1.2 coming soon.  The updated Service definitions should better meet the MeCAT project requirements.

One aspect that I think needs further thought in RIF-CS is Party lookup.  When making entries available for harvesting, Tardis should connect researchers with the collections they were involved with.  The lack of reliable automated Party lookup makes this difficult to guarantee.


Access Controls

July 26, 2010

As has been highlighted in previous posts, most research data publication projects have found that while researchers believe in publishing and sharing data in principle, they have lots of reasons not to do it in practice.  This is a much larger problem than can be addressed in one project, so we’ve decided to work around it as much as possible by providing access controls within MeCAT that support publishing data immediately, restricting access either indefinitely or until criteria are met, or sharing on an individual basis.

The set of use cases that we’re using as the basis of the Access Control design are listed below.

The Data Owner has the ability to grant and revoke access privileges to the data they own.  The Data Owner will typically be the Principal Investigator or a representative of the Institution.

  • Publicly Accessible
    The data is made publicly available immediately, e.g. data that will become part of a reference database.
  • Accessible by the Data Owner and assigned team members
    Team members may be assigned individually or as a group.
  • Access granted by the Data Owner
    E.g. as a result of direct contact by another researcher.
  • Accessible by anyone at a given physical location, typically the instrument
  • Publicly Accessible after an embargo period, e.g. 3 years
  • Publicly Accessible after a trigger, e.g. paper is published
  • Accessible by facility scientists.  Facility scientists typically have access to all data from the instrument they are responsible for.

I’ll cover the design we’re proposing to support these use cases in a subsequent entry, and am interested in any feedback on these use cases.