Skip to content
Steven Newhouse (BH) edited this page Jun 7, 2024 · 14 revisions

Welcome to the BH SDE wiki!

This page contains the user documentation for the Barts Health Secure Data Environment (SDE).

Introduction

A note on terminology: We refer to a Secure Data Environment (SDE) and a Trusted Research Environment (TRE) interchangeably here. Locally, we prefer the SDE designation, but we're based on a package which calls itself a TRE. To all intents and purposes, they mean the same thing.

What is a Secure Data Environment?

As the name implies, an SDE is an environment in which data can be analysed securely. This means that there is no direct access to the internet, data cannot be uploaded or downloaded freely by the User. This greatly reduces the possibility of data leaks, accidental or otherwise, and is a requirement for handling sensitive data.

Data can only be put into or extracted from the SDE by one of two routes:

  • Data from Barts Health requested and approved via the Data Access Committee (DAC) will be pushed into the project storage after any appropriate pre-processing, such as anonymisation. With the approval of the DAC, we can link data sets from other Data Controllers to our own data either anonymised or psuedoanonymised, depending on the requirements.
  • Other data, software or tools can be imported or exported by going through an 'airlock' process, where the data is staged through secure storage and is reviewed by the PM team before access is granted and the material is available the other side of the airlock.

In order to keep the data secure, there are some restrictions on the analysis environment. The data can be accessed through virtual machines (VMs), contained in a workspace. Each workspace is unique to a project, with only project-members having access to it. The VMs in a workspace cannot be accessed directly through Secure Shell (SSH), they have to be accessed by a web-based virtual desktop.

Our SDE is based on the AzureTRE package from Microsoft. This is an implementation built by Microsoft as an 'accelerator' for the community. It's not a product in the sense that you can't click on a button in the Azure portal and have it deployed, and they don't officially support it at this time. For this reason, we have to develop it further ourselves, so that it satisfies all our requirements. See the timeline below for more details.

Project workspaces

A workspace is a container for resources, or 'services' for a project. Services include things like a Git mirror based on Gitea, a Nexus repository which allows you to access software repositories such as CRAN and PyPi, and a Guacamole Virtual Desktop service, which allows you to create virtual machines. More detail on each service is available elsewhere (TBD).

Once a project has been approved through the DAC, we will provision a workspace for it. Users will be assigned one of two roles:

  • Workspace Owner, who can create and destroy resources in the workspace, or
  • Workspace Researcher, who can use resources, but can only create or destroy selected resources, such as VMs.

A Workspace Owner can manage all the resources created by any researcher, but a researcher can only manage resources that they themselves create.

You should then be able to log into the SDE, select your workspace, and view/manage/use the resources in your project.

Getting help

We have further information on how you can provide feedback and get support.

Using your workspace

See Using your workspace

Timeline

The approximate timeline for the SDE development is as follows:

  • The Alpha version is already available (June 2024). The Alpha is not yet verified to be secure for sensitive data and is only open to select users. We will use feedback from those users to drive the development plan going forwards. As such, we don't expect the users of the alpha to achieve significant analysis results.
  • A Beta will be released in the near future, as soon as we're satisfied the SDE is usable for sensitive data. At this point, real analysis work should be possible.
  • By autumn 2024, we expect to release an MVP, or Minimum Viable Product. This means the basic functionality is there for most or all types of analysis, and the platform is ready for a broader range of users.
  • In Spring 2025, we expect to release a full production version, with the features and functionality we need to support the full range of analysis activities.
  • Further development and enhancements will continue throughout 2025, at least.