De-identification
De-identification is the process of removing or transforming personally identifiable information (PII). This article is an overview of the de-identification workflow in Flywheel.
Flywheel is fully HIPAA and GDPR compliant and offers a number of configurable de-identification features to help ensure that your data meets regulatory requirements and is ready for research.
De-identification in Flywheel is configured using de-id profiles. A de-id profile is a set of instructions for handling metadata that may include protected health information (PHI). You can de-identify standard DICOM tags such as PatientName, StudyDate, and PatientAge, as well as private tags unique to your institution.
De-id profiles also support allowlists, blocklists, field removal, value replacement, and date processing. See the de-id profile reference guide for a full list of options. Below is an example of a de-id profile:
```yaml
---
name: Example1
description: A de-id profile for DICOM files
dicom:
  fields:
    - name: PatientID
      replace-with: REDACTED
    # using tag (tuple also supported)
    - name: 00080104
      replace-with: REDACTED
    # using private tag notation
    - name: (0009, "GEMS_IMAG_01", 01)
      replace-with: REDACTED
    # using dotty-notation to access a sequence element
    - name: 00082218.0.00080102
      replace-with: REDACTED
    # using * to access all elements in the sequence
    - name: AnatomicRegionSequence.*.CodeValue
      replace-with: REDACTED
    # using repeater group notation
    - name: (60xx, 0022)
      replace-with: REDACTED
```
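To illustrate what the dotty-notation and `*` field names in a profile like this mean, the following is a minimal sketch, assuming metadata is represented as nested Python dicts (keyed by DICOM keyword) and lists (for sequences). The `replace_path` helper and the sample record are hypothetical; this is not Flywheel's de-identification implementation, and it does not handle tag, private-tag, or repeater-group notation.

```python
from typing import Any


def replace_path(data: Any, path: list[str], value: Any) -> int:
    """Set every element matched by `path` to `value`; return how many were replaced.

    Path segments are dict keys, numeric list indexes, or "*" (every item in a list).
    Hypothetical helper for illustration only.
    """
    key, rest = path[0], path[1:]
    if isinstance(data, list):
        if key == "*":
            targets = list(range(len(data)))
        elif key.isdigit() and int(key) < len(data):
            targets = [int(key)]
        else:
            targets = []
    elif isinstance(data, dict):
        targets = [key] if key in data else []
    else:
        return 0  # the path goes deeper than the data

    count = 0
    for target in targets:
        if rest:
            count += replace_path(data[target], rest, value)
        else:
            data[target] = value
            count += 1
    return count


# Hypothetical DICOM-like record keyed by keyword, with a sequence stored as a list.
record = {
    "PatientID": "12345",
    "AnatomicRegionSequence": [
        {"CodeValue": "C1", "CodingSchemeDesignator": "SCHEME"},
        {"CodeValue": "C2", "CodingSchemeDesignator": "SCHEME"},
    ],
}

# Two of the fields from the example profile, as parsed dicts.
profile_fields = [
    {"name": "PatientID", "replace-with": "REDACTED"},
    {"name": "AnatomicRegionSequence.*.CodeValue", "replace-with": "REDACTED"},
]

for field in profile_fields:
    replace_path(record, field["name"].split("."), field["replace-with"])

print(record["AnatomicRegionSequence"][1])
# {'CodeValue': 'REDACTED', 'CodingSchemeDesignator': 'SCHEME'}
```

The `*` segment fans out over every item in the sequence, while a numeric segment such as `0` targets a single item, which mirrors the difference between the `AnatomicRegionSequence.*.CodeValue` and `00082218.0.00080102` entries.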
Warning
Flywheel does not provide a standard, universal de-identification profile because de-identification requirements and definitions of PHI differ from institution to institution. Users or clinicians can also enter PHI in fields where it is not intended. It is your responsibility to define, test, and apply de-identification profiles.
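One simple way to test a profile is to de-identify sample data that contains known PHI values and then check that none of those values survive anywhere in the output, including free-text fields. The sketch below assumes the de-identified metadata is available as nested Python dicts and lists; `iter_strings`, `find_leaks`, and the sample values are hypothetical. It is only a smoke test: it cannot catch PHI that you did not list.

```python
from typing import Any, Iterator


def iter_strings(data: Any) -> Iterator[str]:
    """Yield every string value found in nested dicts and lists."""
    if isinstance(data, str):
        yield data
    elif isinstance(data, dict):
        for value in data.values():
            yield from iter_strings(value)
    elif isinstance(data, list):
        for item in data:
            yield from iter_strings(item)


def find_leaks(deidentified: Any, known_phi: set[str]) -> set[str]:
    """Return the known PHI values that still appear in any string of the output."""
    return {
        phi
        for text in iter_strings(deidentified)
        for phi in known_phi
        if phi and phi in text
    }


# Output from a hypothetical test run: the profile redacted the name and ID
# fields, but a clinician had typed the name into a free-text field.
output = {
    "PatientName": "REDACTED",
    "PatientID": "REDACTED",
    "StudyDescription": "Follow-up for Jane Doe",
}

print(find_leaks(output, {"Jane Doe", "12345"}))  # {'Jane Doe'}
```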
Choosing How to Apply the De-id Profile
To configure de-identification on your site, you must consider the following parts of the process:
- De-identification Workflow: This determines when the de-id profile is applied to data. For example, your data can be de-identified at the edge before it is uploaded to Flywheel, or it can be uploaded as-is and then de-identified by a gear.
- Upload Method: The method for applying the profile varies depending on how you upload data to Flywheel.
De-identification Workflows
At the Edge
A de-id profile is applied to data before it is uploaded to Flywheel. This means only de-identified data is stored in Flywheel. This is the most common de-identification workflow.
When Moving between Projects (Gear)
This is when the source dataset is uploaded directly to Flywheel and the de-identification steps are applied by the De-Identification Gear.
The most common use case is when data is uploaded directly from an imaging machine. In that case, the only dataset that exists is the one in Flywheel, so any de-identification changes apply directly to the source dataset. For this reason, some Flywheel sites upload data that includes PHI and restrict access to it to a limited number of Flywheel users. Before the data is made accessible to more users within Flywheel, the De-Identification Gear is run. There are two main ways to run the gear:
- Run the gear when you move a project to a new group, for example, when you begin collaborating with another lab.
- Run the gear each time you upload data. This method creates two datasets in Flywheel: the original dataset with PHI and a second, de-identified dataset. Typically, the two datasets are in different projects.
Contact Flywheel support for help configuring the gear. See GitHub for more in-depth details on the De-Identification Gear.
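The "run the gear each time you upload data" approach can be modeled as keeping the original record in a restricted project and writing a de-identified copy to a second project, so the source is never modified. This is a minimal sketch using in-memory lists as stand-ins for the two projects; `deidentify`, `on_upload`, and the project variables are hypothetical and do not represent the gear's or Flywheel's API.

```python
import copy
from typing import Any

Record = dict[str, Any]

# Hypothetical in-memory stand-ins for two Flywheel projects.
restricted_project: list[Record] = []  # original data with PHI, limited access
research_project: list[Record] = []    # de-identified copies, broader access


def deidentify(record: Record, replacements: dict[str, str]) -> Record:
    """Return a de-identified copy of `record`; the original is left unchanged."""
    result = copy.deepcopy(record)
    for field, value in replacements.items():
        if field in result:
            result[field] = value
    return result


def on_upload(record: Record, replacements: dict[str, str]) -> None:
    """Keep the original in the restricted project and a de-identified copy elsewhere."""
    restricted_project.append(record)
    research_project.append(deidentify(record, replacements))


on_upload({"PatientName": "Jane Doe", "Modality": "MR"}, {"PatientName": "REDACTED"})
print(restricted_project[0]["PatientName"])  # Jane Doe
print(research_project[0]["PatientName"])    # REDACTED
```

Using `copy.deepcopy` matters once records contain nested sequences: a shallow copy would share the nested lists, and redacting them in the copy would also change the original.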
Upload Methods
There are a number of ways to upload data to Flywheel. See the Importing Overview article for more information on how to get your data into Flywheel.
Connector
When you first implement a Flywheel Connector for your imaging machine, you will create a de-identification profile for all images uploaded via the Connector.
Flywheel site admins can work with Flywheel support to make updates to the de-identification configuration after the initial set up.
Learn more about creating de-id profiles
Learn more about how the Connector uploads images
DICOM Uploader
The DICOM Uploader allows you to drag and drop DICOM files directly into a Flywheel project. If that project has a de-id profile enabled, the DICOM Uploader transforms the metadata according to that project's profile. Group-level and site-level de-id profiles can also apply to datasets that you upload using the DICOM Uploader.
Learn more about creating a de-id profile
Learn more about enabling project, group, and site de-id profiles
Flywheel Command-line Interface (CLI)
The Flywheel CLI is a command-line tool for importing large datasets. CLI commands use the --config flag to apply config files. Along with other ingest settings, config files include a section for a de-id profile. Config files also provide additional ways to keep sensitive data from being uploaded to Flywheel. For example, you can create ingest filters based on filenames or file types.
Learn more about creating a config file
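Filename and file-type filters decide which files are uploaded at all, before any de-id profile runs. The sketch below only illustrates that idea with Python's standard `fnmatch` and `pathlib` modules; `EXCLUDE_PATTERNS`, `INCLUDE_SUFFIXES`, and `should_upload` are hypothetical and are not Flywheel CLI config keys. See the config file documentation for the actual filter syntax.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical filter settings for illustration; not Flywheel CLI config keys.
EXCLUDE_PATTERNS = ["*.txt", "*report*"]  # filename patterns to skip (lowercase)
INCLUDE_SUFFIXES = {".dcm"}               # file types to upload


def should_upload(path: str) -> bool:
    """Return True if the file passes both the filename and the file-type filter."""
    name = PurePosixPath(path).name.lower()  # compare case-insensitively
    if any(fnmatchcase(name, pattern) for pattern in EXCLUDE_PATTERNS):
        return False
    return PurePosixPath(name).suffix in INCLUDE_SUFFIXES


files = [
    "study1/IM0001.dcm",
    "study1/radiology_report.dcm",
    "study1/notes.txt",
    "study1/IM0002.DCM",
]
print([f for f in files if should_upload(f)])
# ['study1/IM0001.dcm', 'study1/IM0002.DCM']
```

Excluding a file such as a scanned report by name is useful because the report may contain PHI in pixel data or free text that a metadata-based de-id profile would not touch.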
Flywheel SDKs
The Flywheel Python, MATLAB, and R SDKs also use de-id profiles, and these profiles can be configured for data other than DICOM images. The following file types are currently supported:
- DICOM
- JPG
- PNG
- TIFF
- XML
- JSON
- Text files defining key/value pairs
- CSV
- TSV
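To show what a field replacement means for some of these non-DICOM formats, here is a minimal sketch using only the Python standard library. The `REPLACE` mapping and the `deid_json`, `deid_key_value`, and `deid_csv` helpers are hypothetical; they are not part of the Flywheel SDKs, and in practice the fields and actions come from your de-id profile.

```python
import csv
import io
import json

# Hypothetical profile fields: field name -> replacement value.
REPLACE = {"PatientName": "REDACTED", "PatientID": "REDACTED"}


def deid_json(text: str) -> str:
    """Replace matching top-level keys in a JSON object."""
    data = json.loads(text)
    for key, value in REPLACE.items():
        if key in data:
            data[key] = value
    return json.dumps(data)


def deid_key_value(text: str, sep: str = "=") -> str:
    """Replace values in a text file of `key<sep>value` lines."""
    out = []
    for line in text.splitlines():
        key, found, _ = line.partition(sep)
        if found and key.strip() in REPLACE:
            out.append(f"{key}{sep}{REPLACE[key.strip()]}")
        else:
            out.append(line)  # keep non-matching lines unchanged
    return "\n".join(out)


def deid_csv(text: str, delimiter: str = ",") -> str:
    """Replace matching columns in CSV (or TSV with delimiter="\\t")."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=reader.fieldnames, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        for key, value in REPLACE.items():
            if key in row:
                row[key] = value
        writer.writerow(row)
    return buf.getvalue()


print(deid_key_value("PatientName=Jane Doe\nModality=MR"))
# PatientName=REDACTED
# Modality=MR
```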
Additional De-identification Considerations
- If you are uploading an existing dataset, de-identification does not change the source dataset. Instead, Flywheel creates a new copy of the dataset with the de-identification changes applied. However, when data is uploaded directly from an imaging machine, the only dataset that exists is the one in Flywheel. This means that any de-identification changes apply directly to the source dataset unless you implement the de-identification gear workflow.
- Flywheel Site Admins can configure project de-identification settings so that they cannot be changed by other users.