De-Identification Overview
Introduction
De-identification is the process of removing or transforming personally identifiable information (PII). This article is an overview of the de-identification workflow in Flywheel.
Instruction Steps
How does de-identification work in Flywheel?
De-identification in Flywheel is configured using de-id profiles. A de-id profile is a set of instructions for what to do with metadata that may include protected health information (PHI). Profiles can de-identify standard DICOM tags such as PatientName, StudyDate, and PatientAge, as well as private tags unique to your institution.
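To make the idea of "a set of instructions" concrete, here is a minimal Python sketch that models a profile as a list of per-tag rules and applies it to a dictionary of DICOM-style metadata. The profile layout, the action names (`remove`, `replace-with`, `increment-date`), and the `apply_profile` helper are assumptions made for this illustration. They are not the Flywheel profile schema, so refer to the de-id profile documentation for the actual format.

```python
from datetime import datetime, timedelta

# Hypothetical profile: one rule per tag. This is NOT the Flywheel schema.
EXAMPLE_PROFILE = {
    "date_increment_days": -17,
    "fields": [
        {"name": "PatientName", "action": "remove"},
        {"name": "PatientAge", "action": "replace-with", "value": "REDACTED"},
        {"name": "StudyDate", "action": "increment-date"},
    ],
}


def apply_profile(metadata, profile):
    """Return a de-identified copy of metadata; the input is left unchanged."""
    result = dict(metadata)
    shift = timedelta(days=profile.get("date_increment_days", 0))
    for rule in profile["fields"]:
        name = rule["name"]
        if name not in result:
            continue  # Tag is absent, so there is nothing to de-identify.
        if rule["action"] == "remove":
            del result[name]
        elif rule["action"] == "replace-with":
            result[name] = rule["value"]
        elif rule["action"] == "increment-date":
            # DICOM dates use the YYYYMMDD format.
            shifted = datetime.strptime(result[name], "%Y%m%d") + shift
            result[name] = shifted.strftime("%Y%m%d")
        else:
            raise ValueError(f"Unknown action: {rule['action']}")
    return result


original = {
    "PatientName": "Doe^Jane",
    "PatientAge": "045Y",
    "StudyDate": "20200118",
    "Modality": "MR",
}
deidentified = apply_profile(original, EXAMPLE_PROFILE)
print(deidentified)
```

Tags that the profile does not mention, such as Modality in this example, pass through unchanged, which is why a profile needs a rule for every tag that may carry PHI.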
To configure de-identification in Flywheel, consider the following parts of the de-identification process:
- De-identification workflow: This determines when the de-id profile is applied to data
- Upload method: This determines how you apply the de-id profile
De-identification Workflows
Below are the two main de-identification workflows in Flywheel. The workflow you use depends on your data, what you plan to do with the data, and who has access to it.
De-identification at the Edge
A de-id profile is applied to data before it is uploaded to Flywheel. This means only de-identified data is stored in Flywheel. This is the most common de-identification workflow. The Upload Methods section below describes how data is de-identified at the edge.
De-identification Gear
In this workflow, the source dataset is uploaded directly to Flywheel, and the de-identification steps are applied by the De-Identification Gear.
The most common use case is when data is uploaded directly from an imaging machine, and the only dataset that exists is the one in Flywheel. This means any de-identification changes apply directly to the source dataset. As a result, some Flywheel sites choose to upload data that includes PHI and restrict access to a limited number of Flywheel users. Before the data is made accessible to more users within Flywheel, the De-Identification Gear is run. There are two main ways to run the gear:
- Run the gear when you move a project to a new group, for example, if you are collaborating with another lab.
- Run the gear each time you upload data. This method creates two datasets in Flywheel: the original dataset with PHI and a second dataset that has been de-identified. Typically, the two datasets are in different projects.
Contact Flywheel support for help configuring the gear. See GitHub for more in-depth details on the De-Identification Gear.
Upload Methods
De-id profiles can be applied to all upload methods in Flywheel so that data is de-identified at the edge:
Connector
When you first implement a Flywheel Connector for your imaging machine, you create a de-identification profile for all images uploaded via the Connector.
Flywheel site admins can work with Flywheel support to update the de-identification configuration after the initial setup.
Learn more about creating de-id profiles
Learn more about how the Connector uploads images
DICOM Uploader
The DICOM Uploader allows you to drag and drop DICOM files directly into a Flywheel project. If that project has a de-id profile enabled, the DICOM Uploader transforms the metadata based on that project’s profile. Group-level and site-level de-id profiles can also apply to datasets that you upload using the DICOM Uploader.
Learn more about creating a de-id profile
Learn more about enabling project, group, and site de-id profiles
Flywheel Command-line Interface (CLI)
The Flywheel CLI is a command-line tool for importing large datasets. CLI commands use the --config flag to apply config files. Along with other types of ingest settings, config files include a section for a de-id profile. Config files also include additional ways to make sure sensitive data isn't uploaded to Flywheel. For example, you can create ingest filters based on filenames or file types.
Learn more about creating a config file
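The filename and file type filters mentioned above can be pictured with a short Python sketch. It only illustrates the filtering logic: the pattern lists and the `should_upload` helper are hypothetical names chosen for this example and are not Flywheel CLI options or config keys. Use the config file documentation for the real settings.

```python
from fnmatch import fnmatch

# Hypothetical filter settings; these names are illustrative, not CLI options.
INCLUDE_PATTERNS = ["*.dcm", "*.nii.gz"]
EXCLUDE_PATTERNS = ["*report*", "*.pdf"]


def should_upload(filename, include=INCLUDE_PATTERNS, exclude=EXCLUDE_PATTERNS):
    """Return True if filename matches an include pattern and no exclude pattern."""
    name = filename.lower()  # Compare case-insensitively on every platform.
    if any(fnmatch(name, pattern) for pattern in exclude):
        return False  # Exclusions win, so sensitive files are never uploaded.
    return any(fnmatch(name, pattern) for pattern in include)


candidates = ["scan_001.dcm", "radiology_report.dcm", "consent.pdf", "t1.nii.gz"]
to_upload = [f for f in candidates if should_upload(f)]
print(to_upload)
```

In this sketch, exclude patterns are checked first, so a file that matches both lists is left out. That is a conservative design choice for this example and is not a statement about how the CLI resolves conflicts.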
Flywheel SDKs
The Flywheel Python, MATLAB, and R SDKs also use de-id profiles. With the SDKs, profiles can be configured for data other than DICOM images. Currently, the following file types are supported:
- DICOM
- JPG
- PNG
- TIFF
- XML
- JSON
- Text files defining key/value pairs
- CSV
- TSV
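For tabular formats such as CSV and TSV, de-identification works on columns rather than DICOM tags. The Python sketch below shows the general idea using only the standard library. The rule names (`remove`, `hash`), the column names, and the `deidentify_csv` helper are assumptions for illustration. It does not use the Flywheel SDK and does not reflect the actual profile syntax.

```python
import csv
import hashlib
import io

# Hypothetical rules: column name -> action. This is NOT the Flywheel schema.
COLUMN_RULES = {"patient_name": "remove", "mrn": "hash"}


def deidentify_csv(text, rules, delimiter=","):
    """Return de-identified CSV text; pass a tab as delimiter for TSV input."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    # Drop every column marked for removal; keep the rest in original order.
    kept = [c for c in reader.fieldnames if rules.get(c) != "remove"]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=kept, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        clean = {}
        for column in kept:
            value = row[column]
            if rules.get(column) == "hash":
                # One-way hash: the same input always maps to the same token.
                # Illustrative only; an unsalted hash of a short ID is weak.
                value = hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]
            clean[column] = value
        writer.writerow(clean)
    return out.getvalue()


sample = "patient_name,mrn,age\nJane Doe,12345,45\n"
print(deidentify_csv(sample, COLUMN_RULES))
```

The function returns new text and leaves its input untouched, so the source data is unchanged and the de-identified result is a separate copy.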
Additional De-identification Considerations
- If you are uploading an existing dataset, de-identification does not change the source dataset. Instead, Flywheel creates a new copy of the dataset with the de-identification changes applied. However, when data is uploaded directly from an imaging machine, the only dataset that exists is the one in Flywheel. This means that any de-identification changes apply directly to the source dataset unless you implement the De-Identification Gear workflow.
- Flywheel Site Admins can configure project de-identification settings so that they cannot be changed by other users.