Create a De-id Profile

Introduction

In Flywheel, de-identification is configured using a de-id profile. A de-id profile is a set of instructions for what to do with metadata that may include PHI.

This article explains how to create a de-id profile to remove or transform sensitive DICOM data. There is also a reference guide for all possible data transformations with examples.

Warning

Disclaimer: This feature is not guaranteed to satisfy any specific regulatory or compliance requirements. It is your responsibility to ensure that you set the appropriate configuration parameters and evaluate the end result to determine whether it is acceptable for your use cases and any regulatory or compliance requirements you may have.

Instruction Steps

What is the de-id profile?

A de-id profile is a set of instructions for what to do with metadata that may include PHI. De-id profiles can de-identify standard DICOM tags such as PatientName, StudyDate, and PatientAge, as well as private tags unique to your institution.

In general, there are 2 levels of de-identification:

  • File settings: At this level, the de-identification settings apply to all DICOM data by default. For example, you can use remove-private-tags to remove all non-standard DICOM tags. Another example of a file setting is recurse-sequence, which cascades de-identification transformations down an entire sequence of nested tags. You can override these settings by using field settings. See the reference guide for more possible file settings.
  • Field settings: These settings give you finer control over which specific DICOM tags are de-identified and how they are de-identified. Field settings take precedence over file settings, so you can add exceptions to a rule. For example, you can set remove-private-tags to true at the file level, but then keep a specific private tag by using the field transformation keep. See the reference guide for more possible field options.

How does it work with Flywheel?

All upload methods offer the option to use a profile to de-identify data at the edge of the Flywheel platform. De-identifying at the edge means only de-identified data is uploaded to Flywheel. Each time you import data into Flywheel, whether by the Connector, CLI, Web Uploader, or SDK, your data is de-identified according to your profile. See our de-identification overview article for more details.

Set up a de-identification logger to track changes to file metadata automatically, so you can re-identify data. See our article to learn more about setting up de-identification logging.

Step 1: Download the Flywheel CLI

Creating and testing de-id profiles is easiest with the Flywheel CLI. See our article for more information on how to download the CLI and sign in to your Flywheel account.

Step 2: Generate a de-id template

To begin, we will generate a de-id template. In later steps we will update the template to fit your data.

  1. Open the Terminal or Windows Command Prompt app on your computer.
  2. Navigate to the Flywheel CLI.
  3. Enter the following command:
     • Windows: fw deid create C:\Users\[username]\Documents\deid_profile.yaml
       Make sure to replace [username] with your username.
     • Mac/Linux: fw deid create ~/Documents/deid_profile.yaml
  4. You will see the following message: Sample template successfully created
  5. Open the deid_profile.yaml file in a plain text editor such as TextEdit, Notepad, or Sublime. You will see the following template:

# You can give your de-identification profile a name

name: custom

# Indicates where you want to place the de-id log. You will use this log file to preview
# the de-id updates before uploading
# The option is ignored in ingest, you can use --save-deid-logs PATH to save the log.

deid-log: ~/Documents/deid_log.csv

# Sets the filetype to DICOM

dicom:

  # Date-increment controls how many days to offset each date field
  # where the increment-date (shown below) is configured.
  # Positive values will result in later dates, negative
  # values will result in earlier dates.

  date-increment: -17

  # patient-age-from-birthdate sets the PatientAge DICOM header as a 3-digit value with a
  # suffix indicating units. For example, an age in days would
  # be 091D, and that same age in months would be 003M. By default, if
  # the age fits in days, then days will be used,
  # otherwise if it fits in months, then months
  # will be used, otherwise years will be used

  patient-age-from-birthdate: true

  # Set patient age units as Years. Other options include months (M) and days (D)

  patient-age-units: Y

  # The following are field transformations.
  # remove, replace-with, increment-date, hash, and hashuid can be used with any DICOM
  # field. Replace name with the DICOM field "keyword" as defined by the DICOM standard
  fields:

    # Remove a DICOM field. Removes the field from the DICOM entirely.
    # If removal is not supported then this field will be blank.
    # This example removes PatientID.

    - name: PatientID
      remove: true

    # Replace a dicom field with the value provided.
    # This example replaces "StationName" with "XXXX" in Flywheel

    - name: StationName
      replace-with: XXXX

    # Offsets the date by the number of days defined in
    # the date-increment setting above, preserving the time
    # and timezone. In this example, StudyDate appears as 17 days earlier

    - name: StudyDate
      increment-date: true

    # One-Way hash a dicom field to a unique string

    - name: AccessionNumber
      hash: true

    # Replaces a UID field with a hashed version of that
    # field. The first four nodes (prefix) and last node
    # (suffix) will be preserved, with the middle being
    # replaced by the hashed value

    - name: ConcatenationUID
      hashuid: true

Step 3: Determine How Data Needs to be De-identified

You are responsible for ensuring the de-identified data is acceptable for your use cases and meets any regulatory or compliance requirements you have.

Tip

Tip: You can create different de-id profiles for each Flywheel project.

Learn more about how to apply de-id profiles to groups and projects.

One important point to consider when determining your de-identification needs is that Flywheel uses certain DICOM tags to organize the images into groups, projects, subjects, sessions, and acquisitions during the import process.

The following are the default DICOM tags Flywheel uses to sort DICOM images. Your Flywheel site may use different tags for sorting data, so check with your institution's Flywheel admin for the specific sorting tags. We do not recommend removing the sorting tags altogether when de-identifying data. Instead, use one of the other transformation methods, such as replace-with. This allows Flywheel to automatically group related DICOM images while still de-identifying data.

Keyword | Tag | Flywheel Field
Patient ID | (0010,0020) | Group ID (when uploading via Connector); Project Label (when uploading via Connector); Subject ID
Study Instance UID | (0020,000D) | Session UID
Study Description | (0008,1030) | Session Label
Series Instance UID | (0020,000E) | Acquisition UID
Series Description | (0008,103E) | Acquisition Label

Step 4: Update your YAML file

Once you have determined how you want to de-identify your data, update the YAML file.

  1. Update the YAML file with the appropriate transformations. See the reference guide below for more information on transformations.
  2. Confirm that your profile is valid YAML. You can use an online tool like YAML Lint.

Example de-id profile: Create a keeplist

One option for removing many DICOM tags is to create a keeplist using remove-undefined. This means that any DICOM tags not noted under fields are removed. This example also shows a few basic transformations of the fields in the keeplist:

---
name: My Profile
description: version 1 of an example de-id profile

dicom:
  remove-undefined: true
  fields:
    - name: PatientID
      replace-with: "001"
    - name: PatientBirthDate
      jitter: true
      jitter-range: 10
    - name: PatientSex
    - name: PatientAge
    - name: StudyDate
      jitter: true
    - name: AcquisitionDate
    - name: EthnicGroup
    - name: SOPClassUID
    - name: SeriesDescription
    - name: StudyDescription

Example De-id Profile: Create a Blocklist

Let's say you have the following requirements for your data:

  • Remove a handful of fields that you know have PII and are not needed in Flywheel by listing the tags under fields and adding remove: true. This creates a blocklist.
  • Offset dates by a consistent number using date-increment.
  • You use the default sorting tags in your environment and want to replace the value of some fields with "REDACTED" or a hash.
  • Keep one private DICOM tag to use in Flywheel, but remove the rest with remove-private-tags.

Here's one example of a de-id profile that would satisfy the above requirements:

---
name: My Profile
description: version 1 of an example de-id profile

dicom:
  remove-private-tags: true
  date-increment: 14
  fields:
    - name: PatientID
      replace-with: REDACTED
    - name: StudyInstanceUID
      hashuid: true
    - name: SeriesInstanceUID
      hashuid: true
    - name: SOPInstanceUID
      hashuid: true
    - name: PatientName
      remove: true
    - name: (0009, "GEMS_IDEN_01", 1004)
      keep: true
    - name: AccessionNumber
      remove: true
    - name: InstitutionName
      remove: true
    - name: InstitutionAddress
      remove: true
    - name: ReferringPhysicianName
      remove: true
    - name: ReferringPhysicianAddress
      remove: true
    - name: ReferringPhysicianTelephoneNumbers
      remove: true
    - name: InstitutionalDepartmentName
      remove: true
    - name: PhysiciansOfRecord
      remove: true
    - name: PerformingPhysicianName
      remove: true
    - name: NameOfPhysiciansReadingStudy
      remove: true
    - name: OperatorsName
      remove: true
    - name: AdmittingDiagnosesDescription
      remove: true
    - name: PatientBirthTime
      remove: true
    - name: PatientInsurancePlanCodeSequence
      remove: true
    - name: OtherPatientIDs
      remove: true
    - name: OtherPatientNames
      remove: true
    - name: OtherPatientIDsSequence
      remove: true
    - name: PatientBirthName
      remove: true
    - name: PatientAddress
      remove: true
    - name: PatientMotherBirthName
      remove: true
    - name: MilitaryRank
      remove: true
    - name: MedicalRecordLocator
      remove: true
    - name: PatientTelephoneNumbers
      remove: true
    - name: EthnicGroup
      remove: true
    - name: Occupation
      remove: true
    - name: AdditionalPatientHistory
      remove: true
    - name: ResponsiblePerson
      remove: true
    - name: PatientComments
      remove: true
    - name: ClinicalTrialSponsorName
      remove: true
    - name: ClinicalTrialProtocolID
      remove: true
    - name: ClinicalTrialProtocolName
      remove: true
    - name: ClinicalTrialSiteID
      remove: true
    - name: ClinicalTrialSiteName
      remove: true
    - name: ClinicalTrialSubjectID
      remove: true
    - name: ClinicalTrialTimePointID
      remove: true
    - name: ClinicalTrialTimePointDescription
      remove: true
    - name: ClinicalTrialCoordinatingCenterName
      remove: true
    - name: ProtocolName
      remove: true
    - name: ImageComments
      remove: true
    - name: StudyComments
      remove: true
    - name: RequestingPhysician
      remove: true
    - name: RequestAttributesSequence
      remove: true
    - name: NamesOfIntendedRecipientsOfResults
      remove: true
    - name: PersonIdentificationCodeSequence
      remove: true
    - name: PersonAddress
      remove: true
    - name: PersonTelephoneNumbers
      remove: true
    - name: VerifyingObserverName
      remove: true
    - name: PersonName
      remove: true
    - name: ContentSequence
      remove: true
    - name: ContentCreatorName
      remove: true
    - name: ReviewerName
      remove: true
    - name: OriginalAttributesSequence
      remove: true
    - name: StudyDescription
      remove: true
    - name: DerivationDescription
      remove: true
    - name: ClinicalTrialSeriesDescription
      remove: true
    - name: TherapyDescription
      remove: true
    - name: InterventionDescription
      remove: true
    - name: RequestedProcedureDescription
      remove: true
    - name: AcquisitionProtocolDescription
      remove: true
    - name: ScheduledStationAETitle
      remove: true
    - name: ScheduledPerformingPhysicianName
      remove: true
    - name: DeviceDescription
      remove: true
    - name: DischargeDiagnosisDescription
      remove: true
    - name: StationName
      remove: true
    - name: ScheduledStationName
      remove: true
    - name: PerformedStationAETitle
      remove: true
    - name: PerformedStationName
      remove: true
    - name: PerformedProcedureStepDescription
      remove: true
    - name: DeviceSerialNumber
      remove: true
    - name: PerformedProcedureStepID
      remove: true
    - name: ClinicalTrialSubjectReadingID
      remove: true
    - name: IssuerOfPatientID
      remove: true
    - name: DigitalSignaturesSequence
      remove: true
    - regex: .*IdentificationSequence.*
      remove: true
    - name: NameOfPhysiciansReadingStudy
      remove: true
    - name: FrameOfReferenceUID
      hashuid: true

Next steps

See our article to learn how to test your de-id profile locally before uploading sensitive data to Flywheel.

De-identification Profile Settings Reference

This section explains what each de-identification setting does and shows an example of how data is de-identified. These settings are divided into file settings and field settings:

File Settings

These settings are applied to ALL DICOM data by default. They offer broad strokes of de-identification. You can override these for specific tags using fields.

date-format (string)

Describes what format Flywheel should expect for dates in the metadata of the file. This enables Flywheel to properly parse the date. Only use if the date fields in your data have a format that is different from the DICOM default, %Y%m%d.

The format interpretation follows the format codes that the 1989 C standard requires.

Example

date-format: "%m-%d"

datetime-format (string)

Describes what format Flywheel should expect for datetimes in the metadata of the file. This enables Flywheel to properly parse the datetime. Only use if the datetime fields in your data have a format that is different from the DICOM default, %Y%m%d%H%M%S.%f.

The format interpretation follows the format codes that the 1989 C standard requires.

Example

datetime-format: "%H:%M.%S.%f"

date-increment (numeric)

Controls how much time to offset each date or datetime field where the increment-date (true/false) or increment-datetime (true/false) transformation is chosen.

  • Positive values result in later dates
  • Negative values result in earlier dates
  • Incrementing by a multiple of 7 will keep the week-day consistent for shifted dates
  • Incrementing by a non-integer value will also modify the time of a datetime element (for example, 0.5 shifts a datetime by 12 hours).

Example

date-increment: -5

Tag | Original metadata | De-identified metadata
StudyDate (0008,0020) | 20150215 | 20150210
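The arithmetic behind date-increment can be sketched in a few lines of Python. This is an illustration only, not Flywheel's implementation: the function name shift_date is hypothetical, and the sketch assumes the value is parsed with the configured format, offset by the given number of days, and written back in the same format.

```python
from datetime import datetime, timedelta

DICOM_DATE_FORMAT = "%Y%m%d"  # DICOM default date format named in this article


def shift_date(value: str, date_increment: float, date_format: str = DICOM_DATE_FORMAT) -> str:
    """Parse value, offset it by date_increment days, and format it back (hypothetical helper)."""
    parsed = datetime.strptime(value, date_format)
    shifted = parsed + timedelta(days=date_increment)
    return shifted.strftime(date_format)


print(shift_date("20150215", -5))  # 20150210, the same shift as the example above
print(shift_date("20150215", 14))  # 20150301, a multiple of 7 keeps the weekday
```

With a datetime format such as "%Y%m%d%H%M%S.%f", a non-integer increment like 0.5 moves the time of day by 12 hours in this sketch.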

jitter-range (numeric)

The range used to offset a value by a random number. The offset is drawn from [-jitter-range, +jitter-range]. Use jitter-type to change the random number to an integer.

jitter-range can also be set at the field level.

Default is 2.

Example

- name: PatientWeight
  jitter: true
  jitter-range: 10

Tag | Original metadata | De-identified metadata
PatientWeight (0010,1030) | 54.43 | 60

jitter-type (int/float)

Draws a random number from a uniform distribution set by jitter-range. You can configure the random number to be an integer (int) or a floating-point number (float). jitter-type can also be used at the field level.

Default is float.

Example

dicom:
  jitter-range: 10
  jitter-type: int
  fields:
    - name: PatientWeight
      jitter: true

Tag | Original metadata | De-identified metadata
PatientWeight (0010,1030) | 54.43 | 44.43
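To make the interaction between jitter-range and jitter-type concrete, here is a minimal Python sketch. It is not Flywheel's code: the function name, its parameters, and the use of Python's random module are assumptions made for illustration.

```python
import random


def jitter(value: float, jitter_range: float = 2, jitter_type: str = "float", rng=None) -> float:
    """Offset value by a random amount drawn uniformly from [-jitter_range, +jitter_range]."""
    rng = rng or random.Random()
    if jitter_type == "int":
        # Whole-number offsets only, e.g. -10, -9, ..., 9, 10 for jitter_range=10.
        offset = rng.randint(-int(jitter_range), int(jitter_range))
    else:
        # Any floating-point offset inside the range.
        offset = rng.uniform(-jitter_range, jitter_range)
    return value + offset


# Passing a seeded generator makes a run repeatable while experimenting.
weight = jitter(54.43, jitter_range=10, jitter_type="int", rng=random.Random(0))
```

In this sketch an int jitter keeps the fractional part of the original value, which is consistent with 54.43 becoming 44.43 in the example above.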

patient-age-from-birthdate (true/false)

When set to true, this will set the PatientAge DICOM header as a 3-digit value with a suffix indicating units.

For example, an age in days would be 091D, and that same age in months would be 003M. By default, the age will be set using a best-fit approach. This means if the age fits in days, then days will be used; otherwise, if it fits in months, then months will be used; otherwise, years will be used.

Default is false.

Example

dicom:
   patient-age-from-birthdate: true

Tag | Original metadata | De-identified metadata
PatientAge (0010,1010) | 97 | 097Y

patient-age-units (string)

When set in conjunction with patient-age-from-birthdate, this will act as a preference for which units to use. If the value does not fit into the desired unit, the next level of units will be used.

The most common use for this field would be to always use years as the patient age. Valid values are D, M, Y for Days, Months and Years respectively.

Example

dicom:
   patient-age-from-birthdate: true
   patient-age-units: Y

Tag | Original metadata | De-identified metadata
PatientAge (0010,1010) | 97 | 097Y
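The best-fit rule and the patient-age-units preference can be sketched as follows. This is a rough illustration under stated assumptions: the function name is hypothetical, and the day-to-month and day-to-year conversions (30 and 365 days) are simplifications that the real tooling may not share.

```python
def format_patient_age(age_in_days: int, preferred_units: str = "D") -> str:
    """Return a DICOM-style age string: 3 digits plus a D, M, or Y suffix (hypothetical helper)."""
    # Assumed conversions; the actual implementation may compute months and years differently.
    candidates = {
        "D": age_in_days,
        "M": age_in_days // 30,
        "Y": age_in_days // 365,
    }
    order = ["D", "M", "Y"]
    # Start at the preferred unit and fall through to the next unit
    # whenever the value would need more than three digits.
    for unit in order[order.index(preferred_units):]:
        if candidates[unit] <= 999:
            return f"{candidates[unit]:03d}{unit}"
    raise ValueError("age does not fit in three digits")


print(format_patient_age(91))       # 091D
print(format_patient_age(91, "M"))  # 003M
print(format_patient_age(91, "Y"))  # 000Y
```

The last call shows why a units preference matters: forcing years on a very young patient collapses the age to 000Y.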

remove-private-tags (true/false)

When set to true, the private DICOM tags will be removed. Private DICOM tags are any tags not included in the standard DICOM data elements.

Default is false.

Example

dicom:
   remove-private-tags: true

Tag | Original metadata | De-identified metadata
MyPrivateTag (1235,0042) | Acme Inc | blank (this field and value are not added to Flywheel metadata)

recurse-sequence (true/false)

When set to true, each element of a sequence (VR=SQ) will be processed according to the profile, recursively for all nested sequence elements. When used with the remove-undefined: true setting, any sequence or sequence element defined under fields results in the de-id profile being applied to the full sequence.

Default is false (false means only the top-level field for the sequence is processed by the de-id profile).

Note

When using this option, the fields section of the profile must not define fields that act on elements of sequences or that use regex.

Example

dicom:
  fields:
    - name: RequestedProcedureCodeSequence
      recurse-sequence: true
      replace-with: XXX

Suppose you have the RequestedProcedureCodeSequence (0032,1064). When recurse-sequence is set to true, all items within that sequence are also de-identified according to your profile. You do not need to call out each tag individually.

Tag | Original metadata | De-identified metadata
RequestedProcedureCodeSequence (0032,1064) | MRKNEELT\RP\MRI KNEE LEFT, W/O CONTRAST | XXX
CodeValue (0008,0100) | MRKNEELT | XXX
CodingSchemeDesignator (0008,0102) | RP | XXX
CodeMeaning (0008,0104) | MRI KNEE LEFT, W/O CONTRAST | XXX
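As an illustration only, the recursion that recurse-sequence describes can be sketched in Python on a plain nested structure. The function name and the list-of-dicts representation of a sequence are assumptions made for this sketch and are not part of Flywheel's tooling.

```python
def replace_recursively(element, replacement: str):
    """Replace every leaf value inside a (possibly nested) sequence with replacement."""
    if isinstance(element, list):
        # A sequence: process each item.
        return [replace_recursively(item, replacement) for item in element]
    if isinstance(element, dict):
        # A sequence item: process each data element it contains.
        return {name: replace_recursively(value, replacement) for name, value in element.items()}
    # A plain value: replace it.
    return replacement


requested_procedure_code_sequence = [
    {
        "CodeValue": "MRKNEELT",
        "CodingSchemeDesignator": "RP",
        "CodeMeaning": "MRI KNEE LEFT, W/O CONTRAST",
    }
]
redacted = replace_recursively(requested_procedure_code_sequence, "XXX")
```

Every element of the single sequence item ends up as XXX, without naming CodeValue, CodingSchemeDesignator, or CodeMeaning individually.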

remove-undefined (true/false)

This is also called a "keeplist". When set to true, all data elements not defined in the fields section of the profile will be removed. If any field references a nested element in a sequence, the whole sequence element will be kept. Default is false.

Warning

When using this option, particular attention should be paid to the de-id profile to guarantee that the output DICOM still contains the mandatory data elements according to its Information Object Definitions (IOD).

Example

dicom:
  remove-private-tags: true
  remove-undefined: true
  fields:
    - name: PatientID
      replace-with: MY_PATIENT_ID
    - name: StudyInstanceUID
      hashuid: true
    - name: PatientName
      replace-with: REDACTED

Tag | Original metadata | De-identified metadata
PatientID | 0345 | MY_PATIENT_ID
StudyInstanceUID | 1.2.840.113619.6.283.4.983142589.7316.1300473420.841 | 1.2.840.113619.551726.420312.177022.222461.230571.501817.841
PatientName | Smith John | REDACTED
AcquisitionNumber | 1 | Blank (not defined under fields, so it is removed)
StudyID | 4912 | Blank (not defined under fields, so it is removed)
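The keeplist behavior can be pictured with a small Python sketch over a dictionary of metadata. The function name and data layout are hypothetical, and only replace-with is modeled; other transformations such as hashuid are left out to keep the sketch short.

```python
def apply_keeplist(record: dict, fields: dict) -> dict:
    """Keep only the elements named in fields, applying replace-with where given."""
    output = {}
    for name, value in record.items():
        if name not in fields:
            continue  # not defined under fields, so it is removed
        action = fields[name]
        # A field with no replace-with is kept unchanged.
        output[name] = action.get("replace-with", value)
    return output


record = {"PatientID": "0345", "PatientName": "Smith John", "AcquisitionNumber": "1", "StudyID": "4912"}
fields = {"PatientID": {"replace-with": "MY_PATIENT_ID"}, "PatientName": {"replace-with": "REDACTED"}}
kept = apply_keeplist(record, fields)
```

Here kept contains only PatientID and PatientName, which mirrors how AcquisitionNumber and StudyID disappear in the example above.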

replace-with-insert (true/false)

If true, replace-with actions will insert the field into the record if it does not already exist and set its value. If false, replace-with will not insert the field if it does not already exist in the record.

Default is true.

Example

replace-with-insert: false
fields:
    - name: StudyTime
      replace-with: REDACTED

Tag | Original metadata | De-identified metadata
StudyTime (0008,0030) | blank (no data) | blank (no data)

If replace-with-insert were not set to false, Flywheel would add REDACTED instead of leaving the field blank.
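The effect of replace-with-insert reduces to a single condition, sketched below in Python. The function name and the dictionary representation of a record are assumptions made for illustration.

```python
def replace_with(record: dict, name: str, value: str, insert: bool = True) -> dict:
    """Replace record[name] with value; create a missing field only when insert is True."""
    updated = dict(record)  # leave the caller's record untouched
    if name in updated or insert:
        updated[name] = value
    return updated


print(replace_with({}, "StudyTime", "REDACTED", insert=False))  # {}
print(replace_with({}, "StudyTime", "REDACTED", insert=True))   # {'StudyTime': 'REDACTED'}
```

An existing field is replaced in both cases; the setting only changes what happens when the field is absent.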

Fields

The fields portion of the de-id profile allows you to reference a specific DICOM data element and perform transformations on it. These follow the format:

dicom:
  fields:
    - name: <DICOM data element>
      <field transformation>: <value>
      <field transformation>: <value>

In this section you will find the different ways you can reference a specific DICOM field, tag, or keyword.

How to Reference a DICOM Data Element

This file profile supports 3 ways to reference a DICOM data element: keyword, tag, or dotty-notation.

Note

The data elements in the DICOM File Meta Information (the group 0002 elements that follow the 128-byte File Preamble and DICM prefix) can be accessed in the same way as other tags.

Keyword

The keyword string as defined in the public DICOM dictionaries. For example: PatientName, SOPClassUID, and AcquisitionDate.

fields:
   - name: PatientName
     replace-with: REDACTED

Tag

When referencing the DICOM tag, you can use any of these notations:

Tuple

Format

  • (gggg, eeee)

Example

# straightforward DICOM tag, with or without spaces
- name: (0010, 0010)

# DICOM tag with no punctuation and no spaces (must be in quotes)
- name: '00100010'

Hexadecimal

Format

  • 'ggggeeee'
  • '0xggggeeee'

Example

# dicom tag with no spaces
- name: '00100010'

# dicom tag in hexadecimal format
- name: '0x00100010'
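All of the tag notations above name the same (group, element) pair. A hedged Python sketch of how such strings could be normalized is shown here; the function name parse_tag and its regular expressions are assumptions for illustration and are not taken from Flywheel's parser.

```python
import re


def parse_tag(reference: str) -> tuple:
    """Normalize '(gggg, eeee)', 'ggggeeee', or '0xggggeeee' to a (group, element) pair of ints."""
    text = reference.strip()
    # Tuple notation, with or without spaces around the comma.
    match = re.fullmatch(r"\(\s*([0-9A-Fa-f]{4})\s*,\s*([0-9A-Fa-f]{4})\s*\)", text)
    if match is None:
        # Hexadecimal notation, with an optional 0x prefix.
        match = re.fullmatch(r"(?:0[xX])?([0-9A-Fa-f]{4})([0-9A-Fa-f]{4})", text)
    if match is None:
        raise ValueError(f"not a recognized tag notation: {reference!r}")
    return int(match.group(1), 16), int(match.group(2), 16)


print(parse_tag("(0010, 0010)"))  # (16, 16)
print(parse_tag("0x00100010"))    # (16, 16)
```

Both calls return the same pair, which is PatientName (0010,0010) written in decimal.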

Private Tag Notation

We rely on the predefined private dictionaries in pydicom (_private_dict.py) and flywheel-metadata to infer the tag VR.

Format

(gggg, PrivateCreatorName, ee)

Example

- name: (0009, "GEMS_IDEN_01", 04)

The private tag creator element is used to validate organization of private tags. This means that if you used the above example in a template the tag (0009,"NOT_GEMS_IDEN_01", 04) would be ignored because the creator value does not match.

See the official DICOM documentation for more information on private tags.

Repeater Group Notation

You can apply a field transformation to a range of DICOM elements or groups. Available for groups in range (50XX,eeee) and (60XX,eeee) only.

Format

  • Hexadecimal: 50XXeeee, 0x50XXeeee, 60XXeeee, 0x60XXeeee
  • Tuple: (50XX, eeee), (60XX, eeee)

Example

# repeating group hexadecimal example
- name: '0x60XX0022'

# repeating group no punctuation no spaces example
- name: '60XX0040'

# using XX to reference all elements in a range
- name: 0x50XX1001
  replace-with: REDACTED
- name: (50XX, 1001)
  remove: true
- name: (60XX, 0050)
  remove: true

See DICOM's documentation for more information on repeating groups.
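One way to picture repeater group notation is as a pattern that stands for many concrete tags. The sketch below is hypothetical: the function name is invented, and it lets XX match any two hexadecimal digits, which is looser than the set of groups DICOM actually defines.

```python
import re


def repeater_pattern(reference: str):
    """Compile '50XXeeee', '0x60XXeeee', or '(60XX, eeee)' into a regex over 8-digit hex tags."""
    compact = re.sub(r"[()\s,]", "", reference)  # drop tuple punctuation and spaces
    compact = re.sub(r"^0[xX]", "", compact)     # drop an optional 0x prefix
    if not re.fullmatch(r"[56]0XX[0-9A-Fa-f]{4}", compact):
        raise ValueError(f"not a repeating group reference: {reference!r}")
    # Assumption: XX stands for any two hex digits.
    return re.compile(compact.replace("XX", "[0-9A-F]{2}"), re.IGNORECASE)


overlay_type = repeater_pattern("0x60XX0022")
print(bool(overlay_type.fullmatch("60020022")))  # True
print(bool(overlay_type.fullmatch("60020050")))  # False
```

The tag 60020022 falls in the 60XX range with element 0022, so it matches; 60020050 has a different element and does not.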

Dotty-notation

Dot-separated notation for referencing an element within a DICOM sequence. You can use a mix of keywords and tags. For example:

  • AnatomicRegionSequence.0.CodeValue
  • 00082218.0.00080102
  • AnatomicRegionSequence.0.00080104

Use * to reference all indices of the sequence element at once:

AnatomicRegionSequence.*.CodeValue

The notation also supports referring to a data element at any depth, recursively.

# nested tags (*) meaning 'for all indexes in sequence'
- name: ReferencedImageSequence.*.ReferencedSOPClassUID 
# You could also write this as (0008, 1140).*.(0008, 1150)

# nested tag with specific index in sequence
- name: ReferencedPerformedProcedureStepSequence.1.ReferencedSOPInstanceUID 
# You could also write this as (0008, 1111).1.(0008, 1155)
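To show how a dotted reference walks a sequence, here is a small Python sketch that resolves keyword-based paths against nested lists and dictionaries. The resolve function and the data layout are assumptions for illustration; tag-based parts such as 00082218 are not handled in this sketch.

```python
def resolve(dataset, path: str) -> list:
    """Return every value reached by a dot-separated path; '*' matches all items of a sequence."""
    current = [dataset]
    for part in path.split("."):
        following = []
        for node in current:
            if isinstance(node, list):
                if part == "*":
                    following.extend(node)  # every item of the sequence
                elif part.isdigit() and int(part) < len(node):
                    following.append(node[int(part)])  # one item, by index
            elif isinstance(node, dict) and part in node:
                following.append(node[part])  # keyword lookup
        current = following
    return current


dataset = {
    "ReferencedImageSequence": [
        {"ReferencedSOPClassUID": "1.2.3"},
        {"ReferencedSOPClassUID": "1.2.4"},
    ]
}
print(resolve(dataset, "ReferencedImageSequence.*.ReferencedSOPClassUID"))  # ['1.2.3', '1.2.4']
print(resolve(dataset, "ReferencedImageSequence.1.ReferencedSOPClassUID"))  # ['1.2.4']
```

Because each path part is applied to whatever the previous part produced, the same loop works for sequences nested at any depth.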

Field transformations

Once you reference a DICOM field, you can apply a field transformation to it. These field transformations override any file settings set above.

hash(true/false)

Replace the contents of the field with a one-way cryptographic hash in hexadecimal form. Only the first 16 characters of the hash will be used, in order to support short strings.

- name: AccessionNumber
  hash: true
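The truncation described above can be sketched with Python's hashlib. SHA-256 is an assumption here, since this article only says "one-way cryptographic hash", and the function name is hypothetical.

```python
import hashlib


def hash_value(value: str) -> str:
    """Return the first 16 hex characters of a digest of value (SHA-256 assumed)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


hashed = hash_value("A12345")  # a 16-character hexadecimal string
```

The same input always produces the same output, so hashed values can still be used to group records that shared an original value.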

hashuid(true/false)

Replaces a UID field with a hashed version of that field. By default, the first four nodes (prefix) and last node (suffix) will be preserved, with the middle being replaced by the hashed value.

Either of those can be applied at the field level or at the global level.

- name: StudyInstanceUID
  hashuid: true

Tag | Original metadata | De-identified metadata
StudyInstanceUID | 1.2.840.113619.6.283.4.983142589.7316.1300473420.841 | 1.2.840.113619.551726.420312.177022.222461.230571.501817.841
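The prefix and suffix preservation can be sketched as follows. This is an approximation under loud assumptions: the use of SHA-256, the conversion of the digest to decimal digits, and the choice of six 6-digit middle nodes are all guesses made for illustration, so the output will not match what Flywheel produces.

```python
import hashlib


def hash_uid(uid: str) -> str:
    """Keep the first four nodes and the last node of a UID; replace the middle with hashed digits."""
    nodes = uid.split(".")
    prefix, suffix = nodes[:4], nodes[-1:]
    digest = hashlib.sha256(uid.encode("utf-8")).hexdigest()  # assumed hash function
    digits = str(int(digest, 16))                              # hex digest as decimal digits
    # Assumed layout: six middle nodes of up to six digits each.
    # int() strips leading zeros, which UID nodes may not have.
    middle = [str(int(digits[i:i + 6])) for i in range(0, 36, 6)]
    return ".".join(prefix + middle + suffix)


new_uid = hash_uid("1.2.840.113619.6.283.4.983142589.7316.1300473420.841")
```

The result still starts with 1.2.840.113619 and ends with 841, and it stays within the 64-character limit that DICOM places on UIDs.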

increment-date (true/false)

Offsets the date by the number of days defined in the date-increment setting, preserving the time and timezone. If the date format does not match the DICOM default, %Y%m%d, tell Flywheel what format to expect by using the date-format setting (date-format helps Flywheel parse the string).

You can apply either of these settings at the global level as well as at the field level.

- name: StudyDate
  increment-date: true
  date-format: "%Y-%m-%d"

Warning

You are responsible for setting a date-format which is valid for the file type being processed.

increment-datetime (true/false)

Offsets the datetime by the number of days defined in the date-increment setting of the file profile, preserving the time and timezone. If the format of the datetime fields in your data does not match the DICOM default, %Y%m%d%H%M%S.%f, tell Flywheel what format to expect by using the datetime-format setting (datetime-format helps Flywheel parse the string).

You can apply either of these settings at the global level as well as at the field level.

- name: AcquisitionDateTime 
  increment-datetime: true
  datetime-format: "%Y-%m-%d %H:%M:%S"

Warning

You are responsible for setting a datetime-format which is valid for the file type being processed.

jitter (true/false)

Offsets a numeric (integer or float) value by a random value drawn from a uniform distribution centered on 0. By default, Flywheel uses integers and the range [-1,1]. You can change this range with jitter-range. You can also set if the numeric value is an integer or a floating-point number with jitter-type. These can also be set at the field level or the global level.

- name: PatientWeight
  jitter: true
  jitter-range: 10
  jitter-type: float

Additional Considerations

Additional constraints apply to DICOM data elements based on their VR:

  • If jitter-type is float and the VR is one of ["IS", "UL", "US"], the jittered value is converted to an integer.
  • If the VR is one of ["UL", "US"] (unsigned long, unsigned short) and the jittered value is less than 0, the jittered value is set to 0.
  • If the VR is "US" and the jittered value is greater than 65535, the jittered value is set to 65535.
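The three constraints above can be read as a short chain of checks. The Python sketch below is illustrative only: the function name is hypothetical, and truncating toward zero with int() is an assumption about how the conversion to an integer is done.

```python
def constrain_jittered_value(jittered_value: float, vr: str, jitter_type: str = "float"):
    """Apply the VR-based constraints listed above to an already-jittered value."""
    if jitter_type == "float" and vr in ("IS", "UL", "US"):
        jittered_value = int(jittered_value)  # integer-valued VRs cannot hold a fraction
    if vr in ("UL", "US") and jittered_value < 0:
        jittered_value = 0                    # unsigned VRs cannot be negative
    if vr == "US" and jittered_value > 65535:
        jittered_value = 65535                # largest unsigned 16-bit value
    return jittered_value


print(constrain_jittered_value(12.7, "IS"))     # 12
print(constrain_jittered_value(-3.2, "US"))     # 0
print(constrain_jittered_value(70000.0, "US"))  # 65535
```

Values with other VRs, such as a decimal string (DS), pass through unchanged in this sketch.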

keep(true/false)

Used when creating a keeplist using remove-undefined:true or to override global or file de-identification settings.

- name: SeriesDescription
  keep: true

Note

If name is the only key defined in the field configuration, Flywheel defaults to keep: true.

replace-with(string)

Replaces the contents of the field with the value provided. Be aware of the length of the field being replaced, because some DICOM fields only support a limited number of characters. By default, the field will be created in the Flywheel record if it does not exist. This behavior can be reversed by setting replace-with-insert: False at the profile or field level.

- name: PatientID
  replace-with: REDACTED
  replace-with-insert: False

Read more

For further information about action, action configuration and file profile specific configuration, please refer to our open source python library documentation.