Step 2: De-identify Data

Introduction

To avoid exposing PHI, consider how you may need to de-identify your data before uploading it to Flywheel. In Flywheel, data is de-identified using a de-id profile.

Instruction Steps

What is the de-id profile?

A de-id profile is a set of instructions for what to do with file metadata that may include PHI.

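The de-id profile options apply de-identification transformations in two different ways: file settings, which apply to all DICOM data by default, and field settings, which apply to a specific data element.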

Note

You can skip this portion of the config file if you are uploading to a location that already de-identifies data using a site, group, or project de-id profile or if you do not wish to de-identify your data.

Add a De-id Profile Section to the Config File

  1. Below the template section, add a new section for the de-id profile. The following example config file has both template and de-id profile sections:

    #####
    # Template Settings
    #####
    
    template:
      - pattern: "{subject}"
      - pattern: "{session}"
      - pattern: "{acquisition}"
        scan: dicom
    
    ####
    # De-identification settings
    ####
    name: Profile1
    dicom:
    
      # date-increment controls how many days to offset each date field
      # where increment-date (shown below) is configured.
      # Positive values will result in later dates, negative
      # values will result in earlier dates.
    
      date-increment: -17
    
      # patient-age-from-birthdate sets the PatientAge DICOM header as a
      # 3-digit value with a suffix indicating units. For example, an age in
      # days would be 091D, and that same age in months would be 003M. By
      # default, if the age fits in days, then days will be used,
      # otherwise if it fits in months, then months
      # will be used, otherwise years will be used.
    
      patient-age-from-birthdate: true
    
      # remove-undefined: all data elements not defined in the fields section
      # of the profile will be removed. If any field references a nested
      # element in a sequence, the whole sequence element will be kept.
    
      remove-undefined: true
    
      # Set patient age units as Years. Other options include months (M) and days (D)
    
      patient-age-units: Y
    
      # The following are field transformations.
      # remove, replace-with, increment-date, hash, and hashuid can be used with
      # any DICOM field. Set name to the field's keyword as defined by the
      # DICOM standard.
      fields:
    
        # remove deletes a field from the DICOM entirely. If removal is
        # not supported, the field is left blank. PatientID is required
        # for upload (see the warning below), so this example replaces
        # its value instead of removing it.
    
        - name: PatientID
          replace-with: REDACTED
    
        # replace-with replaces a DICOM field with the value provided.
        # This example replaces "StationName" with "XXXX" in Flywheel.
    
        - name: StationName
          replace-with: XXXX
    
        # increment-date offsets the date by the number of days defined in
        # the date-increment setting above, preserving the time
        # and timezone. In this example, StudyDate appears 17 days earlier.
    
        - name: StudyDate
          increment-date: true
    
        # hash replaces the value with a one-way hash.
        # You can refer to fields by their DICOM tag or keyword.
    
        - name: (0008,0050)
          hash: true
    
        # Replaces a UID field with a hashed version of that
        # field. The first four nodes (prefix) and last node
        # (suffix) will be preserved, with the middle being
        # replaced by the hashed value
    
        - name: ConcatenationUID
          hashuid: true
    
        # The fields below are listed so that they are not removed as part of the 
        # remove-undefined setting above. 
        - name: SeriesInstanceUID
        - name: Modality
        - name: SeriesNumber
        - name: ScheduledProcedureStepID
        - name: RequestedProcedureID
        - name: StudyTime
        - name: StudyID
        - name: StudyInstanceUID
        - name: ProtocolName
        - name: AcquisitionDate
          increment-date: true
        - name: AcquisitionDateTime
        - name: AcquisitionTime
        - name: SeriesDate
          increment-date: true
        - name: SeriesTime
    
  2. Update the example profile to fit your dataset by adding or removing fields or changing the transformation options.

Warning

Flywheel requires the following DICOM data elements to sort and label data uploaded via the CLI. Make sure that your de-id profile does not remove these fields. You can still transform them by incrementing or replacing the value.

The OHIF Viewer

The OHIF viewer requires the following tags:

| Keyword | Tag | VR |
| --- | --- | --- |
| SeriesInstanceUID | (0020,000E) | UI |
| StudyInstanceUID | (0020,000D) | UI |
| Modality | (0008,0060) | CS |
| SeriesNumber | (0020,0011) | IS |
| ScheduledProcedureStepID | (0040,0009) | SH |
| RequestedProcedureID | (0040,1001) | SH |
| StudyDate | (0008,0020) | DA |
| StudyTime | (0008,0030) | TM |
| StudyID | (0020,0010) | SH |
| PatientID | (0010,0020) | LO |

Flywheel Hierarchy Field Mappings to DICOM Tags

| Keyword | Tag | Flywheel Field | VR |
| --- | --- | --- | --- |
| Study Comments* | (0032,4000)* | Group ID, Project Label, Subject ID | LT |
| Study Instance UID | (0020,000D) | Session UID | UI |
| Study Description | (0008,1030) | Session Label | LO |
| Series Instance UID | (0020,000E) | Acquisition UID | UI |
| Protocol Name | (0018,1030) | Acquisition Label | LO |

*It is possible to customize which DICOM field(s) Flywheel uses to capture routing string values (group, project, and subject).

Flywheel Field Mappings to DICOM Tags

If these values are not present in the DICOM tags, then the Flywheel timestamp will be blank. If one set of DICOM tag values (acquisition or series) is not present, Flywheel tries to use the other.

| Keyword | Tag | Flywheel Field | VR |
| --- | --- | --- | --- |
| Acquisition Date | (0008,0022) | Timestamp | DA |
| Acquisition DateTime | (0008,002A) | Timestamp | DT |
| Acquisition Time | (0008,0032) | Timestamp | TM |
| Series Date | (0008,0021) | Timestamp | DA |
| Series Time | (0008,0031) | Timestamp | TM |
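
The warning above can be turned into a quick check before you upload. The following Python sketch is illustrative only and is not part of the Flywheel CLI: `REQUIRED_KEYWORDS` is transcribed from the first table above, and `find_dropped_fields` is a hypothetical helper that expects the `dicom` section of a profile as an already-parsed dictionary.

```python
# Keywords from the first table above that Flywheel needs to sort and label data.
REQUIRED_KEYWORDS = {
    "SeriesInstanceUID", "StudyInstanceUID", "Modality", "SeriesNumber",
    "ScheduledProcedureStepID", "RequestedProcedureID", "StudyDate",
    "StudyTime", "StudyID", "PatientID",
}


def find_dropped_fields(dicom_profile, required=REQUIRED_KEYWORDS):
    """Return the required keywords that this profile would remove.

    Hypothetical helper: only keyword names are checked, so fields that are
    referenced by tag, such as (0020,000D), are not recognized.
    """
    fields = {f["name"]: f for f in dicom_profile.get("fields", [])}
    dropped = set()
    for keyword in required:
        field = fields.get(keyword)
        if field is None:
            # An unlisted field is only removed when remove-undefined is on.
            if dicom_profile.get("remove-undefined", False):
                dropped.add(keyword)
        elif field.get("remove", False):
            dropped.add(keyword)
    return sorted(dropped)


profile = {
    "remove-undefined": True,
    "fields": [
        {"name": "PatientID", "replace-with": "REDACTED"},
        {"name": "StudyDate", "increment-date": True},
        {"name": "Modality", "remove": True},
    ],
}
print(find_dropped_fields(profile))
```

An empty list means that, as far as this simplified check can tell, none of the listed elements would be removed.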

Click Next to learn how to include or exclude folders and files.

De-identification Profile Settings Reference

This section explains what each de-identification setting does and shows an example of how data is de-identified. The settings are divided into file settings and field settings:

File Settings

These settings are applied to ALL DICOM data by default and offer broad strokes of de-identification. You can override them for specific tags using field settings.

date-format (string)

Describes what format Flywheel should expect for dates in the metadata of the file. This enables Flywheel to properly parse the date. Only use this setting if the date fields in your data have a format that differs from the DICOM default, %Y%m%d.

The format codes are interpreted as required by the 1989 C standard.

Example 1

date-format: "%m-%d"

datetime-format (string)

Describes what format Flywheel should expect for datetimes in the metadata of the file. This enables Flywheel to properly parse the datetime. Only use this setting if the datetime fields in your data have a format that differs from the DICOM default, %Y%m%d%H%M%S.%f.

The format codes are interpreted as required by the 1989 C standard.

Example 2

datetime-format: "%H:%M.%S.%f"

date-increment (numeric)

Controls how much time, in days, to offset each date or datetime field where the increment-date (true/false) or increment-datetime (true/false) transformation is chosen.

  • Positive values result in later dates.
  • Negative values result in earlier dates.
  • Incrementing by a multiple of 7 keeps the weekday consistent for shifted dates.
  • Incrementing by a non-integer value also modifies the time of datetime elements. For example, 0.5 shifts a datetime by 12 hours.

Example 3

date-increment: -5

| Tag | Original metadata | De-identified metadata |
| --- | --- | --- |
| StudyDate (0008,0020) | 20150215 | 20150210 |
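
The arithmetic behind Example 3 can be reproduced with Python's standard library. This is a minimal sketch of the idea, not Flywheel's implementation; `shift_date` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

DICOM_DATE_FORMAT = "%Y%m%d"  # DICOM default for date values


def shift_date(value, date_increment, date_format=DICOM_DATE_FORMAT):
    """Shift a date string by date_increment days and re-format it."""
    parsed = datetime.strptime(value, date_format)
    shifted = parsed + timedelta(days=date_increment)
    return shifted.strftime(date_format)


print(shift_date("20150215", -5))   # 20150210, as in Example 3
print(shift_date("20150215", -14))  # 20150201, same weekday as the original
```

Because -14 is a multiple of 7, the second call lands on the same day of the week as the original date.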

jitter-range (numeric)

The range used to offset a value by a random number. The random offset is drawn from [-jitter-range, +jitter-range]. Use jitter-type to change the random number to an integer.

jitter-range can also be set at the field level.

The default jitter-range is 2.

Example 4

- name: PatientWeight
  jitter: true
  jitter-range: 10

| Tag | Original metadata | De-identified metadata |
| --- | --- | --- |
| PatientWeight (0010,1030) | 54.43 | 60 |

jitter-type (int/float)

Controls how the random number is drawn from the uniform distribution set by jitter-range. You can configure the random number to be an integer (int) or a floating-point number (float). jitter-type can also be used at the field level.

Default is float.

Example 5

dicom:
  jitter-range: 10
  jitter-type: int
  fields:
    - name: PatientWeight
      jitter: true

| Tag | Original metadata | De-identified metadata |
| --- | --- | --- |
| PatientWeight (0010,1030) | 54.43 | 44.43 |

patient-age-from-birthdate (true/false)

When set to true, this will set the PatientAge DICOM header as a 3-digit value with a suffix indicating units.

For example, an age in days would be 091D, and that same age in months would be 003M. By default, the age will be set using a best-fit approach. This means if the age fits in days, then days will be used, otherwise if it fits in months, then months will be used, otherwise years will be used.

Default is false.

Example 6

dicom:
  patient-age-from-birthdate: true

| Tag | Original metadata | De-identified metadata |
| --- | --- | --- |
| PatientAge (0010,1010) | 97 | 097Y |

patient-age-units (string)

When set in conjunction with patient-age-from-birthdate, this will act as a preference for which units to use. If the value does not fit into the desired unit, the next level of units will be used.

The most common use for this field would be to always use years as the patient age. Valid values are D, M, Y for Days, Months and Years respectively.

Example 7

dicom:
  patient-age-from-birthdate: true
  patient-age-units: Y

| Tag | Original metadata | De-identified metadata |
| --- | --- | --- |
| PatientAge (0010,1010) | 97 | 097Y |
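
The best-fit rule and the unit preference can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Flywheel's implementation: `format_patient_age` is a hypothetical helper, a month is taken as 30 days, a year as 365 days, and a value "fits" when it is at most 999.

```python
def format_patient_age(age_in_days, preferred_units=None):
    """Format an age as a 3-digit DICOM age string such as 091D or 003M.

    Assumptions for this sketch: a month is 30 days, a year is 365 days,
    and a value "fits" when it is at most 999.
    """
    candidates = [
        ("D", age_in_days),
        ("M", age_in_days // 30),
        ("Y", age_in_days // 365),
    ]
    units = [unit for unit, _ in candidates]
    # Start at the preferred unit, or at days for the best-fit default.
    start = units.index(preferred_units) if preferred_units else 0
    for unit, value in candidates[start:]:
        if value <= 999:
            return f"{value:03d}{unit}"
    raise ValueError("age does not fit in a 3-digit value")


print(format_patient_age(91))          # 091D
print(format_patient_age(91, "M"))     # 003M
print(format_patient_age(1500))        # 050M, because 1500 days does not fit
print(format_patient_age(35405, "Y"))  # 097Y
```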

remove-private-tags (true/false)

When set to true, private DICOM tags will be removed. Private DICOM tags are any tags not included in the standard DICOM data elements.

Default is false.

Example 8

dicom:
  remove-private-tags: true

| Tag | Original metadata | De-identified metadata |
| --- | --- | --- |
| MyPrivateTag (1235,0042) | Acme Inc | Blank (this field and value are not added to Flywheel metadata) |

recurse-sequence (true/false)

When set to true, each element of a sequence (VR=SQ) will be processed according to the profile, recursively for all nested sequence elements. When used with the remove-undefined: true setting, any sequence or sequence element defined under fields will result in the de-id profile being applied to the full sequence.

Default is false, which means only the top-level field for the sequence is processed by the de-id profile.

Note

When using this option, the fields section of the profile must not define fields that act on elements of sequences or that use regex.

Example 9

dicom:
  recurse-sequence: true
  fields:
    - name: AccessionNumber
      replace-with: ''

When recurse-sequence is set to true, AccessionNumber elements within all DICOM sequences are also de-identified according to the profile. You do not need to call out each tag individually.
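
To make the recursion concrete, here is a small Python sketch that applies a replace-with action at every nesting level. It models a DICOM header as a plain dictionary in which a sequence is a list of dictionaries; that model and the helper name `replace_everywhere` are assumptions made for illustration and do not reflect Flywheel's internals.

```python
def replace_everywhere(dataset, keyword, new_value):
    """Replace `keyword` at the top level and inside every nested sequence.

    In this sketch a dataset is a dict and a sequence is a list of dicts.
    """
    for key, value in dataset.items():
        if key == keyword:
            dataset[key] = new_value
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    replace_everywhere(item, keyword, new_value)
    return dataset


header = {
    "AccessionNumber": "A123",
    "RequestAttributesSequence": [
        {
            "AccessionNumber": "A123",
            "ReferencedStudySequence": [{"AccessionNumber": "B456"}],
        }
    ],
}
print(replace_everywhere(header, "AccessionNumber", ""))
```

With recurse-sequence left at its default of false, only the top-level AccessionNumber in this model would be changed.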

remove-undefined (true/false)

This is also called a "keeplist". When set to true, all data elements not defined in the fields section of the profile will be removed. If any field references a nested element in a sequence, the whole sequence element will be kept. Default is false.

Warning

When using this option, pay particular attention to the de-id profile to guarantee that the output DICOM still contains the mandatory data elements according to its Information Object Definitions (IOD).

Example 10

dicom:
  remove-private-tags: true
  remove-undefined: true
  fields:
    - name: PatientID
      replace-with: MY_PATIENT_ID
    - name: StudyInstanceUID
      hashuid: true
    - name: PatientName
      replace-with: REDACTED

| Tag | Original metadata | De-identified metadata |
| --- | --- | --- |
| PatientID | 0345 | MY_PATIENT_ID |
| StudyInstanceUID | 1.2.840.113619.6.283.4.983142589.7316.1300473420.841 | 1.2.840.113619.551726.420312.177022.222461.230571.501817.841 |
| PatientName | Smith John | REDACTED |
| AcquisitionNumber | 1 | Blank (not defined under fields, so it is removed) |
| StudyID | 4912 | Blank (not defined under fields, so it is removed) |
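
The keeplist behavior in Example 10 can be mimicked on a plain dictionary. The Python sketch below is a simplified illustration, not Flywheel's code: `apply_keeplist` is a hypothetical helper that handles only replace-with and ignores hashuid, replace-with-insert, and sequences.

```python
def apply_keeplist(header, fields):
    """Keep only the listed fields, applying replace-with where it is set.

    Sketch only: other transformations and replace-with-insert are ignored.
    """
    result = {}
    for field in fields:
        name = field["name"]
        if name not in header:
            continue
        result[name] = field.get("replace-with", header[name])
    return result


header = {
    "PatientID": "0345",
    "PatientName": "Smith John",
    "AcquisitionNumber": "1",
    "StudyID": "4912",
}
fields = [
    {"name": "PatientID", "replace-with": "MY_PATIENT_ID"},
    {"name": "PatientName", "replace-with": "REDACTED"},
]
print(apply_keeplist(header, fields))
# {'PatientID': 'MY_PATIENT_ID', 'PatientName': 'REDACTED'}
```

AcquisitionNumber and StudyID are dropped because they are not listed under fields.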

replace-with-insert (true/false)

If true, replace-with actions insert the field into the record if it does not already exist and set its value. If false, replace-with does not insert the field if it does not already exist in the record.

Default is true.

Example 11

replace-with-insert: false
fields:
    - name: StudyTime
      replace-with: REDACTED

| Tag | Original metadata | De-identified metadata |
| --- | --- | --- |
| StudyTime (0008,0030) | blank (no data) | blank (no data) |

If replace-with-insert were not set to false, Flywheel would add StudyTime with the value REDACTED instead of leaving it blank.
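
The difference between the two settings comes down to a single condition. In this Python sketch the header is modeled as a dictionary and `replace_with` is a hypothetical helper, so treat it as an illustration of the rule rather than Flywheel's implementation.

```python
def replace_with(header, name, value, insert=True):
    """Set header[name] to value; when insert is False, skip missing fields."""
    if name in header or insert:
        header[name] = value
    return header


print(replace_with({}, "StudyTime", "REDACTED"))                # {'StudyTime': 'REDACTED'}
print(replace_with({}, "StudyTime", "REDACTED", insert=False))  # {}
```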

Fields

The fields portion of the de-id profile allows you to reference a specific DICOM data element and perform transformations on it. These follow the format:

dicom:
  fields:
    - name: <DICOM data element>
      <field transformation>: <value>
      <field transformation>: <value>

In this section you will find the different ways you can reference a specific DICOM field, tag, or keyword.

How to Reference a DICOM Data Element

This file profile supports 3 ways to reference a DICOM data element: keyword, tag, or dotty-notation.

Note

The data elements in the DICOM File Meta Information (group 0002, which follows the 128-byte File Preamble) can be accessed in the same way as other tags.

Keyword

The keyword string as defined in the public DICOM dictionaries. For example: PatientName, SOPClassUID, and AcquisitionDate.

fields:
   - name: PatientName
     replace-with: REDACTED

Tag

When referencing the DICOM tag, you can use any of these notations:

Tuple

Example format: (0010, 0010)

# straightforward DICOM tag, with or without spaces
- name: (0002, 0010)

Hexadecimal

  • ggggeeee, for example '00100010'
  • 0xggggeeee, for example '0x00100010'

# DICOM tag with no punctuation and no spaces (must be in quotes)
- name: '00100021'

# DICOM tag in hexadecimal format
- name: '0x00100022'

Private tag notation

Format: (GGGG, PrivateCreatorName, EE)

- name: (0009, "GEMS_IDEN_01", 04)

The private creator element is used to validate the organization of private tags. In the above example, (0009, "NOT_GEMS_IDEN_01", 04) would be ignored because the creator value must match. See the official DICOM documentation for more information on private tags.

Flywheel relies on predefined private dictionaries, built from pydicom's _private_dict.py and flywheel-metadata, to infer the VR of private tags.
Repeater group notation

You can apply a field transformation to a range of DICOM elements or groups. This notation is available for groups in the ranges (50XX, EEEE) and (60XX, EEEE) only.

  • Hexadecimal: 50XX####, 0x50XX####, 60XX####, 0x60XX####
  • Tuple: (50XX,####), (60XX,####)

# repeating group hexadecimal example
- name: '0x60XX0022'

# repeating group with no punctuation and no spaces
- name: '60XX0040'

# repeating group range example (old format)
- name: (6000-60FF, 0010)

# using XX to reference all elements in a range
- name: '0x50XX1001'
  replace-with: REDACTED
- name: (50XX,1001)
  remove: true
- name: (60XX, 0050)
  remove: true


See DICOM's documentation for more information on repeating groups.
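
One way to picture the XX placeholder is as a wildcard over two hexadecimal digits. The Python sketch below is illustrative only; `repeater_to_regex` and `matches` are hypothetical helpers, and the old range format such as (6000-60FF, 0010) is deliberately not handled.

```python
import re


def repeater_to_regex(pattern):
    """Compile a repeater pattern such as '60XX0022' or '(50XX,1001)'."""
    compact = re.sub(r"[^0-9A-Za-z]", "", pattern).lower()
    if compact.startswith("0x"):
        compact = compact[2:]
    if len(compact) != 8:
        raise ValueError(f"unsupported pattern: {pattern!r}")
    # Each X placeholder stands for any single hexadecimal digit.
    parts = ("[0-9a-f]" if char == "x" else re.escape(char) for char in compact)
    return re.compile("".join(parts))


def matches(pattern, tag):
    """Return True when the integer tag falls inside the repeater pattern."""
    return repeater_to_regex(pattern).fullmatch(f"{tag:08x}") is not None


print(matches("0x60XX0022", 0x60020022))  # True
print(matches("(50XX,1001)", 0x50041001))  # True
print(matches("60XX0040", 0x50000040))     # False
```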

Dotty-notation

Notation for referencing an element within DICOM sequence. You can use a mix of keywords and tags.

For example: AnatomicRegionSequence.0.CodeValue, 00082218.0.00080102, AnatomicRegionSequence.0.00080104

In addition, the dotty-notation supports the use of * to reference all indices of a sequence element at once, for example AnatomicRegionSequence.*.CodeValue.

The notation supports referencing data elements at any depth, recursively.

# nested tags (*) meaning 'for all indexes in sequence'
- name: ReferencedImageSequence.*.ReferencedSOPClassUID 
# (0008, 1140).*.(0008, 1150)

# nested tag with specific index in sequence
- name: ReferencedPerformedProcedureStepSequence.1.ReferencedSOPInstanceUID 
# (0008, 1111).1.(0008, 1155)
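
The following Python sketch shows how a dotty path could be resolved against nested data. It is an illustration under simplifying assumptions, not Flywheel's resolver: the header is a dictionary keyed by keyword, a sequence is a list of such dictionaries, tag-style names are not handled, and `resolve` is a hypothetical helper.

```python
def resolve(dataset, path):
    """Return every value matched by a dotty path such as 'Seq.*.Keyword'.

    Sketch only: a dataset is a dict keyed by keyword and a sequence is a
    list of such dicts.
    """
    matched = [dataset]
    for part in path.split("."):
        next_matched = []
        for node in matched:
            if isinstance(node, list):
                if part == "*":
                    next_matched.extend(node)
                elif part.isdigit() and int(part) < len(node):
                    next_matched.append(node[int(part)])
            elif isinstance(node, dict) and part in node:
                next_matched.append(node[part])
        matched = next_matched
    return matched


header = {
    "AnatomicRegionSequence": [
        {"CodeValue": "T-A0100"},
        {"CodeValue": "T-D1100"},
    ]
}
print(resolve(header, "AnatomicRegionSequence.*.CodeValue"))  # ['T-A0100', 'T-D1100']
print(resolve(header, "AnatomicRegionSequence.1.CodeValue"))  # ['T-D1100']
```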

Field Transformations

Once you reference a DICOM field, you can apply a field transformation to it. Field transformations override any file settings set above.

hash (true/false)

Replace the contents of the field with a one-way cryptographic hash in hexadecimal form. Only the first 16 characters of the hash will be used, in order to support short strings.

- name: AccessionNumber
  hash: true
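
As a rough illustration of a truncated one-way hash, the Python sketch below keeps the first 16 hexadecimal characters of a digest. The choice of SHA-256 and the helper name `hash_value` are assumptions made for this example; the hash algorithm Flywheel uses is not stated here.

```python
import hashlib


def hash_value(value):
    """Return the first 16 hex characters of a one-way hash of value.

    SHA-256 is an assumption made for this sketch.
    """
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return digest[:16]


print(hash_value("ACC-2015-0001"))
```

The same input always produces the same 16-character string, so hashed values can still be used to group related records.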

hashuid (true/false)

Replaces a UID field with a hashed version of that field. By default, the first four nodes (prefix) and last node (suffix) will be preserved, with the middle being replaced by the hashed value.

Either of those can be applied at the field level or at the global level.

- name: StudyInstanceUID
  hashuid: true

| Tag | Original metadata | De-identified metadata |
| --- | --- | --- |
| StudyInstanceUID | 1.2.840.113619.6.283.4.983142589.7316.1300473420.841 | 1.2.840.113619.551726.420312.177022.222461.230571.501817.841 |
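
The prefix and suffix rule can be sketched in Python as follows. This is not Flywheel's algorithm: SHA-256, the six groups of up to six digits in the middle, and the helper name `hash_uid` are all assumptions chosen to resemble the shape of the example above, so the digits it produces will differ from the table.

```python
import hashlib


def hash_uid(uid, prefix_nodes=4, suffix_nodes=1):
    """Keep the leading and trailing nodes of a UID and hash the middle.

    Assumptions for this sketch: SHA-256 as the hash, and a middle section
    made of six groups of up to six decimal digits.
    """
    nodes = uid.split(".")
    prefix = nodes[:prefix_nodes]
    suffix = nodes[len(nodes) - suffix_nodes:]
    digest = hashlib.sha256(uid.encode("utf-8")).hexdigest()
    digits = str(int(digest, 16)).zfill(36)[:36]
    # int() drops leading zeros, which are not allowed in a UID node.
    middle = [str(int(digits[i:i + 6])) for i in range(0, 36, 6)]
    return ".".join(prefix + middle + suffix)


original = "1.2.840.113619.6.283.4.983142589.7316.1300473420.841"
print(hash_uid(original))
```

The result keeps 1.2.840.113619 at the start and 841 at the end. A real implementation would also need to keep the result within the 64-character limit for UIDs, which this sketch does not enforce.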

increment-date (true/false)

Offsets the date by the number of days defined in the date-increment setting, preserving the time and timezone. If the date format does not match the DICOM default, %Y%m%d, tell Flywheel what format to expect by using the date-format setting, which helps Flywheel parse the string.

You can apply either of these settings at the global level as well as at the field level.

- name: StudyDate
  increment-date: true
  date-format: "%Y-%m-%d"

Warning

You are responsible for setting a date-format which is valid for the file type being processed.

increment-datetime (true/false)

Offsets the datetime by the number of days defined in the date-increment setting of the file profile, preserving the time and timezone. If the format of the datetime fields in your data does not match the DICOM default, %Y%m%d%H%M%S.%f, tell Flywheel what format to expect by using the datetime-format setting, which helps Flywheel parse the string.

You can apply either of these settings at the global level as well as at the field level.

- name: AcquisitionDateTime 
  increment-datetime: true
  datetime-format: "%Y-%m-%d %H:%M:%S"
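
To see how a custom datetime-format and a fractional date-increment interact, consider this Python sketch. It is a simplified illustration with a hypothetical helper, `shift_datetime`, and it does not handle timezone offsets.

```python
from datetime import datetime, timedelta

DICOM_DATETIME_FORMAT = "%Y%m%d%H%M%S.%f"  # DICOM default for datetime values


def shift_datetime(value, date_increment, datetime_format=DICOM_DATETIME_FORMAT):
    """Shift a datetime string by a possibly fractional number of days."""
    parsed = datetime.strptime(value, datetime_format)
    shifted = parsed + timedelta(days=date_increment)
    return shifted.strftime(datetime_format)


print(shift_datetime("2015-02-15 08:30:00", 0.5, "%Y-%m-%d %H:%M:%S"))
# 2015-02-15 20:30:00
print(shift_datetime("20150215083000.000000", -17))
# 20150129083000.000000
```

An increment of 0.5 moves the value forward by 12 hours, while a whole-number increment leaves the time of day unchanged.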

Warning

You are responsible for setting a datetime-format which is valid for the file type being processed.

jitter (true/false)

Offsets a numeric (integer or float) value by a random value drawn from a uniform distribution centered on 0. By default, Flywheel uses integers and the range [-1,1]. You can change this range with jitter-range. You can also set whether the random value is an integer or a floating-point number with jitter-type. Both can be set at the field level or the global level.

- name: PatientWeight
  jitter: true
  jitter-range: 10
  jitter-type: float

Additional Considerations

Additional constraints apply to DICOM data elements based on their VR:

  • If jitter-type is float and the VR is IS, UL, or US, the jittered value is converted to an integer.
  • If the VR is UL or US (unsigned long, unsigned short) and the jittered value is less than 0, the value is set to 0.
  • If the VR is US and the jittered value is greater than 65535, the value is set to 65535.
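
The VR rules above can be expressed as a short Python sketch. It is an illustration only: `jitter` is a hypothetical function, the offset range [-jitter_range, +jitter_range] follows the jitter-range description in this reference, and truncating with int() is an assumption about how the value is converted to an integer.

```python
import random


def jitter(value, vr, jitter_range=2, jitter_type="float", rng=random):
    """Offset value by a uniform random amount, then apply the VR rules.

    The offset is drawn from [-jitter_range, +jitter_range], following the
    description of jitter-range in this reference.
    """
    if jitter_type == "int":
        offset = rng.randint(-jitter_range, jitter_range)
    else:
        offset = rng.uniform(-jitter_range, jitter_range)
    new_value = value + offset
    if vr in ("IS", "UL", "US"):
        new_value = int(new_value)   # integer VRs cannot hold a float
    if vr in ("UL", "US") and new_value < 0:
        new_value = 0                # unsigned VRs cannot be negative
    if vr == "US" and new_value > 65535:
        new_value = 65535            # largest unsigned short
    return new_value


print(jitter(54.43, "DS", jitter_range=10))                   # a float near 54.43
print(jitter(1, "US", jitter_range=10))                       # an integer, never below 0
print(jitter(65535, "US", jitter_range=10, jitter_type="int"))  # never above 65535
```

Passing a seeded `random.Random` instance as `rng` makes the output repeatable, which is convenient when testing a profile.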

keep (true/false)

Used when creating a keeplist using remove-undefined: true or to override global or file de-identification settings.

- name: SeriesDescription
  keep: true

Note

If name is the only key defined in the field configuration, Flywheel defaults to keep: true.

replace-with (string)

Replaces the contents of the field with the value provided. Be aware of the length of the field being replaced, because some DICOM fields only support a limited number of characters. By default, the field will be created in the Flywheel record if it does not exist. This behavior can be reversed by setting replace-with-insert: False at the profile or field level.

- name: PatientID
  replace-with: REDACTED
  replace-with-insert: False