De-id Profile Transformation Reference Guide

Introduction

This guide details the settings available when creating a de-identification profile.

  • Learn how to create a de-id profile
  • Learn how to enable a de-id profile for a project, group, or site

Instruction Steps

De-identification Profile Settings Reference

This section explains what each de-identification setting does, with examples. These settings are broken into file and field settings.

File Settings

These settings are applied to ALL DICOM data by default. They offer broad strokes of de-identification. You can override these for specific tags using fields.

date-format (string)

Describes the format Flywheel should expect for dates in the metadata of the file. This enables Flywheel to properly parse the date. Use this setting only if the date fields in your data use a format other than the DICOM default, %Y%m%d.

The format interpretation follows the format codes required by the 1989 C standard.

Example:

date-format: %m-%d
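
To illustrate how such format strings are interpreted, here is a minimal Python sketch. Python's datetime.strptime accepts the same C89-style format codes, so it is a convenient way to check a format against sample data before putting it in a profile. The helper name parse_date is hypothetical and is not part of Flywheel.

```python
from datetime import date, datetime

# DICOM default date format named in this guide.
DICOM_DATE_FORMAT = "%Y%m%d"


def parse_date(value: str, date_format: str = DICOM_DATE_FORMAT) -> date:
    """Parse a date string with C89-style format codes (hypothetical helper)."""
    return datetime.strptime(value, date_format).date()


# The same calendar day, stored in the DICOM default and in a custom format.
default_style = parse_date("20150215")
custom_style = parse_date("2015-02-15", "%Y-%m-%d")
```

If the format does not match the data, strptime raises ValueError, which is a quick way to spot a wrong date-format value.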

datetime-format (string)

Describes the format Flywheel should expect for datetimes in the metadata of the file. This enables Flywheel to properly parse the datetime. Use this setting only if the datetime fields in your data use a format other than the DICOM default, %Y%m%d%H%M%S.%f.

The format interpretation follows the format codes required by the 1989 C standard.

Example:

datetime-format: %H:%M.%S.%f

date-increment (numeric)

Controls how many days to offset each date or datetime field for which the increment-date (true/false) or increment-datetime (true/false) transformation is enabled.

  • Positive values result in later dates
  • Negative values result in earlier dates
  • Incrementing by a multiple of 7 will keep the week-day consistent for shifted dates
  • Incrementing by a non-integer value also shifts the time of datetime elements (for example, 0.5 shifts a datetime by 12 hours).

Example:

date-increment: -5

| Tag | Original metadata | De-identified metadata |
| --- | --- | --- |
| StudyDate (0008,0020) | 20150215 | 20150210 |
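
The arithmetic behind this example can be sketched in Python as follows. This is only an illustration of the behavior described above, assuming the increment is a number of days; shift_date is a hypothetical name, not a Flywheel function.

```python
from datetime import datetime, timedelta


def shift_date(value: str, date_increment: float, date_format: str = "%Y%m%d") -> str:
    """Shift a date string by `date_increment` days (hypothetical helper)."""
    parsed = datetime.strptime(value, date_format)
    shifted = parsed + timedelta(days=date_increment)
    return shifted.strftime(date_format)


# Reproduces the StudyDate row above: five days earlier.
earlier = shift_date("20150215", -5)

# A multiple of 7 keeps the day of the week unchanged.
two_weeks_later = shift_date("20150215", 14)
```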

jitter-range (numeric)

The range used to offset a value by a random number. The random offset is drawn from [-jitter-range, +jitter-range] and added to the original value. Use jitter-type to make the random offset an integer.

Jitter-range can also be set at field level.

Default jitter-range is 2.

Example:

- name: PatientWeight
  jitter: true
  jitter-range: 10

| Tag | Original metadata | De-identified metadata |
| --- | --- | --- |
| PatientWeight (0010,1030) | 54.43 | 60 |

jitter-type (int/float)

Sets the type of the random number drawn from the uniform distribution defined by jitter-range. You can configure the random number to be an integer (int) or a floating-point number (float). jitter-type can also be used at the field level.

Default is float.

Example:

dicom:
  jitter-range: 10
  jitter-type: int
  fields:
    - name: PatientWeight
      jitter: true

| Tag | Original metadata | De-identified metadata |
| --- | --- | --- |
| PatientWeight (0010,1030) | 54.43 | 44.43 |
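
As a rough illustration of how jitter-range and jitter-type interact, the following Python sketch adds a uniformly distributed offset to a value. The function name and the optional rng parameter are assumptions made for this example; the actual Flywheel implementation may differ in detail.

```python
import random


def jitter(value, jitter_range, jitter_type="float", rng=None):
    """Offset `value` by a random number in [-jitter_range, +jitter_range].

    Hypothetical helper: `rng` lets callers pass a seeded random.Random
    so results are reproducible in tests.
    """
    rng = rng or random.Random()
    if jitter_type == "int":
        # Whole-number offset, endpoints included.
        offset = rng.randint(-int(jitter_range), int(jitter_range))
    else:
        # Floating-point offset from a uniform distribution.
        offset = rng.uniform(-jitter_range, jitter_range)
    return value + offset


# With jitter-type int, 54.43 keeps its fractional part, as in the table above.
example = jitter(54.43, 10, "int", random.Random(0))
```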

patient-age-from-birthdate (true/false)

When set to true, this will set the PatientAge DICOM header as a 3-digit value with a suffix indicating units.

For example, an age in days would be 091D, and that same age in months would be 003M. By default, the age will be set using a best-fit approach. This means if the age fits in days, then days will be used, otherwise if it fits in months, then months will be used, otherwise years will be used.

Default is false.

Example:

dicom:
  patient-age-from-birthdate: true

| Tag | Original metadata | De-identified metadata |
| --- | --- | --- |
| PatientAge (0010,1010) | 97 | 097Y |

patient-age-units (string)

When set in conjunction with patient-age-from-birthdate, this sets the preferred units. If the value does not fit into the desired unit, the next level of units will be used.

The most common use for this field is to always use years as the patient age. Valid values are D, M, Y for Days, Months and Years respectively.

Example:

dicom:
  patient-age-from-birthdate: true
  patient-age-units: Y

| Tag | Original metadata | De-identified metadata |
| --- | --- | --- |
| PatientAge (0010,1010) | 97 | 097Y |
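
The best-fit rule and the preferred-unit fallback can be sketched as below. This is a simplified illustration, not Flywheel's implementation: it assumes the age is already known in days, and the conversion factors (30 days per month, 365 days per year) are assumptions chosen only so that 91 days maps to 003M as in the description above.

```python
# Assumed conversion factors; the real implementation may compute these differently.
DAYS_PER_UNIT = {"D": 1, "M": 30, "Y": 365}
UNIT_ORDER = ["D", "M", "Y"]


def format_patient_age(age_in_days: int, preferred_unit: str = "D") -> str:
    """Format an age as three digits plus a unit suffix (hypothetical helper).

    Starts at the preferred unit and moves to the next larger unit
    whenever the value does not fit in three digits.
    """
    start = UNIT_ORDER.index(preferred_unit)
    for unit in UNIT_ORDER[start:]:
        value = age_in_days // DAYS_PER_UNIT[unit]
        if value <= 999:
            return f"{value:03d}{unit}"
    raise ValueError("age does not fit in three digits")


in_days = format_patient_age(91)          # fits in days
in_months = format_patient_age(91, "M")   # preferred unit is months
in_years = format_patient_age(97 * 365)   # too large for days and months
```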

remove-private-tags (true/false)

When set to true, private DICOM tags will be removed. Private DICOM tags are any tags not included in the standard DICOM data elements.

Default is false.

Example:

dicom:
  remove-private-tags: true

| Tag | Original metadata | De-identified metadata |
| --- | --- | --- |
| MyPrivateTag (1235,0042) | Acme Inc | null (this field and value are not added to Flywheel metadata) |
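
In the DICOM standard, private data elements are identified by an odd group number, which is why (1235,0042) in the example counts as private. The sketch below shows the idea on a plain dictionary keyed by (group, element) tuples; the data structure and function names are assumptions for illustration only.

```python
def is_private(tag):
    """Return True for tags with an odd group number (private elements)."""
    group, _element = tag
    return group % 2 == 1


def remove_private_tags(elements):
    """Return a copy of `elements` without private tags (hypothetical helper)."""
    return {tag: value for tag, value in elements.items() if not is_private(tag)}


header = {
    (0x0010, 0x0010): "Smith^John",  # PatientName, standard element
    (0x1235, 0x0042): "Acme Inc",    # private element from the example above
}
cleaned = remove_private_tags(header)
```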

recurse-sequence (true/false)

When set to true, each element of a sequence (VR=SQ) will be processed according to the profile, recursively for all nested sequence elements. When used with remove-undefined: true, any sequences or sequence elements defined under fields will result in the full sequence having the de-id profile applied.

Default is false (only the top-level field for the sequence is processed by the de-id profile).

Note: When using this option, the profile fields section must not define fields acting on elements of sequences or using regex.

Example:

dicom:
  fields:
    - name: RequestedProcedureCodeSequence
      recurse-sequence: true
      replace-with: XXX

In the sequence RequestedProcedureCodeSequence (0032,1064), when recurse-sequence is set to true, all items within that sequence are also de-identified according to the profile. You do not need to call out each tag individually.

| Tag | Original metadata | De-identified metadata |
| --- | --- | --- |
| RequestedProcedureCodeSequence (0032,1064) | MRKNEELT\RP\MRI KNEE LEFT, W/O CONTRAST | XXX |
| CodeValue (0008,0100) | MRKNEELT | XXX |
| CodingSchemeDesignator (0008,0102) | RP | XXX |
| CodeMeaning (0008,0104) | MRI KNEE LEFT, W/O CONTRAST | XXX |
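
Conceptually, recursing into a sequence means visiting every item and every nested element and applying the same action. The following sketch models a sequence as a list of dictionaries; this representation and the function name are assumptions for illustration and do not reflect how Flywheel stores DICOM data.

```python
def replace_recursively(element, replacement):
    """Apply a replace-with action to every leaf value of a nested sequence."""
    if isinstance(element, list):
        # A sequence: process each item in turn.
        return [replace_recursively(item, replacement) for item in element]
    if isinstance(element, dict):
        # A sequence item: process each data element it contains.
        return {key: replace_recursively(value, replacement) for key, value in element.items()}
    # A leaf value: replace it.
    return replacement


requested_procedure_code_sequence = [
    {
        "CodeValue": "MRKNEELT",
        "CodingSchemeDesignator": "RP",
        "CodeMeaning": "MRI KNEE LEFT, W/O CONTRAST",
    }
]
deidentified_sequence = replace_recursively(requested_procedure_code_sequence, "XXX")
```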

remove-undefined (true/false)

This is also called a "keeplist". When set to true, all data elements not defined in the fields section of the profile will be removed. If any field references a nested element in a sequence, the whole sequence element will be kept. Default is false.

Warning: When using this option, pay particular attention to the de-id profile to guarantee that the output DICOM still contains the mandatory data elements according to its Information Object Definition (IOD).

Example:

dicom:
  remove-private-tags: true
  remove-undefined: true
  fields:
    - name: PatientID
      replace-with: MY_PATIENT_ID
    - name: StudyInstanceUID
      hashuid: true
    - name: PatientName
      replace-with: REDACTED

| Tag | Original metadata | De-identified metadata |
| --- | --- | --- |
| PatientID | 0345 | MY_PATIENT_ID |
| StudyInstanceUID | 1.2.840.113619.6.283.4.983142589.7316.1300473420.841 | 1.2.840.113619.551726.420312.177022.222461.230571.501817.841 |
| PatientName | Smith John | REDACTED |
| AcquisitionNumber | 1 | null (not defined under fields, so it is removed) |
| StudyID | 4912 | null (not defined under fields, so it is removed) |
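
The keeplist behavior amounts to filtering the record down to the fields named in the profile and applying each field's action. Here is a minimal sketch under that reading; the record is modeled as a plain dictionary and apply_keeplist is a hypothetical name.

```python
def apply_keeplist(record, fields):
    """Keep only elements named in `fields`, applying each field's action."""
    result = {}
    for name, action in fields.items():
        if name in record:
            result[name] = action(record[name])
    return result


record = {
    "PatientID": "0345",
    "PatientName": "Smith John",
    "AcquisitionNumber": "1",
    "StudyID": "4912",
}
fields = {
    "PatientID": lambda _value: "MY_PATIENT_ID",  # stands in for replace-with
    "PatientName": lambda _value: "REDACTED",     # stands in for replace-with
}
deidentified = apply_keeplist(record, fields)
```

AcquisitionNumber and StudyID are absent from the result because no field entry names them.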

replace-with-insert (true/false)

If true, replace-with actions will insert the field into the record if it does not already exist, and set its value. If false, replace-with will only replace the value of fields that already exist in the record and will not insert missing fields.

Default is true.

Example:

replace-with-insert: false
fields:
  - name: StudyTime
    replace-with: REDACTED

| Tag | Original metadata | De-identified metadata |
| --- | --- | --- |
| StudyTime (0008,0030) | null (no data) | null (no data) |

If replace-with-insert were not set to false, Flywheel would add StudyTime with the value REDACTED instead of leaving it absent.
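
The difference between the two settings can be sketched in a few lines of Python. The record is modeled as a dictionary and the function name is hypothetical; the logic follows the description above.

```python
def replace_with(record, name, value, insert=True):
    """Replace a field's value; add a missing field only when `insert` is true."""
    result = dict(record)
    if name in result or insert:
        result[name] = value
    return result


empty_record = {}
not_inserted = replace_with(empty_record, "StudyTime", "REDACTED", insert=False)
inserted = replace_with(empty_record, "StudyTime", "REDACTED", insert=True)
```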

Field Settings

The fields portion of the de-id profile allows you to reference a specific DICOM data element and perform transformations on it. These use the following format:

dicom:
  fields:
    - name: <DICOM data element>
      <field transformation>: <value>
      <field transformation>: <value>

How to reference a DICOM data element

This file profile supports 3 ways to reference a DICOM data element: keyword, tag, or dotty notation.

Note: The data elements in the DICOM File Meta Information, which follows the 128-byte DICOM File Preamble, can be referenced in the same way as other tags.

  • Keyword: The keyword string as defined in the public DICOM dictionaries. For example: PatientName, SOPClassUID, and AcquisitionDate.

    fields:
      - name: PatientName
        replace-with: REDACTED

  • Tag: When referencing the DICOM tag, you can use any of these notations:

    • Tuple

      • Format: (gggg, eeee)

      ### Straightforward Dicom Tag with or w/o Spaces
      
      - name: (0010, 0010)
      
      ### Dicom Tag with No Punctuation
      
      # no spaces must be in quotes
      
      - name: '00100'
      
    • Hexadecimal

      • Format: 'ggggeeee', '0xggggeeee'

    ### Dicom Tag with No Spaces
    
    - name: '00100010'
    
    ### Dicom Tag in Hexadecimal Format
    
    - name: '0x00100010'
    
    • Private Tag Notation: We rely on the predefined private dictionaries in pydicom (_private_dict.py) and fw-file to infer the tag VR.

      • Format: (gggg, PrivateCreatorName, ee)

    - name: (0009, "GEMS_IDEN_01", 04)
    

    The private tag creator element is used to validate organization of private tags. This means that if you used the above example in a template the tag (0009,"NOT_GEMS_IDEN_01", 04) would be ignored because the creator value does not match.

    See the official DICOM documentation for more information on private tags.

    • Repeater Group Notation: You can apply a field transformation to a range of DICOM elements or groups. Available only for groups in the ranges (50XX,eeee) and (60XX,eeee).

    • Hexadecimal Format: 50XXeeee, 0x50XXeeee, 60XXeeee, 0x60XXeeee
    • Tuple Format: (50XX, eeee), (60XX, eeee)
    ### repeating group hexadecimal example
    
    - name: '0x60XX0022'
    
    ### repeating group no punctuation no spaces example
    
    - name: '60XX0040'
    
    ### using XX to reference all elements in a range
    
    - name: 0x50XX1001
      replace-with: REDACTED
    - name: (50XX, 1001)
      remove: true
    - name: (60XX, 0050)
      remove: true
    

    See the DICOM documentation for more information on repeating groups.

  • Dotty notation: Dot-separated notation for referencing an element within a DICOM sequence. You can use a mix of keywords and tags. For example:

  • AnatomicRegionSequence.0.CodeValue

  • 00082218.0.00080102
  • AnatomicRegionSequence.0.00080104

Use * to reference all indices of the sequence element at once. For example:

  • AnatomicRegionSequence.*.CodeValue

The notation also supports referring to data elements at any depth, recursively.

### nested tags (*) meaning 'for all indexes in sequence'

- name: ReferencedImageSequence.*.ReferencedSOPClassUID

### You could also write this as (0008, 1140).*.(0008, 1150)

### nested tag with specific index in sequence

- name: ReferencedPerformedProcedureStepSequence.1.ReferencedSOPInstanceUID

### You could also write this as (0008, 1111).1.(0008, 1155)
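
To make the reference notations more concrete, the sketch below normalizes the tuple and hexadecimal tag notations (including 50XX and 60XX repeater groups) and resolves a keyword-based dotty path against nested data. Everything here is a simplified illustration under stated assumptions: the function names are hypothetical, private tag notation is not handled, and the dataset is modeled as dictionaries and lists rather than a real DICOM object.

```python
import re

# Four hex digits, or a 50XX / 60XX repeater group, followed by the element.
_TAG_RE = re.compile(r"([0-9A-F]{4}|(?:50|60)XX)([0-9A-F]{4})")


def parse_tag(reference):
    """Normalize '(gggg, eeee)', 'ggggeeee' or '0xggggeeee' to (group, element) strings."""
    compact = re.sub(r"[()\s,]", "", reference).upper()
    if compact.startswith("0X"):
        compact = compact[2:]
    match = _TAG_RE.fullmatch(compact)
    if match is None:
        raise ValueError(f"unsupported tag reference: {reference!r}")
    return match.group(1), match.group(2)


def tag_matches(reference, group, element):
    """Check whether a numeric (group, element) tag is covered by a reference."""
    ref_group, ref_element = parse_tag(reference)
    actual_group, actual_element = f"{group:04X}", f"{element:04X}"
    if ref_group.endswith("XX"):
        group_ok = actual_group[:2] == ref_group[:2]  # repeater: compare first byte only
    else:
        group_ok = actual_group == ref_group
    return group_ok and actual_element == ref_element


def resolve(dataset, path):
    """Return every value matched by a dotty path; '*' matches all sequence items."""
    current = [dataset]
    for part in path.split("."):
        next_level = []
        for node in current:
            if isinstance(node, list) and part == "*":
                next_level.extend(node)
            elif isinstance(node, list) and part.isdigit():
                next_level.append(node[int(part)])
            else:
                next_level.append(node[part])
        current = next_level
    return current


dataset = {
    "ReferencedImageSequence": [
        {"ReferencedSOPClassUID": "1.2.3"},
        {"ReferencedSOPClassUID": "1.2.4"},
    ]
}
all_uids = resolve(dataset, "ReferencedImageSequence.*.ReferencedSOPClassUID")
second_uid = resolve(dataset, "ReferencedImageSequence.1.ReferencedSOPClassUID")
```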

Field Transformations

Once you reference a DICOM field, you can apply a field transformation to it. These field transformations override any file settings defined above.

hash (true/false)

Replace the contents of the field with a one-way cryptographic hash in hexadecimal form. Only the first 16 characters of the hash will be used, in order to support short strings. Can be applied at the field level or at the global level.

Example:

- name: AccessionNumber
  hash: true
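
The truncation described above can be sketched as follows. This guide does not name the hash algorithm, so the use of SHA-256 here is an assumption made for illustration, and hash_value is a hypothetical name.

```python
import hashlib


def hash_value(value: str, length: int = 16) -> str:
    """Return the first `length` hex characters of a one-way hash.

    SHA-256 is an assumption; the algorithm Flywheel uses is not stated here.
    """
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return digest[:length]


hashed_accession = hash_value("A123456")
```

The same input always produces the same output, so hashed values can still be used to link records without exposing the original identifier.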

hashuid (true/false)

Replaces a UID field with a hashed version of that field. By default, the first four nodes (prefix) and last node (suffix) will be preserved, with the middle being replaced by the hashed value.

Can be applied at the field level or at the global level.

Example:

- name: StudyInstanceUID
  hashuid: true

| Tag | Original metadata | De-identified metadata |
| --- | --- | --- |
| StudyInstanceUID | 1.2.840.113619.6.283.4.983142589.7316.1300473420.841 | 1.2.840.113619.551726.420312.177022.222461.230571.501817.841 |
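
The prefix and suffix preservation can be illustrated with the sketch below. Only the overall shape follows this guide (first four nodes and last node kept, middle replaced by hashed digits). The hash algorithm (SHA-256), the six 6-digit middle nodes, and the function name are assumptions chosen to mirror the example output, not Flywheel's actual algorithm.

```python
import hashlib


def hash_uid(uid: str) -> str:
    """Replace the middle nodes of a UID with digits derived from a hash."""
    nodes = uid.split(".")
    if len(nodes) < 6:
        raise ValueError("expected a UID with at least six nodes")
    prefix, suffix = nodes[:4], nodes[-1:]
    digest = hashlib.sha256(uid.encode("ascii")).hexdigest()
    # Six nodes of six digits each; mapping into 100000-999999
    # guarantees that no node starts with a zero.
    middle = [
        str(int(digest[i * 8:(i + 1) * 8], 16) % 900000 + 100000)
        for i in range(6)
    ]
    return ".".join(prefix + middle + suffix)


original_uid = "1.2.840.113619.6.283.4.983142589.7316.1300473420.841"
hashed_uid = hash_uid(original_uid)
```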

increment-date (true/false)

Offsets the date by the number of days defined in the date-increment setting, preserving the time and timezone. If the date format does not match the DICOM default (%Y%m%d), tell Flywheel what format to expect by using the date-format setting.

Warning: You are responsible for setting a valid date format for the file type being processed.

Can be applied at the global level as well as at the field level.

Example:

- name: StudyDate
  increment-date: true
  date-format: "%Y-%m-%d"

increment-datetime (true/false)

Offsets the datetime by the number of days defined in the date-increment setting of the file profile, preserving the time and timezone. If the format of the datetime fields in your data does not match the DICOM default (%Y%m%d%H%M%S.%f), tell Flywheel what format to expect by using the datetime-format setting.

Can be applied at the global level as well as at the field level.

Example:

- name: AcquisitionDateTime
  increment-datetime: true
  datetime-format: "%Y-%m-%d %H:%M:%S"
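
As an illustration of how a custom datetime-format and a fractional date-increment combine, here is a small Python sketch. It assumes the value is parsed with the given format, shifted, and written back in the same format; shift_datetime is a hypothetical helper, not a Flywheel function.

```python
from datetime import datetime, timedelta

DICOM_DATETIME_FORMAT = "%Y%m%d%H%M%S.%f"  # DICOM default named in this guide


def shift_datetime(value, date_increment, datetime_format=DICOM_DATETIME_FORMAT):
    """Shift a datetime string by `date_increment` days, keeping its format."""
    parsed = datetime.strptime(value, datetime_format)
    shifted = parsed + timedelta(days=date_increment)
    return shifted.strftime(datetime_format)


# A whole-day increment leaves the time of day untouched.
custom = shift_datetime("2015-02-15 08:30:00", -5, "%Y-%m-%d %H:%M:%S")

# A fractional increment of 0.5 moves the time by 12 hours.
half_day = shift_datetime("20150215083000.000000", 0.5)
```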

jitter (true/false)

Offsets a numeric (integer or float) value by a random value drawn from a uniform distribution centered on 0. By default, Flywheel uses integers and the range [-1,1]. You can change this range with jitter-range. You can also set whether the random value is an integer or a floating-point number with jitter-type.

Can be applied at the global level as well as at the field level.

Example:

- name: PatientWeight
  jitter: true
  jitter-range: 10
  jitter-type: float

Additional considerations: DICOM constraints apply to the jittered value, based on the VR of the data element:

  • If jitter-type is float and the VR is IS, UL, or US, Flywheel converts the jittered value to an integer.
  • If the jittered value is less than 0 and the VR is UL or US (unsigned long, unsigned short), Flywheel sets the value to 0.
  • If the jittered value is greater than 65535 and the VR is US, Flywheel sets the value to 65535.
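
The three VR rules above can be sketched as a small post-processing step applied after jittering. The function name is hypothetical, and converting to an integer by truncation (rather than rounding) is an assumption, since the rule only says the value is converted to an integer.

```python
def constrain_jittered_value(value, vr):
    """Apply the VR-based constraints to an already jittered value."""
    if vr in ("IS", "UL", "US"):
        value = int(value)  # assumption: truncate toward zero
    if vr in ("UL", "US") and value < 0:
        value = 0           # unsigned VRs cannot hold negative numbers
    if vr == "US" and value > 65535:
        value = 65535       # largest value an unsigned short can hold
    return value
```

For example, a jittered value of -3.2 becomes 0 for a US element, while the same value becomes -3 for an IS element, which allows negative integers.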

keep (true/false)

Keeps the field unchanged. Use it when creating a keeplist with remove-undefined: true, or to override global or file de-identification settings.

Example:

- name: SeriesDescription
  keep: true

Note: If only name is defined as key in the field configuration, Flywheel defaults to keep: true.

replace-with (string)

Replaces the contents of the field with the value provided. Be aware of the length of the field being replaced, because some DICOM fields only support a limited number of characters. By default, the field will be created in the Flywheel record if it does not exist. This behavior can be disabled by setting replace-with-insert: False at the profile or the field level.

Example:

- name: PatientID
  replace-with: REDACTED
  replace-with-insert: False