De-identification Profiles
Introduction
To avoid exposing PHI, consider how you may need to de-identify your data before uploading it to Flywheel. In Flywheel, data is de-identified using a de-id profile.
Instruction Steps
What is the de-id profile?
A de-id profile is a set of instructions for what to do with file metadata that may include PHI.
The de-id profile options apply de-identification transformations in 2 different ways:
- File settings: At this level, the de-identification settings apply to all DICOM data by default. For example, you can use the `remove-private-tags` option to remove all non-standard DICOM tags. See the reference guide at the end of this article for more possible file settings.
- Field settings: Use the DICOM keywords or tags to de-identify specific fields. These settings take precedence over the file settings, which means you can set `remove-private-tags` to `true` at the file level, but then choose to keep a specific custom tag by using the field transformation `keep`. See the reference guide at the end of this article for more possible field options.
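The precedence rule above can be sketched as a minimal profile. The private tag is a hypothetical choice for illustration, written in the private tag notation described in the reference section, and the layout assumes the profile format used in the rest of this article:

```yaml
dicom:
  # File setting: remove every private (non-standard) tag by default
  remove-private-tags: true
  fields:
    # Field setting: overrides the file setting for this one tag.
    # The creator name and element number are illustrative assumptions.
    - name: (0009, "GEMS_IDEN_01", 04)
      keep: true
```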
You can skip this portion of the config file if you are uploading to a location that already de-identifies data using a site, group, or project de-id profile or if you do not wish to de-identify your data.
Add a De-id Profile Section to the Config File
- Below the template section add a new section for the de-id profile. Below is an example config file with template and de-id profile sections:
#####
# Template Settings
#####
template:
  - pattern: "{subject}"
  - pattern: "{session}"
  - pattern: "{acquisition}"
    scan: dicom
####
# De-identification settings
####
name: Profile1
dicom:
  # date-increment controls how many days to offset each date field
  # where the increment-date transformation (shown below) is configured.
  # Positive values result in later dates, negative
  # values result in earlier dates.
  date-increment: -17
  # patient-age-from-birthdate sets the PatientAge DICOM header as a 3-digit
  # value with a suffix indicating units. For example, an age in days would
  # be 091D, and that same age in months would be 003M. By default, if
  # the age fits in days, then days will be used,
  # otherwise if it fits in months, then months
  # will be used, otherwise years will be used.
  patient-age-from-birthdate: true
  # All data elements not defined in the fields section of the profile will be removed.
  # If any field references a nested element in a sequence, the whole sequence element
  # will be kept.
  remove-undefined: true
  # Set patient age units to years (Y). Other options are months (M) and days (D).
  patient-age-units: Y
  # The following are field transformations.
  # remove, replace-with, increment-date, hash, and hashuid can be used with any DICOM
  # field. Set name to the field's keyword as defined by the DICOM standard.
  fields:
    # remove deletes the field from the DICOM entirely.
    # If removal is not supported, the field will be blank.
    # PatientID must not be removed (see the warning below), so this
    # example replaces its value instead.
    - name: PatientID
      replace-with: REDACTED
    # Replace a DICOM field with the value provided.
    # This example replaces "StationName" with "XXXX" in Flywheel.
    - name: StationName
      replace-with: XXXX
    # Offsets the date by the number of days defined in
    # the date-increment setting above, preserving the time
    # and timezone. In this example, StudyDate appears 17 days earlier.
    - name: StudyDate
      increment-date: true
    # You can refer to fields by their DICOM tag or keyword.
    # hash applies a one-way hash to a unique string.
    - name: (0008,0050)
      hash: true
    # Replaces a UID field with a hashed version of that
    # field. The first four nodes (prefix) and last node
    # (suffix) will be preserved, with the middle being
    # replaced by the hashed value.
    - name: ConcatenationUID
      hashuid: true
    # The fields below are listed so that they are not removed by the
    # remove-undefined setting above.
    - name: SeriesInstanceUID
    - name: Modality
    - name: SeriesNumber
    - name: ScheduledProcedureStepID
    - name: RequestedProcedureID
    - name: StudyTime
    - name: StudyID
    - name: StudyInstanceUID
    - name: ProtocolName
    - name: AcquisitionDate
      increment-date: true
    - name: AcquisitionDateTime
    - name: AcquisitionTime
    - name: SeriesDate
      increment-date: true
    - name: SeriesTime
- Update the template to fit your dataset by adding or removing fields or updating the transformation options.
Warning
Flywheel requires the following DICOM data elements to sort and label data when uploaded via the CLI. Make sure that your de-id template does not remove these fields. However, you can transform them by incrementing or replacing the value.
The OHIF Viewer
The OHIF Viewer requires the following tags:
Keyword | Tag | VR |
---|---|---|
SeriesInstanceUID | (0020,000E) | UI |
StudyInstanceUID | (0020,000D) | UI |
Modality | (0008,0060) | CS |
SeriesNumber | (0020,0011) | IS |
ScheduledProcedureStepID | (0040,0009) | SH |
RequestedProcedureID | (0040,1001) | SH |
StudyDate | (0008,0020) | DA |
StudyTime | (0008,0030) | TM |
StudyID | (0020,0010) | SH |
PatientID | (0010,0020) | LO |
Flywheel Hierarchy Field Mappings to DICOM Tags
Keyword | Tag | Flywheel Field | VR |
---|---|---|---|
Study Comments* | (0032,4000)* | Group ID, Project Label, Subject ID | |
Study Instance UID | (0020,000D) | Session UID | UI |
Study Description | (0008,1030) | Session Label | LO |
Series Instance UID | (0020,000E) | Acquisition UID | UI |
Protocol Name | (0018,1030) | Acquisition Label | LO |
*It is possible to customize which DICOM field(s) Flywheel uses to capture routing string values (group, project, and subject).
Flywheel Field Mappings to DICOM Tags
If these values are not present in the DICOM tags, then the Flywheel timestamp will be blank. If one set of DICOM tag values (acquisition or series) is not present, Flywheel tries to use the other.
Keyword | Tag | Flywheel Field | VR |
---|---|---|---|
Acquisition Date | (0008,0022) | Timestamp | DA |
Acquisition DateTime | (0008,002A) | Timestamp | DT |
Acquisition Time | (0008,0032) | Timestamp | TM |
Series Date | (0008,0021) | Timestamp | DA |
Series Time | (0008,0031) | Timestamp | TM |
De-identification Profile Settings Reference
This section explains what each de-identification setting does and shows an example of how data is de-identified. These settings are divided into file and field settings:
File Settings
These settings are applied to ALL DICOM data by default. They offer broad strokes of de-identification. You can override these for specific tags using field settings.
date-format (string)
Describes what format Flywheel should expect for dates in the metadata of the file. This enables Flywheel to properly parse the date. Only use this setting if the date fields in your data have a format that is different from the DICOM default, %Y%m%d.
The format interpretation follows the format codes that the 1989 C standard requires.
Example:
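A minimal sketch of this setting, assuming the dates in your files are stored as 2015-02-15 rather than in the DICOM default format. The format string is illustrative only:

```yaml
dicom:
  # Assumed input format: 2015-02-15 (year-month-day separated by dashes)
  date-format: '%Y-%m-%d'
```

The value is quoted because a YAML plain scalar cannot begin with the % character.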
datetime-format (string)
Describes what format Flywheel should expect for datetimes in the metadata of the file. This enables Flywheel to properly parse the datetime. Only use this setting if the datetime fields in your data have a format that is different from the DICOM default, %Y%m%d%H%M%S.%f.
The format interpretation follows the format codes that the 1989 C standard requires.
Example:
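A minimal sketch of this setting, assuming the datetimes in your files are stored as 2015-02-15 13:45:00. The format string is illustrative only:

```yaml
dicom:
  # Assumed input format: 2015-02-15 13:45:00
  datetime-format: '%Y-%m-%d %H:%M:%S'
```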
date-increment (numeric)
Controls how much time to offset each date or datetime field where the increment-date (true/false) or increment-datetime (true/false) transformation is chosen.
- Positive values result in later dates.
- Negative values result in earlier dates.
- Incrementing by a multiple of 7 keeps the weekday consistent for shifted dates.
- Incrementing by a non-integer value also modifies the time of datetime elements (for example, 0.5 shifts a datetime by 12 hours).
Example:
Tag | Original metadata | De-identified metadata |
---|---|---|
StudyDate(0008,0020) | 20150215 | 20150210 |
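The shift shown in the table corresponds to an offset of -5 days. A sketch of a profile that would produce it, assuming StudyDate is the only field being shifted:

```yaml
dicom:
  # Negative value: dates move 5 days earlier (20150215 becomes 20150210)
  date-increment: -5
  fields:
    - name: StudyDate
      increment-date: true
```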
jitter-range (numeric)
The range used to offset a value by a random number. The offset is drawn from [-jitter-range, +jitter-range]. Use jitter-type to change the random number to an integer.
jitter-range can also be set at the field level.
The default jitter-range is 2.
Example:
Tag | Original metadata | De-identified metadata |
---|---|---|
PatientWeight (0010,1030) | 54.43 | 60 |
jitter-type (int/float)
Draws a random number from a uniform distribution set by jitter-range. You can configure the random number to be an integer (int) or a floating-point number (float). jitter-type can also be used at the field level.
Default is float.
Example:
Tag | Original metadata | De-identified metadata |
---|---|---|
PatientWeight (0010,1030) | 54.43 | 44.43 |
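A sketch of how the two jitter settings might be combined at the file level and applied to one field. The range of 10 is an assumption chosen for illustration, not a recommended value:

```yaml
dicom:
  # Offsets are drawn from [-10, +10]; the value 10 is illustrative
  jitter-range: 10
  # Draw integer offsets instead of the default float
  jitter-type: int
  fields:
    - name: PatientWeight
      jitter: true
```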
patient-age-from-birthdate (true/false)
When set to `true`, this sets the `PatientAge` DICOM header to a 3-digit value with a suffix indicating units.
For example, an age in days would be 091D, and that same age in months would be 003M. By default, the age will be set using a best-fit approach. This means if the age fits in days, then days will be used, otherwise if it fits in months, then months will be used, otherwise years will be used.
Default is false.
Example:
Tag | Original metadata | De-identified metadata |
---|---|---|
PatientAge(0010,1010) | 97 | 097Y |
patient-age-units (string)
When set in conjunction with `patient-age-from-birthdate`, this acts as a preference for which units to use. If the value does not fit into the desired unit, the next level of units will be used.
The most common use for this field is to always use years as the patient age. Valid values are `D`, `M`, and `Y` for days, months, and years respectively.
Example:
Tag | Original metadata | De-identified metadata |
---|---|---|
PatientAge(0010,1010) | 97 | 097Y |
remove-private-tags (true/false)
When set to `true`, private DICOM tags will be removed. Private DICOM tags are any tags not included in the standard DICOM data elements.
Default is `false`.
Example:
Tag | Original metadata | De-identified metadata |
---|---|---|
MyPrivateTag (1235,0042) | Acme Inc | Blank. This field and value are not added to Flywheel metadata. |
recurse-sequence (true/false)
When set to `true`, each element of a sequence (VR=SQ) will be processed according to the profile, recursively for all nested sequence elements. When used with the remove-undefined: true setting, any sequences or sequence elements defined under fields will result in the full sequence having the de-id profile applied.
Default is `false` (only the top-level field for the sequence is processed by the de-id profile).
Note: When using this option, the profile fields section must not define fields that act on elements of sequences or that use regex.
Example:
When `recurse-sequence` is set to `true`, `AccessionNumber` elements within all DICOM sequences are also de-identified according to the profile. You do not need to call out each tag individually.
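A sketch of a profile matching this description. The replace-with transformation is an assumed choice for illustration; any field transformation could be used:

```yaml
dicom:
  # Apply the profile to elements inside sequences as well
  recurse-sequence: true
  fields:
    # Applies to AccessionNumber at the top level and within sequences
    - name: AccessionNumber
      replace-with: REDACTED
```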
remove-undefined (true/false)
This is also called a "keeplist". When set to `true`, all data elements not defined in the fields section of the profile will be removed. If any field references a nested element in a sequence, the whole sequence element will be kept. Default is `false`.
Warning: When using this option, pay particular attention to the de-id profile to guarantee that the output DICOM still contains the mandatory data elements according to its Information Object Definition (IOD).
Example:
dicom:
  remove-private-tags: true
  remove-undefined: true
  fields:
    - name: PatientID
      replace-with: MY_PATIENT_ID
    - name: StudyInstanceUID
      hashuid: true
    - name: PatientName
      replace-with: REDACTED
Tag | Original metadata | De-identified metadata |
---|---|---|
PatientID | 0345 | MY_PATIENT_ID |
StudyInstanceUID | 1.2.840.113619.6.283.4.983142589.7316.1300473420.841 | 1.2.840.113619.551726.420312.177022.222461.230571.501817.841 |
PatientName | Smith John | REDACTED |
AcquisitionNumber | 1 | Blank. This is not defined under fields, so it is removed. |
StudyID | 4912 | Blank. This is not defined under fields, so it is removed. |
replace-with-insert (true/false)
If `true`, replace-with actions will insert the field into the record if it does not already exist and set its value. If `false`, replace-with will not insert the field if it does not already exist in the record.
Default is true.
Example:
Tag | Original metadata | De-identified metadata |
---|---|---|
StudyTime (0008,0030) | blank (no data) | blank (no data). With replace-with-insert set to true, Flywheel would add REDACTED instead of leaving the field blank. |
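A sketch of a profile that would give the result in the table, assuming StudyTime is absent from the original file:

```yaml
dicom:
  # Do not create fields that are missing from the record
  replace-with-insert: false
  fields:
    # Replaced only when StudyTime already exists in the file
    - name: StudyTime
      replace-with: REDACTED
```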
Fields
The `fields` portion of the de-id profile allows you to reference a specific DICOM data element and perform transformations on it. These follow the format:
dicom:
  fields:
    - name: <DICOM data element>
      <field transformation>: <value>
      <field transformation>: <value>
In this section you will find the different ways you can reference a specific DICOM field, tag, or keyword.
How to Reference a DICOM Data Element
This file profile supports 3 ways to reference a DICOM data element: keyword, tag, or dotty-notation.
The data elements in the DICOM File Meta information located in the optional 128 bytes of the DICOM File Preamble can be accessed in the same way as other tags.
Keyword
The keyword string as defined in the public DICOM dictionaries. For example: `PatientName`, `SOPClassUID`, and `AcquisitionDate`.
Example:
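A minimal sketch of referencing fields by keyword. The transformations chosen here are illustrative assumptions:

```yaml
dicom:
  fields:
    # Each name is a keyword from the public DICOM dictionary
    - name: PatientName
      replace-with: REDACTED
    - name: AcquisitionDate
      increment-date: true
```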
Tag
When referencing the DICOM tag, you can use any of these notations:
Tuple
Format: (###,###)
Example:
# straightforward DICOM tag, with or without spaces
- name: (0002, 0010)
# DICOM tag with no punctuation and
# no spaces; must be in quotes
- name: '00100
Hexadecimal
Format:
- ggggeeee, for example '00100010'
- 0xggggeeee, for example '0x00100010'
Example:
# dicom tag with no spaces
- name: '00100021'
# dicom tag as hexadecimal format
- name: '0x00100022'
Private tag notation
Format: (GGGG, PrivateCreatorName, EE)
Example:
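A sketch of the private tag notation. The creator name GEMS_IDEN_01 is inferred from the paragraph that follows, and the element number and transformation are assumptions for illustration:

```yaml
dicom:
  fields:
    # Group 0009, private creator GEMS_IDEN_01, element 04.
    # Only matches when the file's creator value is GEMS_IDEN_01.
    - name: (0009, "GEMS_IDEN_01", 04)
      remove: true
```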
The private tag creator element is used to validate organization of private tags. In the above example, (0009,"NOT_GEMS_IDEN_01", 04) would be ignored because the creator value must match. See the official DICOM documentation for more information on private tags.
Flywheel relies on predefined private dictionaries, built from pydicom's _private_dict.py and flywheel-metadata, to infer the tag VR.
Repeater Group Notation
You can apply a field transformation to a range of DICOM elements or groups. This is available for groups in the ranges (50XX, EEEE) and (60XX, EEEE) only.
Format:
- Hexadecimal: `50XX####`, `0x50XX####`, `60XX####`, `0x60XX####`
- Tuple: `(50XX,####)`, `(60XX,####)`
Example:
# repeating group hexadecimal example
- name: '0x60XX0022'
# repeating group no punctuation no spaces example
- name: '60XX0040'
# repeating group range example (old format)
- name: (6000-60FF, 0010)
# using XX to reference all elements in a range
- name: 0x50XX1001
  replace-with: REDACTED
- name: (50XX,1001)
  remove: true
- name: (60XX, 0050)
  remove: true
See the DICOM documentation for more information on repeating groups.
Dotty-Notation
Notation for referencing an element within a DICOM sequence. You can use a mix of keywords and tags.
For example: `AnatomicRegionSequence.0.CodeValue`, `00082218.0.00080102`, `AnatomicRegionSequence.0.00080104`
In addition, the dotty-notation supports the use of * to reference all indices of the sequence element at once, for example `AnatomicRegionSequence.*.CodeValue`.
The notation supports referencing data elements at any depth recursively.
Example:
# nested tags (*) meaning 'for all indexes in sequence'
- name: ReferencedImageSequence.*.ReferencedSOPClassUID
# (0008, 1140).*.(0008, 1150)
# nested tag with specific index in sequence
- name: ReferencedPerformedProcedureStepSequence.1.ReferencedSOPInstanceUID
# (0008, 1111).1.(0008, 1155)
Field Transformations
Once you reference a DICOM field, you can apply a field transformation to it. These field transformations override any file transformations set above.
hash (true/false)
Replace the contents of the field with a one-way cryptographic hash in hexadecimal form. Only the first 16 characters of the hash will be used, in order to support short strings.
Example:
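A minimal sketch of the hash transformation. PatientID is an assumed choice of field for illustration:

```yaml
dicom:
  fields:
    # Value becomes the first 16 hex characters of a one-way hash
    - name: PatientID
      hash: true
```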
hashuid (true/false)
Replaces a UID field with a hashed version of that field. By default, the first four nodes (prefix) and last node (suffix) will be preserved, with the middle being replaced by the hashed value.
Either of those can be applied at the field level or at the global level.
Example:
Tag | Original metadata | De-identified metadata |
---|---|---|
StudyInstanceUID | 1.2.840.113619.6.283.4.983142589.7316.1300473420.841 | 1.2.840.113619.551726.420312.177022.222461.230571.501817.841 |
increment-date (true/false)
Offsets the date by the number of days defined in the date-increment setting, preserving the time and timezone. If the date format does not match the DICOM default, %Y%m%d, tell Flywheel what format to expect by using the date-format setting (the date-format setting helps Flywheel parse the string).
You can apply either of these settings at the global level as well as at the field level.
Example:
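A sketch that shifts several date fields by the same offset. The offset of 14 days is an illustrative assumption, chosen as a multiple of 7 so that weekdays are preserved:

```yaml
dicom:
  # A multiple of 7 keeps the weekday of each shifted date unchanged
  date-increment: 14
  fields:
    - name: StudyDate
      increment-date: true
    - name: SeriesDate
      increment-date: true
    - name: AcquisitionDate
      increment-date: true
```

Shifting every related date by the same offset keeps the intervals between them intact.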
Warning: You are responsible for setting a date-format which is valid for the file type being processed.
increment-datetime (true/false)
Offsets the datetime by the number of days defined in the date-increment setting of the file profile, preserving the time and timezone. If the format of the datetime fields in your data does not match the DICOM default, %Y%m%d%H%M%S.%f, tell Flywheel what format to expect by using the datetime-format setting (the datetime-format setting helps Flywheel parse the string).
You can apply either of these settings at the global level as well as at the field level.
Example:
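A sketch of a datetime shift that uses a non-integer offset. The value 0.5 is taken from the date-increment description earlier in this reference, and AcquisitionDateTime is an assumed choice of field:

```yaml
dicom:
  # 0.5 days moves each datetime forward by 12 hours
  date-increment: 0.5
  fields:
    - name: AcquisitionDateTime
      increment-datetime: true
```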
Warning: You are responsible for setting a datetime-format which is valid for the file type being processed.
jitter (true/false)
Offsets a numeric (integer or float) value by a random value drawn from a uniform distribution centered on 0. By default, Flywheel uses integers and the range [-1,1]. You can change this range with jitter-range. You can also specify if the numeric value is an integer or a floating-point number with jitter-type. These can also be set at the field level or the global level.
Example:
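A sketch of jitter configured for a single field. Placing jitter-range and jitter-type as sibling keys of the field entry is an assumption based on the statement that both can be set at the field level, and the values are illustrative:

```yaml
dicom:
  fields:
    - name: PatientWeight
      jitter: true
      # Field-level overrides (assumed syntax); offsets fall in [-5, +5]
      jitter-range: 5
      jitter-type: float
```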
Additional considerations: DICOM constraints apply to data elements based on VR:
- If `jitter-type: float` and the VR is one of `["IS", "UL", "US"]`, the jittered value is converted to an integer.
- If the VR is one of `["UL", "US"]` (unsigned long, unsigned short) and the jittered value is less than 0, the value is set to 0.
- If the VR is `"US"` and the jittered value is greater than 65535, the new value is set to 65535.
keep (true/false)
Used when creating a keeplist using remove-undefined:true or to override global or file de-identification settings.
Example:
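A minimal sketch of a keeplist. The two fields are assumed choices for illustration:

```yaml
dicom:
  # Remove everything that is not listed under fields
  remove-undefined: true
  fields:
    - name: Modality
      keep: true
    - name: SeriesNumber
      keep: true
```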
Note: If only `name` is defined as a key in the field configuration, Flywheel defaults to `keep: true`.
replace-with (string)
Replaces the contents of the field with the value provided. Be aware of the length of the field being replaced, because some DICOM fields only support a limited number of characters. By default, the field will be created in the Flywheel record if it does not exist. This behavior can be reversed by setting replace-with-insert: false at the profile or field level.
Example:
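A sketch of replace-with combined with a field-level replace-with-insert. InstitutionName is an assumed choice of field, and placing replace-with-insert as a sibling key of the field entry is an assumption based on the description above:

```yaml
dicom:
  fields:
    - name: InstitutionName
      replace-with: REDACTED
      # Assumed field-level syntax: skip the field if it is not already present
      replace-with-insert: false
```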