Bulk Import - Mapping & Grouping (20.5+)
Applicability
Version 20.5 and Later
This document applies only to Flywheel version 20.5 and later.
For versions 19.4 to 20.4, refer to Bulk Import Mapping Rules (19.4 to 20.4).
Overview
By default, Flywheel attempts to detect automatically whether each file you are importing is DICOM or some other arbitrary type of file.
- If Flywheel detects that a file is DICOM, it will treat the file accordingly and read the DICOM headers to determine where to place the data.
- Otherwise, Flywheel will ignore the file contents and use only the original file path, including parent folder names, to determine where to place the data.
The Default Behavior section of this document explains in much greater detail how these default "mapping rules" are applied. For specific examples, skip ahead to the Examples section.
For a complete example combining all the information in this document, refer to Example 4: Mixed Types.
User-defined Behavior
If you need to change the way your data is organized in Flywheel, consider using the new (BETA) CLI, which provides extensive options for User-defined Behavior.
Default Behavior
By default, Flywheel considers the following information from your source data to determine how to organize the data within Flywheel:
- File paths, including the following elements:
  - File names
  - File extensions
  - Parent folder structure
- DICOM headers (DICOM files only)
At a glance, the logic generally works as follows:
flowchart LR
dt1(Detect Attachments)
dt2(Detect DICOM)
dt1-->dt2
map1(Derive destination hierarchy)
dt2-->map1
grp1("Package files (ZIP)")
map1-->grp1
up1(Upload to Flywheel)
grp1-->up1
The following sections describe this process in much greater detail.
Note
Because of steps 1, 3, and 4, the way the source data is organized before import greatly affects how the data is imported and organized into Flywheel (a.k.a. "mapping").
1. Detecting Container Attachments
It is possible to import files to be stored as attachments to Project, Subject, and Session containers (instead of only as Acquisition contents).
To import files as attachments, simply place the files into the source folder representing the Flywheel container to which the file should be attached.
For example, see Example 1: Mappings for Container Attachments.
2. Detecting DICOM Files
Flywheel will read the source file path information to determine which files might be DICOM.
For the purposes of determining where to place the data, Flywheel will assume that any file which meets both of the following rules is a DICOM file and should be processed accordingly:
- File name has either:
  - An extension of *.dcm, *.dicom, or *.ima; OR
  - No extension AND either:
    - Solely consists of numerals (e.g., 98201293), OR
    - Matches the format of a DICOM UID (e.g., 1.5.300.0.9230010.3.1.4.3735312059.35976.1651523430.20.99)
- File is located at a leaf-level folder in the source data.
  - I.e., there are no folders next to this file in the source data.
Any file that fails either of these criteria (file name format; file location) will not be considered a DICOM file and will be placed purely according to its parent folder structure. Non-DICOM file handling is explained further in the Non-DICOM Files section.
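The two rules above can be sketched as a path-only check. This is a minimal illustration, not Flywheel's implementation: the function name, the case-insensitive extension comparison, and the loose regular expression used to approximate "the format of a DICOM UID" are all assumptions.

```python
import re
from pathlib import PurePosixPath

# Extensions named in the rule above; comparing case-insensitively is an assumption.
DICOM_EXTENSIONS = {".dcm", ".dicom", ".ima"}
# Loose approximation of a DICOM UID: numeric components separated by dots.
UID_PATTERN = re.compile(r"[0-9]+(\.[0-9]+)+")
NUMERIC_PATTERN = re.compile(r"[0-9]+")


def looks_like_dicom(path: str, in_leaf_folder: bool) -> bool:
    """Guess from the path alone whether a file would be treated as DICOM.

    in_leaf_folder must be True when the parent folder contains no sub-folders.
    The file is never opened.
    """
    if not in_leaf_folder:
        return False
    name = PurePosixPath(path).name
    if PurePosixPath(name).suffix.lower() in DICOM_EXTENSIONS:
        return True
    # "No extension" cases: purely numeric names or UID-shaped names.
    return bool(NUMERIC_PATTERN.fullmatch(name) or UID_PATTERN.fullmatch(name))
```

For instance, under these assumptions `looks_like_dicom("Patient1/scan/98201293", True)` is accepted, while the same file in a folder that also contains sub-folders is not.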
DICOM file type detection is currently limited in the following ways:
- File contents are not opened during type detection; only the file path information is used (names, extensions, etc.)
  - DICOMDIR index files are not used
  - "Magic Bytes" are not used
- ZIP archives are not processed as DICOM (at import time)
  - Archives are not extracted during the import process
  - The *.dicom.zip extension is not detected as DICOM (at import time)
Limitation: Pre-zipped DICOM
For the purposes of mapping to the Flywheel hierarchy, ZIP files will not be detected as DICOM even if the file name contains *.dicom.zip. Pre-zipped DICOM files will be handled as non-DICOM files (at import time), and only their source file path information will be used.
This also means DICOM header to Flywheel metadata extraction will not occur at import time for such files.
However, once placed into Flywheel, pre-zipped files whose names end with *.dicom.zip will be classified as DICOM type by Flywheel Core according to Flywheel Core's file type detection rules, thus allowing for DICOM-specific gear rules to be triggered accordingly.
For more information about de-identification support with Bulk Import, refer to the Bulk Import De-Identification documentation.
Limitation: Magic Bytes
Flywheel uses a different, stronger mechanism to detect DICOM files for the purposes of de-identification. In the case of de-identification, Flywheel reads the file content and looks for the File Signature (i.e., "Magic Bytes") indicating if the file is DICOM or not.
The File Signature mechanism is used for de-identification since it is a reliable indicator of file type, and it is critical to ensure all DICOM files are de-identified properly. However, this mechanism is not used for file type detection in other processes (e.g., determining file placement) for performance and scalability reasons.
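For reference, the standard DICOM file signature is the four bytes "DICM" that follow a 128-byte preamble. The sketch below shows what such a content-based check looks like in general; it is not Flywheel's de-identification code, and the function name is hypothetical.

```python
import io
from typing import BinaryIO


def has_dicom_signature(stream: BinaryIO) -> bool:
    """Return True if the stream carries the DICOM Part 10 file signature.

    A DICOM Part 10 file starts with a 128-byte preamble followed by b"DICM".
    """
    stream.seek(128)
    return stream.read(4) == b"DICM"


# Usage with in-memory data standing in for an opened file.
sample = io.BytesIO(b"\x00" * 128 + b"DICM" + b"rest of the data set")
print(has_dicom_signature(sample))
```

Unlike the path-based detection used for file placement, this check requires opening every file, which is the cost referred to above.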
Limitation: DICOMDIR Structure
The DICOMDIR index file is not read or used for DICOM type detection.
Instead, Flywheel makes assumptions about the naming conventions typically used with DICOMDIR structures (e.g., extension-less files with numeric names) to infer if a file may be DICOM.
3. Deriving Destination Hierarchy
3.a. For DICOM Files
For the files detected as DICOM (see DICOM Detection), Flywheel will open the files and read their DICOM header information.
The following sections detail how the DICOM header information will be used to determine both:
- The destination Flywheel hierarchy (container labels)
- Additional Flywheel metadata
For a complete example, see Example 2: Mappings for DICOM Files.
Tip
Additional documentation on DICOM header to Flywheel metadata mappings is available in the new (BETA) CLI documentation for the import run command.
DICOM Header to Flywheel Hierarchy Mappings
From DICOM files, Flywheel will generate the container labels for the desired destination Flywheel hierarchy according to the following mappings:
Flywheel Container Label | DICOM Header Tag |
---|---|
subject.label | PatientID |
session.label | StudyDescription fallback to session.timestamp fallback to StudyInstanceUID |
acquisition.label | SeriesNumber - SeriesDescription fallback to SeriesNumber - ProtocolName fallback to acquisition.timestamp (formatted as %Y-%m-%dT%H:%M:%S ) fallback to SeriesInstanceUID and only prefixed if SeriesNumber is set |
file.name | Copied from acquisition.label |
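
The fallback chains in the table above can be sketched as follows. This is an illustration under stated assumptions, not the importer's code: the function names are hypothetical, timestamps are passed in already formatted as strings, the separator is taken literally from the table as " - ", and the SeriesNumber prefix is assumed to apply only to the SeriesDescription and ProtocolName forms.

```python
from typing import Any, Mapping, Optional


def first_set(*values: Any) -> Optional[str]:
    """Return the first value that is neither None nor empty, as a string."""
    for value in values:
        if value not in (None, ""):
            return str(value)
    return None


def session_label(headers: Mapping[str, Any],
                  session_timestamp: Optional[str] = None) -> Optional[str]:
    # StudyDescription -> session.timestamp -> StudyInstanceUID
    return first_set(headers.get("StudyDescription"),
                     session_timestamp,
                     headers.get("StudyInstanceUID"))


def acquisition_label(headers: Mapping[str, Any],
                      acquisition_timestamp: Optional[str] = None) -> Optional[str]:
    # SeriesDescription -> ProtocolName, prefixed only when SeriesNumber is set.
    described = first_set(headers.get("SeriesDescription"),
                          headers.get("ProtocolName"))
    if described is not None:
        number = headers.get("SeriesNumber")
        return f"{number} - {described}" if number not in (None, "") else described
    # Assumption: the remaining fallbacks are used without a prefix.
    return first_set(acquisition_timestamp, headers.get("SeriesInstanceUID"))
```

With the headers used in Example 2 (SeriesNumber 1, SeriesDescription "Chest X-ray"), this sketch yields the label "1 - Chest X-ray".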
DICOM Header to Flywheel Metadata Mappings
From DICOM files, Flywheel will generate additional metadata for the desired destination Flywheel hierarchy according to the following mappings:
Flywheel Container Type | Flywheel Metadata Field | DICOM Header Tag |
---|---|---|
Subject | subject.firstname | Split from PatientName |
Subject | subject.lastname | Split from PatientName |
Subject | subject.sex | PatientSex |
Session | session.uid | StudyInstanceUID |
Session | session.age | PatientAge (converted to seconds) fallback to difference between acquisition.timestamp and PatientBirthDate |
Session | session.weight | PatientWeight |
Session | session.operator | OperatorsName |
Session | session.timestamp | Combination of StudyDate and StudyTime fallback to combination of SeriesDate and SeriesTime fallback to AcquisitionDateTime fallback to combination of AcquisitionDate and AcquisitionTime with respect to TimezoneOffsetFromUTC |
Acquisition | acquisition.uid | SeriesInstanceUID |
Acquisition | acquisition.timestamp | AcquisitionDateTime fallback to combination of AcquisitionDate and AcquisitionTime fallback to combination of SeriesDate and SeriesTime fallback to combination of StudyDate and StudyTime with respect to TimezoneOffsetFromUTC |
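
The two timestamp rows differ only in the order in which the date and time tags are tried. The sketch below makes that order explicit. It is a simplified illustration with assumptions: the names are hypothetical, both tags of a pair must be present, fractional seconds and any offset embedded in AcquisitionDateTime are ignored, and TimezoneOffsetFromUTC is expected in the DICOM form +HHMM or -HHMM.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional, Sequence, Tuple

ACQUISITION_ORDER = (
    ("AcquisitionDateTime",),
    ("AcquisitionDate", "AcquisitionTime"),
    ("SeriesDate", "SeriesTime"),
    ("StudyDate", "StudyTime"),
)
SESSION_ORDER = (
    ("StudyDate", "StudyTime"),
    ("SeriesDate", "SeriesTime"),
    ("AcquisitionDateTime",),
    ("AcquisitionDate", "AcquisitionTime"),
)


def resolve_timestamp(headers: Mapping[str, str],
                      order: Sequence[Tuple[str, ...]]) -> Optional[datetime]:
    """Return the first timestamp that can be built from the given tag order."""
    for tags in order:
        values = [headers.get(tag) for tag in tags]
        if not all(values):
            continue
        # YYYYMMDD followed by up to six time digits (HH, HHMM, or HHMMSS).
        match = re.match(r"(\d{8})(\d{0,6})", "".join(values))
        if match is None:
            continue
        date, time = match.groups()
        stamp = datetime.strptime(date + time.ljust(6, "0"), "%Y%m%d%H%M%S")
        offset = headers.get("TimezoneOffsetFromUTC")
        if offset:
            sign = -1 if offset.startswith("-") else 1
            delta = timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5]))
            stamp = stamp.replace(tzinfo=timezone(sign * delta))
        return stamp
    return None
```

Calling `resolve_timestamp(headers, ACQUISITION_ORDER)` and `resolve_timestamp(headers, SESSION_ORDER)` on the same headers can return different values when, for example, both series-level and study-level tags are present.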
3.b. For Arbitrary Files (non-DICOM)
For any file that Flywheel does not detect as DICOM, Flywheel uses the source file path information, including parent folder names, to determine how to organize the files within Flywheel. In this case, the files are not opened and their content is ignored.
Since the original file path is used to determine where to place non-DICOM files in Flywheel, such non-DICOM files must be organized carefully before importing into Flywheel:
- The "root" folder represents a single Project
- Each first-level folder (directly inside the Project root) represents a single Subject (i.e., Patient)
- Each second-level folder (directly inside the Subject) represents a single Session (i.e., Study)
- Each "leaf" (lowest-level) folder (directly inside a Session) represents a single Acquisition
  - Each Acquisition is contained in a single "leaf" folder
  - Each "leaf" folder contains a single Acquisition
For a complete example, see Example 3: Mappings for Arbitrary Files (non-DICOM).
Info
A "leaf-level" folder is any folder that contains only files and not any additional lower-level folders.
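As a rough sketch of the folder convention above, the function below maps a path (relative to the Project root) to container labels. It only models files that sit inside an Acquisition folder; container attachments (step 1) and any deeper or shallower layouts are deliberately not modeled and return None. The function name, the returned dictionary keys, and the "Scan1" and "notes.txt" names in the usage note are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def map_non_dicom(relative_path: str) -> Optional[Dict[str, str]]:
    """Map <Subject>/<Session>/<Acquisition>/<file> to Flywheel labels.

    relative_path is relative to the root folder that represents the Project.
    """
    parts = PurePosixPath(relative_path).parts
    if len(parts) != 4:
        return None  # Not an acquisition-level file; not modeled in this sketch.
    subject, session, acquisition, file_name = parts
    return {
        "subject.label": subject,
        "session.label": session,
        "acquisition.label": acquisition,
        "file.name": file_name,
    }
```

For example, `map_non_dicom("Patient123/Study20220423/Scan1/notes.txt")` yields the Subject label "Patient123", the Session label "Study20220423", and the Acquisition label "Scan1".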
4. Grouping & Packaging Files into ZIP Archives (DICOM only)
It is strongly recommended that DICOM files are stored in Flywheel Core as ZIP archives (e.g., *.dicom.zip) rather than as loose files (e.g., *.dcm, etc.).
When the source data is loose DICOM files (e.g., *.dcm, etc.), Bulk Import will package the DICOM files into the recommended ZIP archives. To do this, Bulk Import needs to decide which files to place into which ZIP archive. This process is called "grouping."
By default, Bulk Import groups DICOM files using the following logic:
- Non-DICOM files are not packaged into ZIP archives; they are uploaded individually.
- All DICOM files are grouped into separate ZIP archives by StudyInstanceUID and SeriesInstanceUID.
  - Groups containing only a single DICOM file are still zipped even though each resulting ZIP archive contains only a single file.
- DICOM localizer files are separated into their own ZIP archives.
  - The following DICOM tags are inspected when determining whether a DICOM file is a localizer or not:
    - InstanceNumber
    - ImageOrientationPatient
    - ImagePositionPatient
    - Rows
    - Columns
- Each ZIP archive will be named as <acquisition.label>.dicom.zip
  - Where acquisition.label is calculated as described in the DICOM Header to Flywheel Hierarchy Mappings section.
- Each individual DICOM file packaged into a ZIP archive will be renamed as {SOPInstanceUID}.{Modality}.dcm
  - This only applies to DICOM files that are zipped. When zipping is disabled, the loose DICOM files will not be renamed.
- Each ZIP archive will contain a top-level folder to avoid polluting the current directory when extracted.
  - E.g., the contents of 12345.zip are nested inside a folder named 12345 (like 12345.zip/12345/abcdef.MR.dcm), so that when 12345.zip is extracted its contents are neatly organized (like ./12345/abcdef.MR.dcm) and not mixed into the current directory.
For complete examples, refer to Example 2 and Example 4.
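The default grouping and packaging rules can be sketched with the standard library as shown below. This is an illustration only: the function names are hypothetical, localizer separation is not modeled, and naming the top-level folder after the acquisition label is an assumption based on the 12345.zip example above.

```python
import io
import zipfile
from collections import defaultdict
from typing import Dict, List, Mapping, Tuple

Headers = Mapping[str, str]


def group_by_series(files: List[Headers]) -> Dict[Tuple[str, str], List[Headers]]:
    """Group DICOM header sets by (StudyInstanceUID, SeriesInstanceUID)."""
    groups: Dict[Tuple[str, str], List[Headers]] = defaultdict(list)
    for headers in files:
        key = (headers["StudyInstanceUID"], headers["SeriesInstanceUID"])
        groups[key].append(headers)
    return dict(groups)


def build_zip(acquisition_label: str,
              members: List[Tuple[Headers, bytes]]) -> Tuple[str, bytes]:
    """Package one group in memory and return (archive name, archive bytes)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for headers, payload in members:
            # Each member is renamed as {SOPInstanceUID}.{Modality}.dcm.
            member = f"{headers['SOPInstanceUID']}.{headers['Modality']}.dcm"
            # Nest members in a top-level folder (assumed: the acquisition label).
            archive.writestr(f"{acquisition_label}/{member}", payload)
    return f"{acquisition_label}.dicom.zip", buffer.getvalue()
```

A group with a single member still produces an archive here, which mirrors the default behavior of zipping single-file groups.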
Limitation: Split series
Each DICOM series must be fully contained in a single "leaf-level" input folder. A "leaf-level" folder is any folder that contains only files and not any additional folders.
Starting with version 20.5, a single source folder can contain multiple DICOM series. However, each DICOM series must still be fully contained within a single source folder.
5. Uploading Files to Flywheel
Files can be uploaded to the following locations within Flywheel:
- Project, Subject, or Session attachments
  - Review the section on Container Attachments.
- Acquisition container contents
If any conflicts or duplication scenarios are encountered while uploading the files to Flywheel, the affected files will be quarantined and flagged for manual review. Refer to the documentation on Bulk Import Conflict Handling for more details.
User-defined Behavior
User-defined Mapping Rules
Currently, user-defined mapping rules can only be used when starting an Import via the new (BETA) CLI.
More information about user-defined mapping options can be found in the new (BETA) CLI documentation for the import run command.
User-defined Grouping Logic
The --dicom-group-by <tag> option in the new (BETA) CLI can be used to define how Bulk Import should group DICOM files. This option can be set multiple times to "group by" multiple DICOM tags at once.
For example, the following commands can be used to package DICOM files by the combination of StudyInstanceUID, StudyDate, StudyTime, and SeriesNumber.
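As a sketch of the repeated-option form, the helper below builds only the --dicom-group-by portion of an argument list for the four tags named above. The helper name is hypothetical, and the executable name and the remaining import run arguments are intentionally left out because they are not covered here; consult the CLI documentation for the full command.

```python
from typing import Iterable, List


def dicom_group_by_args(tags: Iterable[str]) -> List[str]:
    """Repeat --dicom-group-by once per DICOM tag, in the given order."""
    args: List[str] = []
    for tag in tags:
        args += ["--dicom-group-by", tag]
    return args


# Option fragment for grouping by the four tags named above.
fragment = dicom_group_by_args(
    ["StudyInstanceUID", "StudyDate", "StudyTime", "SeriesNumber"]
)
print(" ".join(fragment))
```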
Refer to the new (BETA) CLI documentation on import run for more information about the available configuration options.
Disable zipping of single DICOM files
The --no-zip-single option in the new (BETA) CLI can be set to disable zipping of single DICOM files.
Refer to the new (BETA) CLI documentation on import run for more information about the available configuration options.
Configuring ZIP File Names
The naming convention for the ZIP archives created by Bulk Import can be configured with the new (BETA) CLI by setting the file.name field in a user-defined --mapping rule.
As a reminder, the name of any arbitrary file being imported into Flywheel can be configured using the file.name field with the --mapping option. This option also supports controlling the name of ZIP archives that are created when packaging DICOM files (e.g., when -t dicom is set).
Refer to the new (BETA) CLI documentation on import run for more information about the available configuration options.
Configuring Individual DICOM File Names
The naming of single DICOM files can be configured with the new (BETA) CLI using the --dicom-instance-name <template> option.
The --dicom-instance-name option only affects DICOM files that are packaged into ZIP archives; it does not affect single DICOM files when zipping is disabled (e.g., when the --no-zip-single option is set).
Refer to the new (BETA) CLI documentation on import run for more information about the available configuration options.
Examples
Example 1: Container Attachments
Consider the following source data structure:
With the default behavior, this source data would be imported into Flywheel as:
Note how the following files are imported based on where they were located in the source folder structure:
- objectives-1.csv is imported as an attachment to the Destination Project
- consent-form-1.pdf is imported as an attachment to the Subject labeled Patient123 in the Destination Project
- tech-notes-1.txt is imported as an attachment to the Session labeled Study20220423 of the Subject labeled Patient123 in the Destination Project
Example 2: DICOM Files
Consider the following source data structure:
Where the DICOM files contain the following headers:
DICOM Header | file1.dcm | file2.dcm | file3.dcm | 9572012 | 0012893 |
---|---|---|---|---|---|
PatientID | Subj123 | Subj123 | Subj123 | Subj456 | Subj456 |
StudyInstanceUID | 1234 | 1234 | 1234 | 5678 | 5678 |
StudyDescription | Timepoint1 | Timepoint1 | Timepoint2 | Timepoint1 | Timepoint1 |
SeriesInstanceUID | 9876 | 9876 | 7654 | 3210 | 3210 |
SeriesNumber | 1 | 1 | 1 | 4 | 4 |
SeriesDescription | Chest X-ray | Chest X-ray | Head CT | PET scan | PET scan |
SOPInstanceUID | abc123 | abc456 | def789 | ghi012 | jkl345 |
With the default behavior, this source data would be imported into Flywheel as:
Where the ZIP file contents are:
Note a few nuances:
- Header-based mappings: The destination Flywheel hierarchy structure is based entirely on the DICOM headers and not on any of the source file path information.
- Loose organization: There is no rule on how many parent folders a DICOM series can be contained within. Compare the contents of the Patient1/ and Patient2/ folders from the source data, for example.
- No split DICOM series: The only restriction on source data organization for DICOM files is that each DICOM series must be fully contained in exactly 1 leaf-level source folder.
- Single-file Handling: DICOM series containing only 1 file are zipped.
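To make the header-based mapping concrete, the sketch below applies the documented defaults to the header table of this example: files are grouped by StudyInstanceUID and SeriesInstanceUID, and each group is placed at subject.label / session.label / acquisition.label with an archive named <acquisition.label>.dicom.zip. It is an illustration under assumptions: the function name and the slash-joined path string are hypothetical, the " - " separator is taken literally from the mapping table, and localizer separation is ignored.

```python
from collections import defaultdict
from typing import Dict, List

TAGS = ["PatientID", "StudyInstanceUID", "StudyDescription",
        "SeriesInstanceUID", "SeriesNumber", "SeriesDescription", "SOPInstanceUID"]

# Header values copied from the table in this example.
FILES = {
    "file1.dcm": ["Subj123", "1234", "Timepoint1", "9876", "1", "Chest X-ray", "abc123"],
    "file2.dcm": ["Subj123", "1234", "Timepoint1", "9876", "1", "Chest X-ray", "abc456"],
    "file3.dcm": ["Subj123", "1234", "Timepoint2", "7654", "1", "Head CT", "def789"],
    "9572012": ["Subj456", "5678", "Timepoint1", "3210", "4", "PET scan", "ghi012"],
    "0012893": ["Subj456", "5678", "Timepoint1", "3210", "4", "PET scan", "jkl345"],
}


def derive_archives(files: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each destination archive path to the source files it would contain."""
    groups = defaultdict(list)
    for source_name, values in files.items():
        headers = dict(zip(TAGS, values))
        key = (headers["StudyInstanceUID"], headers["SeriesInstanceUID"])
        groups[key].append((source_name, headers))
    archives = {}
    for members in groups.values():
        headers = members[0][1]
        label = f"{headers['SeriesNumber']} - {headers['SeriesDescription']}"
        path = (f"{headers['PatientID']}/{headers['StudyDescription']}/"
                f"{label}/{label}.dicom.zip")
        archives[path] = [name for name, _ in members]
    return archives


for path, sources in derive_archives(FILES).items():
    print(path, "<-", sources)
```

Under these assumptions the five source files end up in three archives, one per DICOM series, regardless of which source folders they were stored in.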
Example 3: Arbitrary Files (non-DICOM)
Consider the following source data structure:
With the default behavior, this source data would be imported into Flywheel as:
Note a few nuances:
- Container Labels: The source folder names are used as the container labels (e.g., "Patient123" is the Subject label).
- No Zipping: The files are not grouped together and are stored in Flywheel individually as-is.
- Arbitrary Files Types: Any type of file can be imported (not only DICOM). The uploaded file will have its type automatically set in Flywheel according to the matching rules for File Types in Flywheel Core.
Example 4: Mixed Types
Where the DICOM files contain the following headers:
DICOM Header | file1.dcm | file2.dcm | file3.dcm | 9572012 | 0012893 |
---|---|---|---|---|---|
PatientID | Subj123 | Subj123 | Subj123 | Subj456 | Subj456 |
StudyInstanceUID | 1234 | 1234 | 1234 | 5678 | 5678 |
StudyDescription | Timepoint1 | Timepoint1 | Timepoint2 | Timepoint1 | Timepoint1 |
SeriesInstanceUID | 9876 | 9876 | 7654 | 3210 | 3210 |
SeriesNumber | 1 | 1 | 1 | 4 | 4 |
SeriesDescription | Chest X-ray | Chest X-ray | Head CT | PET scan | PET scan |
SOPInstanceUID | abc123 | abc456 | def789 | ghi012 | jkl345 |
With the default behavior, this source data would be imported into Flywheel as:
Where the ZIP file contents are:
Note a few nuances:
- Skipped Files: scan-notes-1.txt is skipped and not uploaded to Flywheel, because there is "no matching rule."
  - Specifically, this file is not detected as DICOM and the parent directory structure does not contain enough levels to represent the desired Flywheel hierarchy, and so it is skipped due to not having enough information to map to Flywheel.
- Additional Containers: There are 3 Subject containers in the destination Flywheel hierarchy even though the source folder structure appeared to have only 2 patient folders.
  - This is because the Patient1/ source folder contained some folders with DICOM files and some folders containing non-DICOM files.
  - The DICOM files mapped to Subj123/ based on the PatientID header.
  - The non-DICOM files mapped to Patient1/ based on the file path information.
  - To combine these containers into just Subj123/, the source folder must be renamed to Subj123/ to match the PatientID header of the contained DICOM files.
  - Note that this same behavior cascades down to the Session and Acquisition containers as well.