Bulk Import - Mapping to the Flywheel Hierarchy (19.4+)
Applicability
Version 19.4 and Later
This document applies only to Flywheel version 19.4 and later.
- For Flywheel versions 19.2 to 19.3, refer to Bulk Import Mapping Rules (19.2 to 19.3) instead.
- For Flywheel version 19.1 or earlier, refer to Bulk Import Mapping Rules (-19.1) instead.
Overview
By default, Flywheel will do its best to automatically detect whether each file you are importing is DICOM or some other type of file.
If Flywheel detects that a file is DICOM, it will treat the file accordingly and read the DICOM headers to determine where to place the data.
If Flywheel does not detect that a file is DICOM, it will ignore the file content and solely use the original file path, including parent folder names, to determine where to place the data.
The Default Behavior section of this document explains in much greater detail how these default "mapping rules" are applied. For specific examples, skip ahead to the Examples section.
For a complete example combining all the information in this document, refer to Example 4: Mixed Types.
Mixed File Types
If your dataset contains a mix of both DICOM and non-DICOM files, don't worry -- you can still import it all at once!
Flywheel's automatic file type detection will identify which files are DICOM and handle each file accordingly. Additionally, with user-defined mapping rules, you can combine both DICOM header and file path information.
User-defined Behavior
If you need to change the way your data is organized into Flywheel, consider using the new (BETA) CLI, which provides extensive options for User-defined Behavior.
Default Behavior
By default, Flywheel considers the following information from your source data to determine how to organize the data within Flywheel:
- File paths, including the following elements:
  - File names
  - File extensions
  - Parent folder structure
- DICOM headers (DICOM files only)
At a glance, the logic generally works as follows:
flowchart LR
dt1(Detect Attachments)
dt2(Detect DICOM)
dt1-->dt2
map1(Derive destination hierarchy)
dt2-->map1
grp1("Package files (ZIP)")
map1-->grp1
up1(Upload to Flywheel)
grp1-->up1
The following sections describe this process in much greater detail.
Note
Because of steps 1, 3, and 4, the way the source data is organized before import greatly affects how the data is imported and organized into Flywheel (a.k.a. "mapping").
1. Detecting Container Attachments
It is possible to import files to be stored as attachments to Project, Subject, and Session containers (instead of only as Acquisition contents).
To import files as attachments, simply place the files into the source folder representing the Flywheel container to which the file should be attached.
For example, see Example 1: Mappings for Container Attachments.
2. Detecting DICOM Files
Flywheel will read the source file path information to determine which files might be DICOM.
For the purposes of determining where to place the data, Flywheel will assume that any file which meets both of the following rules is a DICOM file and should be processed accordingly:
- File name has either:
  - An extension of *.dcm, *.dicom, or *.ima; OR
  - No extension and solely consists of numerals (e.g., 98201293)
- File is located at a leaf-level folder in the source data.
  - I.e., there are no folders next to this file in the source data.
Any file that fails either of these criteria will not be considered a DICOM file and will be placed purely according to its parent folder structure. Non-DICOM file handling is explained further in the Non-DICOM Files section.
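The two rules above can be sketched as a path-only check. This is an illustration, not Flywheel's implementation: the function name `looks_like_dicom` is hypothetical, and treating extensions case-insensitively is an assumption the text does not confirm.

```python
from pathlib import Path

# Extensions listed in the rules above.
DICOM_EXTENSIONS = {".dcm", ".dicom", ".ima"}


def looks_like_dicom(path: Path) -> bool:
    """Path-only check: the file content is never opened."""
    suffix = path.suffix.lower()  # assumption: case-insensitive match
    numeric_name = suffix == "" and path.name.isascii() and path.name.isdigit()
    name_ok = suffix in DICOM_EXTENSIONS or numeric_name
    # Leaf-level rule: the containing folder must hold no sub-folders.
    in_leaf = not any(child.is_dir() for child in path.parent.iterdir())
    return name_ok and in_leaf
```

A file such as `scan.dcm` that sits next to a sub-folder fails the second rule here and would therefore be handled as a non-DICOM file.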
DICOM file type detection is currently limited in the following ways:
- File contents are not opened during type detection; only the file path information is used (names, extensions, etc.)
- DICOMDIR index files are not used
- "Magic Bytes" are not used
- ZIP archives are not processed as DICOM (at import time)
  - Archives are not extracted during the import process
  - The *.dicom.zip extension is not detected; only the *.zip extension
Limitation: Pre-zipped DICOM
For the purposes of mapping to the Flywheel hierarchy, ZIP files will not be detected as DICOM even if the file name contains *.dicom.zip. Pre-zipped DICOM files will be handled as non-DICOM files (at import time), and only their source file path information will be used.
This also means DICOM header to Flywheel metadata extraction will not occur at import time for such files.
However, once placed into Flywheel, pre-zipped files whose names end with *.dicom.zip will be classified as DICOM type by Flywheel Core according to Flywheel Core's file type detection rules, thus allowing for DICOM-specific gear rules to be triggered accordingly.
For more information about de-identification support with Bulk Import, refer to the Bulk Import De-Identification documentation.
Limitation: Magic Bytes
Flywheel uses a different, stronger mechanism to detect DICOM files for the purposes of de-identification. In the case of de-identification, Flywheel reads the file content and looks for the File Signature (i.e., "Magic Bytes") indicating if the file is DICOM or not.
The File Signature mechanism is used for de-identification because it is a reliable indicator of file type, and it is critical to ensure that all DICOM files are de-identified properly. However, this mechanism is not used for file type detection in other processes (e.g., determining file placement) for performance and scalability reasons.
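For readers unfamiliar with the File Signature: the DICOM file format (Part 10 of the standard) starts with a 128-byte preamble followed by the four bytes "DICM". A minimal sketch of such a check follows. The function name `has_dicom_signature` is hypothetical, and this is not a description of how Flywheel implements it.

```python
from pathlib import Path


def has_dicom_signature(path: Path) -> bool:
    """Return True if bytes 128..131 of the file are b"DICM"."""
    with open(path, "rb") as fh:
        fh.seek(128)  # skip the 128-byte preamble
        return fh.read(4) == b"DICM"
```

Unlike the path-based rules, this check opens every file, which illustrates the performance cost mentioned above.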
Limitation: DICOMDIR Structure
The DICOMDIR index file is not read or used for DICOM type detection.
Instead, Flywheel makes assumptions about the naming conventions typically used with DICOMDIR structures (e.g., extension-less files with numeric names) to infer if a file may be DICOM.
3. Deriving Destination Hierarchy
3.a. For DICOM Files
For the files detected as DICOM (see DICOM Detection), Flywheel will open the files and read their DICOM header information.
The following sections detail how the DICOM header information will be used to determine both the destination Flywheel hierarchy (container labels) and the additional Flywheel metadata.
For a complete example, see Example 2: Mappings for DICOM Files.
Tip
Additional documentation on DICOM header to Flywheel metadata mappings is available in the new (BETA) CLI documentation for the import run command.
DICOM Header to Flywheel Hierarchy Mappings
From DICOM files, Flywheel will generate the container labels for the desired destination Flywheel hierarchy according to the following mappings:
Flywheel Container Label | DICOM Header Tag |
---|---|
subject.label | PatientID |
session.label | StudyDescription fallback to session.timestamp fallback to StudyInstanceUID |
acquisition.label | SeriesNumber - SeriesDescription fallback to SeriesNumber - ProtocolName fallback to acquisition.timestamp (formatted as %Y-%m-%dT%H:%M:%S) fallback to SeriesInstanceUID; the SeriesNumber prefix is added only if SeriesNumber is set |
file.name | Copied from acquisition.label |
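The fallback chain for acquisition.label can be sketched as below. The helper names `first_set` and `acquisition_label` are hypothetical, headers are modeled as a plain dict, and the sketch assumes that the "SeriesNumber - " prefix applies to every fallback whenever SeriesNumber is set, which is one reading of the table rather than confirmed behavior.

```python
from datetime import datetime
from typing import Optional


def first_set(*values):
    """Return the first value that is neither None nor an empty string."""
    return next((v for v in values if v not in (None, "")), None)


def acquisition_label(hdr: dict, acq_ts: Optional[datetime] = None) -> Optional[str]:
    """Derive an acquisition label from DICOM headers (illustrative only)."""
    ts = acq_ts.strftime("%Y-%m-%dT%H:%M:%S") if acq_ts else None
    base = first_set(
        hdr.get("SeriesDescription"),
        hdr.get("ProtocolName"),
        ts,
        hdr.get("SeriesInstanceUID"),
    )
    number = hdr.get("SeriesNumber")
    # Assumption: the prefix is added to any fallback when SeriesNumber is set.
    if base is not None and number not in (None, ""):
        return f"{number} - {base}"
    return base
```

The session.label row follows the same pattern with its own, shorter list of candidates.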
DICOM Header to Flywheel Metadata Mappings
From DICOM files, Flywheel will generate additional metadata for the desired destination Flywheel hierarchy according to the following mappings:
Flywheel Container Type | Flywheel Metadata Field | DICOM Header Tag |
---|---|---|
Subject | subject.firstname | Split from PatientName |
Subject | subject.lastname | Split from PatientName |
Subject | subject.sex | PatientSex |
Session | session.uid | StudyInstanceUID |
Session | session.age | PatientAge (converted to seconds) fallback to difference between acquisition.timestamp and PatientBirthDate |
Session | session.weight | PatientWeight |
Session | session.operator | OperatorsName |
Session | session.timestamp | Combination of StudyDate and StudyTime fallback to combination of SeriesDate and SeriesTime fallback to AcquisitionDateTime fallback to combination of AcquisitionDate and AcquisitionTime with respect to TimezoneOffsetFromUTC |
Acquisition | acquisition.uid | SeriesInstanceUID |
Acquisition | acquisition.timestamp | AcquisitionDateTime fallback to combination of AcquisitionDate and AcquisitionTime fallback to combination of SeriesDate and SeriesTime fallback to combination of StudyDate and StudyTime with respect to TimezoneOffsetFromUTC |
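As an illustration of the session.timestamp row, the sketch below walks the listed tags in order and applies TimezoneOffsetFromUTC when present. All function names are hypothetical, and several simplifications are assumptions: both the date and the time tag must be present, fractional seconds are dropped, and any offset embedded in AcquisitionDateTime is ignored. The acquisition.timestamp row uses the same building blocks in a different order.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_da_tm(date: Optional[str], time: Optional[str]) -> Optional[datetime]:
    """Combine a DICOM date (YYYYMMDD) and time (HHMMSS[.FFFFFF])."""
    if not date or not time:
        return None  # assumption: both tags are required
    whole = time.split(".")[0].ljust(6, "0")  # pad HH or HHMM to HHMMSS
    return datetime.strptime(date + whole, "%Y%m%d%H%M%S")


def parse_dt(value: Optional[str]) -> Optional[datetime]:
    """Parse the YYYYMMDDHHMMSS part of a DICOM date-time value."""
    if not value or len(value) < 14:
        return None  # assumption: truncated values are skipped
    return datetime.strptime(value[:14], "%Y%m%d%H%M%S")


def parse_offset(value: Optional[str]) -> Optional[timezone]:
    """Parse TimezoneOffsetFromUTC, e.g. '+0100' or '-0500'."""
    if not value:
        return None
    sign = -1 if value[0] == "-" else 1
    return timezone(sign * timedelta(hours=int(value[1:3]), minutes=int(value[3:5])))


def session_timestamp(hdr: dict) -> Optional[datetime]:
    """Apply the fallback order from the session.timestamp row."""
    candidates = (
        parse_da_tm(hdr.get("StudyDate"), hdr.get("StudyTime")),
        parse_da_tm(hdr.get("SeriesDate"), hdr.get("SeriesTime")),
        parse_dt(hdr.get("AcquisitionDateTime")),
        parse_da_tm(hdr.get("AcquisitionDate"), hdr.get("AcquisitionTime")),
    )
    ts = next((c for c in candidates if c is not None), None)
    tz = parse_offset(hdr.get("TimezoneOffsetFromUTC"))
    return ts.replace(tzinfo=tz) if ts and tz else ts
```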
3.b. For Arbitrary Files (non-DICOM)
For any file that Flywheel does not detect as DICOM, Flywheel uses the source file path information, including parent folder names, to determine how to organize the files within Flywheel. In this case, the files are not opened and their content is ignored.
Since the original file path is used to determine where to place non-DICOM files in Flywheel, such non-DICOM files must be organized carefully before importing into Flywheel:
- The "root" folder represents a single Project
- Each first-level folder (directly inside the Project root) represents a single Subject (i.e., Patient)
- Each second-level folder (directly inside the Subject) represents a single Session (i.e., Study)
- Each "leaf" (lowest-level) folder (directly inside a Session) represents a single Acquisition
  - Each Acquisition is contained in a single "leaf" folder
  - Each "leaf" folder contains a single Acquisition
For a complete example, see Example 3: Mappings for Arbitrary Files (non-DICOM).
Info
A "leaf-level" folder is any folder that contains only files and not any additional lower-level folders.
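The folder-level rules for non-DICOM files can be sketched as a mapping from a path (relative to the Project root) to container labels. The function name `map_non_dicom` and the sample names `Scan1` and `image.nii` are hypothetical. Paths shallower than an Acquisition are treated as container attachments, following the Container Attachments section, and deeper nesting returns None because this document does not describe it.

```python
from pathlib import PurePosixPath
from typing import Optional

LEVELS = ("subject", "session", "acquisition")


def map_non_dicom(rel_path: str) -> Optional[dict]:
    """Map a path relative to the Project root onto container labels."""
    *folders, file_name = PurePosixPath(rel_path).parts
    if len(folders) > len(LEVELS):
        return None  # nesting deeper than Acquisition is not covered here
    dest = dict(zip(LEVELS, folders))  # folder names become labels
    dest["file"] = file_name
    return dest
```

For example, `map_non_dicom("Patient123/Study20220423/Scan1/image.nii")` yields a Subject, Session, and Acquisition label plus the file name, while `map_non_dicom("Patient123/consent-form-1.pdf")` yields only a Subject label, that is, a Subject attachment.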
4. Grouping & Packaging Files into ZIP Archives (DICOM only)
If an Acquisition contains multiple DICOM files, the files will be grouped together and packaged into a ZIP archive before uploading to Flywheel.
Note the following exceptions:
- Acquisitions containing single DICOM files are not packaged into ZIP archives before uploading to Flywheel.
- Non-DICOM files are never packaged into ZIP archives and are always uploaded individually.
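The grouping rule and its two exceptions can be sketched as a planning step that decides which files travel together. The function name `plan_uploads` and the `("zip" | "single", names)` result shape are hypothetical, and the acquisition key stands for any value that uniquely identifies the destination Acquisition.

```python
from typing import Hashable, Iterable


def plan_uploads(files: Iterable[tuple[Hashable, str, bool]]) -> list[tuple[str, list[str]]]:
    """files: (acquisition_key, file_name, is_dicom) triples."""
    dicom_by_acq: dict[Hashable, list[str]] = {}
    plan: list[tuple[str, list[str]]] = []
    for acq_key, name, is_dicom in files:
        if is_dicom:
            dicom_by_acq.setdefault(acq_key, []).append(name)
        else:
            plan.append(("single", [name]))  # non-DICOM: never zipped
    for names in dicom_by_acq.values():
        # Only Acquisitions with more than one DICOM file are zipped.
        kind = "zip" if len(names) > 1 else "single"
        plan.append((kind, names))
    return plan
```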
Limitation: 1 DICOM Series per Input Folder
There are a few requirements on how DICOM files must be organized before upload:
- Each "leaf-level" input folder must contain only a single DICOM Series.
- Each DICOM Series must be fully contained in a single "leaf-level" input folder.
A "leaf-level" folder is any folder that contains only files and not any additional folders.
Multiple DICOM Series may be uploaded at the same time, so long as they are each located in their own "leaf-level" folder.
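Before importing, both requirements can be checked with a small pre-flight script. The sketch below assumes you have already collected one (leaf folder, SeriesInstanceUID) pair per DICOM file by whatever means you prefer. The function name `check_series_layout` is hypothetical and not part of any Flywheel tool.

```python
from collections import defaultdict


def check_series_layout(files):
    """files: iterable of (leaf_folder, series_uid) pairs, one per DICOM file.

    Returns (mixed_folders, split_series):
      mixed_folders - folders that hold more than one Series
      split_series  - Series spread over more than one folder
    """
    series_in_folder = defaultdict(set)
    folders_of_series = defaultdict(set)
    for folder, uid in files:
        series_in_folder[folder].add(uid)
        folders_of_series[uid].add(folder)
    mixed = sorted(f for f, uids in series_in_folder.items() if len(uids) > 1)
    split = sorted(u for u, fs in folders_of_series.items() if len(fs) > 1)
    return mixed, split
```

Two empty lists mean the source data satisfies both requirements.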
5. Uploading Files to Flywheel
Files can be uploaded to the following locations within Flywheel:
- Project, Subject, or Session attachments
  - Review the section on Container Attachments.
- Acquisition container contents
If any conflicts or duplication scenarios are encountered while uploading the files to Flywheel, the affected files will be quarantined and flagged for manual review. Refer to the documentation on Bulk Import Conflict Handling for more details.
User-defined Behavior
Currently, user-defined mapping rules can only be used when starting an Import via the new (BETA) CLI.
More information about user-defined mapping options can be found in the new (BETA) CLI documentation for the import run command.
Examples
Example 1: Container Attachments
Consider the following source data structure:
With the default behavior, this source data would be imported into Flywheel as:
Note how the following files are imported based on where they were located in the source folder structure:
- objectives-1.csv is imported as an attachment to the Destination Project
- consent-form-1.pdf is imported as an attachment to the Subject labeled Patient123 in the Destination Project
- tech-notes-1.txt is imported as an attachment to the Session labeled Study20220423 of the Subject labeled Patient123 in the Destination Project
Example 2: DICOM Files
Consider the following source data structure:
Where the DICOM files contain the following headers:
DICOM Header | file1.dcm | file2.dcm | file3.dcm | 9572012 | 0012893 |
---|---|---|---|---|---|
PatientID | Subj123 | Subj123 | Subj123 | Subj456 | Subj456 |
StudyDescription | Timepoint1 | Timepoint1 | Timepoint2 | Timepoint1 | Timepoint1 |
SeriesNumber | 1 | 1 | 1 | 4 | 4 |
SeriesDescription | Chest X-ray | Chest X-ray | Chest X-ray | PET scan | PET scan |
With the default behavior, this source data would be imported into Flywheel as:
Where the ZIP file contents are:
Note a few nuances:
- Header-based mappings: The destination Flywheel hierarchy structure is based entirely on the DICOM headers and not on any of the source file path information.
- Loose organization: There is no rule on how many parent folders a DICOM series can be contained within. Compare the contents of the Patient1/ and Patient2/ folders from the source data, for example.
- Single DICOM Series per Folder: The only restriction on source data organization for DICOM files is that:
  - Each DICOM Series must be fully contained in exactly 1 leaf-level source folder, and
  - Each leaf-level source folder must contain at most 1 DICOM Series.
- Single-file Handling: DICOM series containing only 1 file are not zipped.
Example 3: Arbitrary Files (non-DICOM)
Consider the following source data structure:
With the default behavior, this source data would be imported into Flywheel as:
Note a few nuances:
- Container Labels: The source folder names are used as the container labels (e.g., "Patient123" is the Subject label).
- No Zipping: The files are not grouped together and are stored in Flywheel individually as-is.
- Arbitrary File Types: Any type of file can be imported (not only DICOM). The uploaded file will have its type automatically set in Flywheel according to the matching rules for File Types in Flywheel Core.
Example 4: Mixed Types
Where the DICOM files contain the following headers:
DICOM Header | file1.dcm | file2.dcm | file3.dcm | 9572012 | 0012893 |
---|---|---|---|---|---|
PatientID | Subj123 | Subj123 | Subj123 | Subj456 | Subj456 |
StudyDescription | Timepoint1 | Timepoint1 | Timepoint2 | Timepoint1 | Timepoint1 |
SeriesNumber | 1 | 1 | 1 | 4 | 4 |
SeriesDescription | Chest X-ray | Chest X-ray | Chest X-ray | PET scan | PET scan |
With the default behavior, this source data would be imported into Flywheel as:
Note a few nuances:
- Skipped Files: scan-notes-1.txt is skipped and not uploaded to Flywheel, because there is "no matching rule."
  - Specifically, this file is not detected as DICOM and the parent directory structure does not contain enough levels to represent the desired Flywheel hierarchy, and so it is skipped due to not having enough information to map to Flywheel.
- Additional Containers: There are 3 Subject containers in the destination Flywheel hierarchy even though the source folder structure appeared to have only 2 patient folders.
  - This is because the Patient1/ source folder contained some folders with DICOM files and some folders containing non-DICOM files.
  - The DICOM files mapped to Subj123/ based on the PatientID header.
  - The non-DICOM files mapped to Patient1/ based on the file path information.
  - To combine these containers into just Subj123/, the source folder must be renamed to Subj123/ to match the PatientID header of the contained DICOM files.
  - Note that this same behavior cascades down to the Session and Acquisition containers as well.