Bulk Import - Default Behavior
Overview
By default, Flywheel automatically detects the type of files you are importing and processes them accordingly:
- DICOM files - Flywheel reads DICOM headers to determine where to place the data
- Non-DICOM files - Flywheel uses the file path structure (folder names) to determine where to place the data
This document explains in detail how Flywheel's automatic detection and processing works.
Customizing Import Behavior
If you need to change the way your data is organized into Flywheel, see Bulk Import - Customizing Import Rules to learn how to override default behavior.
How Default Processing Works
Flywheel considers the following information from your source data to determine how to organize files within Flywheel:
- File paths, including:
    - File names
    - File extensions
    - Parent folder structure
- DICOM headers (DICOM files only)
At a glance, the logic works as follows:
flowchart LR
dt1(Detect Attachments)
dt2(Detect DICOM)
dt1-->dt2
map1(Derive destination hierarchy)
dt2-->map1
grp1("Package files (ZIP)")
map1-->grp1
up1(Upload to Flywheel)
grp1-->up1

The following sections describe this process in detail.
Note
Because of this processing logic, the way the source data is organized before import greatly affects how the data is imported and organized into Flywheel.
Step 1: Detecting Container Attachments
Flywheel can import files as attachments to Project, Subject, and Session containers (instead of only as Acquisition contents).
To import files as attachments, place the files into the source folder representing the Flywheel container to which the file should be attached.
For example, see Example 1: Container Attachments.
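The placement rule above can be sketched in a few lines of Python. This is only an illustration and not Flywheel's implementation: the function name `attachment_target` and the `ATTACHMENT_LEVELS` table are hypothetical, paths are assumed to be relative to the import root, and the sketch ignores the DICOM and leaf-folder rules described in the later steps.

```python
from pathlib import PurePosixPath

# Hypothetical table: number of parent folders below the import root
# mapped to the container type a loose file would attach to.
ATTACHMENT_LEVELS = {0: "project", 1: "subject", 2: "session"}


def attachment_target(relative_path):
    """Return (container_type, folder_names) for a path relative to the root.

    Returns None when the file sits deeper than a Session folder, in which
    case it would be treated as Acquisition contents rather than an attachment.
    """
    parts = PurePosixPath(relative_path).parts
    folders = parts[:-1]  # everything except the file name
    level = ATTACHMENT_LEVELS.get(len(folders))
    if level is None:
        return None
    return level, folders
```

For instance, `attachment_target("Patient123/consent-form-1.pdf")` yields `("subject", ("Patient123",))` under these assumptions.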
Step 2: Detecting DICOM Files
Flywheel reads the source file path information to determine which files might be DICOM.
For the purposes of determining where to place the data, Flywheel assumes that any file meeting both of the following rules is a DICOM file:
- File name has either:
    - An extension of *.dcm, *.dicom, or *.ima; OR
    - No extension AND either:
        - Solely consists of numerals (e.g., 98201293), OR
        - Matches the format of a DICOM UID (e.g., 1.5.300.0.9230010.3.1.4.3735312059.35976.1651523430.20.99)
- File is located at a leaf-level folder in the source data
    - I.e., there are no folders next to this file in the source data
Any file that fails either of these criteria (file name format or file location) is not considered a DICOM file and is placed purely according to its parent folder structure.
Non-DICOM file handling is explained in Step 3b: Non-DICOM Files.
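As a rough illustration of the two rules above, the following Python sketch checks a file name and its folder position. It is not Flywheel's code: the function names are hypothetical, the UID pattern is a loose approximation (dot-separated numeric components), and case-insensitive extension matching is an assumption.

```python
import re
from pathlib import PurePosixPath

DICOM_EXTENSIONS = {".dcm", ".dicom", ".ima"}
NUMERIC_NAME = re.compile(r"[0-9]+")
# Loose approximation of a DICOM UID: numeric components joined by dots.
UID_NAME = re.compile(r"[0-9]+(\.[0-9]+)+")


def name_looks_like_dicom(filename):
    """Apply the file-name rule only (no file content is read)."""
    if NUMERIC_NAME.fullmatch(filename) or UID_NAME.fullmatch(filename):
        return True  # extension-less numeric or UID-style name
    # Assumption: extensions are compared case-insensitively.
    return PurePosixPath(filename).suffix.lower() in DICOM_EXTENSIONS


def looks_like_dicom(filename, folder_has_subfolders):
    """Both rules must hold: a matching name AND a leaf-level folder."""
    return name_looks_like_dicom(filename) and not folder_has_subfolders
```

Note that the UID check runs before the extension check, because a UID-style name would otherwise be misread as having an extension such as `.99`.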
DICOM Detection Limitations
DICOM file type detection is currently limited in the following ways:
- File contents are not opened during type detection; only the file path information is used (names, extensions, etc.)
    - DICOMDIR index files are not used
    - "Magic Bytes" are not used
- ZIP archives are not processed as DICOM (at import time)
    - Archives are not extracted during the import process
    - The *.dicom.zip extension is not detected as DICOM (at import time)
Limitation: Pre-zipped DICOM
For the purposes of mapping to the Flywheel hierarchy, ZIP files will not be detected as DICOM even if the file name contains *.dicom.zip.
Pre-zipped DICOM files will be handled as non-DICOM files (at import time), and only their source file path information will be used.
This also means DICOM header to Flywheel metadata extraction will not occur at import time for such files.
However, once placed into Flywheel, pre-zipped files whose names end with *.dicom.zip will be classified as DICOM type by Flywheel Core according to Flywheel Core's file type detection rules, thus allowing for DICOM-specific gear rules to be triggered accordingly.
Limitation: Magic Bytes
Flywheel uses a different, stronger mechanism to detect DICOM files for the purposes of de-identification.
In the case of de-identification, Flywheel reads the file content and looks for the File Signature (i.e., "Magic Bytes") indicating if the file is DICOM or not.
The File Signature mechanism is used for de-identification since it is a reliable indicator of file type, and it is critical to ensure all DICOM files are de-identified properly.
However, this mechanism is not used for file type detection in other processes (e.g., determining file placement) for performance and scalability reasons.
For more information about de-identification support with Bulk Import, refer to the Bulk Import De-Identification documentation.
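For readers unfamiliar with the File Signature, the DICOM file format (Part 10 of the standard) defines a 128-byte preamble followed by the four bytes `DICM`. A minimal Python check for that signature might look like the sketch below; the function names are hypothetical and this is not a description of how Flywheel's de-identification is implemented.

```python
def is_dicom_header(header):
    """Return True if the bytes start with a 128-byte preamble plus b"DICM"."""
    return len(header) >= 132 and header[128:132] == b"DICM"


def has_dicom_signature(path):
    """Read only the first 132 bytes of the file and test the signature."""
    with open(path, "rb") as handle:
        return is_dicom_header(handle.read(132))
```

Even this small read requires opening every file, which hints at why a content-based check costs more than a path-based one on large imports.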
Limitation: DICOMDIR Structure
The DICOMDIR index file is not read or used for DICOM type detection.
Instead, Flywheel makes assumptions about the naming conventions typically used with DICOMDIR structures (e.g., extension-less files with numeric names) to infer if a file may be DICOM.
Step 3: Deriving Destination Hierarchy
Step 3a: DICOM Files
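For files detected as DICOM (see Step 2: Detecting DICOM Files), Flywheel opens the files and reads their DICOM header information.
The following sections detail how the DICOM header information is used to determine both:
- The container labels of the destination Flywheel hierarchy
- The additional metadata stored on those containers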
For a complete example, see Example 2: DICOM Files.
Tip
Additional documentation on DICOM header to Flywheel metadata mappings is available in the new (BETA) CLI documentation for the import run command.
DICOM Header to Flywheel Hierarchy Mappings
From DICOM files, Flywheel generates the container labels for the desired destination Flywheel hierarchy according to the following mappings:
| Flywheel Container Label | DICOM Header Tag |
|---|---|
| subject.label | PatientID |
| session.label | StudyDescription fallback to session.timestamp fallback to StudyInstanceUID |
| acquisition.label | SeriesNumber - SeriesDescription fallback to SeriesNumber - ProtocolName fallback to acquisition.timestamp (formatted as %Y-%m-%dT%H:%M:%S) fallback to SeriesInstanceUID and only prefixed if SeriesNumber is set |
| file.name | Copied from acquisition.label |
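The fallback chains in the table can be read as ordered lookups. The sketch below shows one possible reading in Python, with headers represented as a plain dictionary. The function names are hypothetical, the timestamp format for session.label is an assumption (the table only states it for acquisition.label), and the sketch assumes the SeriesNumber prefix applies to every fallback whenever SeriesNumber is set.

```python
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def session_label(hdr, session_timestamp=None):
    """StudyDescription, then session.timestamp, then StudyInstanceUID."""
    if hdr.get("StudyDescription"):
        return hdr["StudyDescription"]
    if session_timestamp is not None:
        # Assumption: same format as the one documented for acquisitions.
        return session_timestamp.strftime(TIMESTAMP_FORMAT)
    return hdr.get("StudyInstanceUID")


def acquisition_label(hdr, acquisition_timestamp=None):
    """SeriesDescription, ProtocolName, timestamp, then SeriesInstanceUID."""
    base = hdr.get("SeriesDescription") or hdr.get("ProtocolName")
    if not base and acquisition_timestamp is not None:
        base = acquisition_timestamp.strftime(TIMESTAMP_FORMAT)
    if not base:
        base = hdr.get("SeriesInstanceUID")
    number = hdr.get("SeriesNumber")
    # Prefix with SeriesNumber only when it is set.
    if base and number not in (None, ""):
        return f"{number} - {base}"
    return base
```

With the headers from Example 2, `acquisition_label({"SeriesNumber": 1, "SeriesDescription": "Chest X-ray"})` returns `"1 - Chest X-ray"`.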
DICOM Header to Flywheel Metadata Mappings
From DICOM files, Flywheel generates additional metadata for the desired destination Flywheel hierarchy according to the following mappings:
| Flywheel Container Type | Flywheel Metadata Field | DICOM Header Tag |
|---|---|---|
| Subject | subject.firstname | Split from PatientName |
| Subject | subject.lastname | Split from PatientName |
| Subject | subject.sex | PatientSex |
| Session | session.uid | StudyInstanceUID |
| Session | session.age | PatientAge (converted to seconds) fallback to difference between acquisition.timestamp and PatientBirthDate |
| Session | session.weight | PatientWeight |
| Session | session.operator | OperatorsName |
| Session | session.timestamp | Combination of StudyDate and StudyTime fallback to combination of SeriesDate and SeriesTime fallback to AcquisitionDateTime fallback to combination of AcquisitionDate and AcquisitionTime with respect to TimezoneOffsetFromUTC |
| Acquisition | acquisition.uid | SeriesInstanceUID |
| Acquisition | acquisition.timestamp | AcquisitionDateTime fallback to combination of AcquisitionDate and AcquisitionTime fallback to combination of SeriesDate and SeriesTime fallback to combination of StudyDate and StudyTime with respect to TimezoneOffsetFromUTC |
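To make the timestamp fallback order concrete, here is a hedged Python sketch of the session.timestamp row. It assumes standard DICOM value formats (DA as YYYYMMDD, TM as HHMMSS with optional fraction, DT starting with YYYYMMDDHHMMSS, and TimezoneOffsetFromUTC such as +0100). All function names are hypothetical, and treating "with respect to TimezoneOffsetFromUTC" as attaching the offset to the parsed value is an interpretation, not confirmed behavior.

```python
from datetime import datetime, timedelta, timezone


def _combine(date_value, time_value):
    """Join a DA value and a TM value into a naive datetime, or None."""
    if not date_value or not time_value:
        return None
    # Drop fractional seconds; pad truncated times such as "1030" to "103000".
    time_digits = time_value.split(".")[0].ljust(6, "0")
    return datetime.strptime(date_value + time_digits, "%Y%m%d%H%M%S")


def _parse_datetime(value):
    """Parse the leading YYYYMMDDHHMMSS part of a DT value, or None."""
    if not value or len(value) < 14:
        return None
    return datetime.strptime(value[:14], "%Y%m%d%H%M%S")


def _apply_offset(stamp, offset):
    """Attach an offset like "+0100" or "-0500" to a naive datetime."""
    if stamp is None or not offset:
        return stamp
    sign = -1 if offset[0] == "-" else 1
    delta = timedelta(hours=int(offset[1:3]), minutes=int(offset[3:5]))
    return stamp.replace(tzinfo=timezone(sign * delta))


def session_timestamp(hdr):
    """Follow the documented order: Study, Series, then Acquisition values."""
    stamp = (
        _combine(hdr.get("StudyDate"), hdr.get("StudyTime"))
        or _combine(hdr.get("SeriesDate"), hdr.get("SeriesTime"))
        or _parse_datetime(hdr.get("AcquisitionDateTime"))
        or _combine(hdr.get("AcquisitionDate"), hdr.get("AcquisitionTime"))
    )
    return _apply_offset(stamp, hdr.get("TimezoneOffsetFromUTC"))
```

The acquisition.timestamp row could reuse the same helpers with the candidates listed in the opposite priority order.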
Step 3b: Non-DICOM Files
For any file that Flywheel does not detect as DICOM, Flywheel uses the source file path information, including parent folder names, to determine how to organize the files within Flywheel.
In this case, the files are not opened and their content is ignored.
Since the original file path is used to determine where to place non-DICOM files in Flywheel, such non-DICOM files must be organized carefully before importing into Flywheel:
- The "root" folder represents a single Project
- Each first-level folder (directly inside the Project root) represents a single Subject (i.e., Patient)
- Each second-level folder (directly inside the Subject) represents a single Session (i.e., Study)
- Each "leaf" (lowest-level) folder (directly inside a Session) represents a single Acquisition
    - Each Acquisition is contained in a single "leaf" folder
    - Each "leaf" folder contains a single Acquisition
For a complete example, see Example 3: Arbitrary Files (non-DICOM).
Info
A "leaf-level" folder is any folder that contains only files and not any additional lower-level folders.
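The folder-to-container mapping described in this step can be sketched as a simple path split. The example below is illustrative only: `map_non_dicom` and its `project_label` parameter are hypothetical, paths are assumed to be relative to the root folder, and the sketch only covers files that sit exactly three folders deep (Subject, Session, Acquisition). Attachments and other depths are out of scope here.

```python
from pathlib import PurePosixPath


def map_non_dicom(relative_path, project_label):
    """Map <subject>/<session>/<acquisition>/<file> to container labels.

    Returns None when the path does not have exactly three folder levels,
    since the sketch cannot place such a file as Acquisition contents.
    """
    parts = PurePosixPath(relative_path).parts
    if len(parts) != 4:
        return None
    subject, session, acquisition, filename = parts
    return {
        "project": project_label,
        "subject": subject,
        "session": session,
        "acquisition": acquisition,
        "file": filename,
    }
```

The file itself is never opened; only the path components contribute to the result, which mirrors the statement that non-DICOM content is ignored.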
Step 4: Grouping & Packaging Files into ZIP Archives
It is strongly recommended that DICOM files are stored in Flywheel Core as ZIP archives (e.g., *.dicom.zip) rather than as loose files (e.g., *.dcm, etc.).
When the source data is loose DICOM files (e.g., *.dcm, etc.), Bulk Import packages the DICOM files into the recommended ZIP archives.
To do this, Bulk Import needs to decide which files to place into which ZIP archive. This process is called "grouping."
By default, Bulk Import groups DICOM files using the following logic:
- Non-DICOM files are not packaged into ZIP archives; they are uploaded individually
- All DICOM files are grouped into separate ZIP archives by StudyInstanceUID and SeriesInstanceUID
    - Groups containing only a single DICOM file are still zipped even though each resulting ZIP archive contains only a single file
- DICOM localizer files are separated into their own ZIP archives
    - The following DICOM tags are inspected when determining whether a DICOM file is a localizer or not:
        - InstanceNumber
        - ImageOrientationPatient
        - ImagePositionPatient
        - Rows
        - Columns
- Each ZIP archive will be named as <acquisition.label>.dicom.zip
    - Where acquisition.label is calculated as described in the DICOM Header to Flywheel Hierarchy Mappings section
- Each individual DICOM file packaged into a ZIP archive will be renamed as {SOPInstanceUID}.{Modality}.dcm
    - This only applies to DICOM files that are zipped. When zipping is disabled, the loose DICOM files will not be renamed
- Each ZIP archive will contain a top level folder to avoid polluting the current directory when extracted
    - E.g., the contents of 12345.zip are nested inside a folder named 12345 (like 12345.zip/12345/abcdef.MR.dcm), so that when 12345.zip is extracted its contents are neatly organized (like ./12345/abcdef.MR.dcm) and not mixed into the current directory
For complete examples, refer to Example 2 and Example 4.
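The grouping and packaging rules can be approximated with Python's standard `zipfile` module. The sketch below is an assumption-laden illustration rather than Bulk Import's code: the function names are hypothetical, headers are plain dictionaries, localizer separation is omitted, and the top-level folder inside the archive is assumed to be named after the acquisition label.

```python
import io
import zipfile
from collections import defaultdict


def group_by_series(headers):
    """Group header dicts by (StudyInstanceUID, SeriesInstanceUID)."""
    groups = defaultdict(list)
    for hdr in headers:
        key = (hdr["StudyInstanceUID"], hdr["SeriesInstanceUID"])
        groups[key].append(hdr)
    return dict(groups)


def build_archive(acquisition_label, members):
    """Build one in-memory ZIP for a group.

    members is an iterable of (header dict, raw file bytes).
    Returns (archive_name, zip_bytes).
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for hdr, payload in members:
            # Rename each file as {SOPInstanceUID}.{Modality}.dcm and nest it
            # in a top-level folder (assumed here to be the acquisition label).
            inner = f"{hdr['SOPInstanceUID']}.{hdr['Modality']}.dcm"
            archive.writestr(f"{acquisition_label}/{inner}", payload)
    return f"{acquisition_label}.dicom.zip", buffer.getvalue()
```

A group with a single member still produces an archive, matching the single-file rule above. Labels containing path separators would need sanitizing, which this sketch does not attempt.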
Limitation: Split Series
Each DICOM series must be fully contained in a single "leaf-level" input folder.
A "leaf-level" folder is any folder that contains only files and not any additional folders.
Starting with version 20.5, a single source folder can contain multiple DICOM series. However, each DICOM series must still be fully contained within a single source folder.
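Before importing, it may help to verify that no series is spread across folders. The following Python sketch shows one way to run such a pre-flight check, assuming you have already collected each file's folder and SeriesInstanceUID by some other means (for example, with a DICOM reader of your choice). The function name `find_split_series` is hypothetical and not part of Bulk Import.

```python
from collections import defaultdict


def find_split_series(files):
    """Return {SeriesInstanceUID: sorted folders} for series in >1 folder.

    files is an iterable of (folder, series_instance_uid) pairs.
    """
    folders_by_series = defaultdict(set)
    for folder, series_uid in files:
        folders_by_series[series_uid].add(folder)
    return {
        uid: sorted(folders)
        for uid, folders in folders_by_series.items()
        if len(folders) > 1
    }
```

An empty result means every series is contained in a single folder; multiple series sharing one folder do not trigger a finding.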
Step 5: Uploading Files to Flywheel
Files can be uploaded to the following locations within Flywheel:
- Project, Subject, or Session attachments
    - Review the section on Container Attachments
- Acquisition container contents
If any conflicts or duplication scenarios are encountered while uploading the files to Flywheel, the affected files will be quarantined and flagged for manual review.
Refer to the documentation on Bulk Import Conflict Handling for more details.
Examples
Example 1: Container Attachments
Consider the following source data structure:
With the default behavior, this source data would be imported into Flywheel as:
Note how the following files are imported based on where they were located in the source folder structure:
- objectives-1.csv is imported as an attachment to the Destination Project
- consent-form-1.pdf is imported as an attachment to the Subject labeled Patient123 in the Destination Project
- tech-notes-1.txt is imported as an attachment to the Session labeled Study20220423 of the Subject labeled Patient123 in the Destination Project
Example 2: DICOM Files
Consider the following source data structure:
Where the DICOM files contain the following headers:
| DICOM Header | file1.dcm | file2.dcm | file3.dcm | 9572012 | 0012893 |
|---|---|---|---|---|---|
| PatientID | Subj123 | Subj123 | Subj123 | Subj456 | Subj456 |
| StudyInstanceUID | 1234 | 1234 | 1234 | 5678 | 5678 |
| StudyDescription | Timepoint1 | Timepoint1 | Timepoint2 | Timepoint1 | Timepoint1 |
| SeriesInstanceUID | 9876 | 9876 | 7654 | 3210 | 3210 |
| SeriesNumber | 1 | 1 | 1 | 4 | 4 |
| SeriesDescription | Chest X-ray | Chest X-ray | Head CT | PET scan | PET scan |
| SOPInstanceUID | abc123 | abc456 | def789 | ghi012 | jkl345 |
With the default behavior, this source data would be imported into Flywheel as:
Where the ZIP file contents are:
Note a few nuances:
- Header-based mappings: The destination Flywheel hierarchy structure is based entirely on the DICOM headers and not on any of the source file path information
- Loose organization: There is no rule on how many parent folders a DICOM series can be contained within. Compare the contents of the Patient1/ and Patient2/ folders from the source data, for example
- No split DICOM series: The only restriction on source data organization for DICOM files is that each DICOM series must be fully contained in exactly 1 leaf-level source folder
- Single-file Handling: DICOM series containing only 1 file are zipped
Example 3: Arbitrary Files (non-DICOM)
Consider the following source data structure:
With the default behavior, this source data would be imported into Flywheel as:
Note a few nuances:
- Container Labels: The source folder names are used as the container labels (e.g., "Patient123" is the Subject label)
- No Zipping: The files are not grouped together and are stored in Flywheel individually as-is
- Arbitrary File Types: Any type of file can be imported (not only DICOM). The uploaded file will have its type automatically set in Flywheel according to the matching rules for File Types in Flywheel Core
Example 4: Mixed Types
Where the DICOM files contain the following headers:
| DICOM Header | file1.dcm | file2.dcm | file3.dcm | 9572012 | 0012893 |
|---|---|---|---|---|---|
| PatientID | Subj123 | Subj123 | Subj123 | Subj456 | Subj456 |
| StudyInstanceUID | 1234 | 1234 | 1234 | 5678 | 5678 |
| StudyDescription | Timepoint1 | Timepoint1 | Timepoint2 | Timepoint1 | Timepoint1 |
| SeriesInstanceUID | 9876 | 9876 | 7654 | 3210 | 3210 |
| SeriesNumber | 1 | 1 | 1 | 4 | 4 |
| SeriesDescription | Chest X-ray | Chest X-ray | Head CT | PET scan | PET scan |
| SOPInstanceUID | abc123 | abc456 | def789 | ghi012 | jkl345 |
With the default behavior, this source data would be imported into Flywheel as:
Where the ZIP file contents are:
Note a few nuances:
- Skipped Files: scan-notes-1.txt is skipped and not uploaded to Flywheel, because there is "no matching rule"
    - Specifically, this file is not detected as DICOM and the parent directory structure does not contain enough levels to represent the desired Flywheel hierarchy, and so it is skipped due to not having enough information to map to Flywheel
- Additional Containers: There are 3 Subject containers in the destination Flywheel hierarchy even though the source folder structure appeared to have only 2 patient folders
    - This is because the Patient1/ source folder contained some folders with DICOM files and some folders containing non-DICOM files
    - The DICOM files mapped to Subj123/ based on the PatientID header
    - The non-DICOM files mapped to Patient1/ based on the file path information
    - To combine these containers into just Subj123/, the source folder must be renamed to Subj123/ to match the PatientID header of the contained DICOM files
    - Note that this same behavior cascades down to the Session and Acquisition containers as well
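One way to spot this situation ahead of time is to compare each DICOM file's first-level source folder with its PatientID header. The Python sketch below is a hypothetical pre-flight helper, not a Flywheel feature: it assumes paths are relative to the import root and that the PatientID values were read beforehand with a DICOM reader of your choice. The intermediate folder names in the usage example are invented for illustration.

```python
from pathlib import PurePosixPath


def find_label_mismatches(files):
    """Return sorted (subject_folder, patient_id) pairs that differ.

    files is an iterable of (relative_path, patient_id) pairs for DICOM files.
    Each mismatch indicates a folder whose DICOM and non-DICOM files would
    land in different Subject containers under the default behavior.
    """
    mismatches = set()
    for relative_path, patient_id in files:
        subject_folder = PurePosixPath(relative_path).parts[0]
        if subject_folder != patient_id:
            mismatches.add((subject_folder, patient_id))
    return sorted(mismatches)
```

For example, `find_label_mismatches([("Patient1/scans/file1.dcm", "Subj123")])` reports `[("Patient1", "Subj123")]`, suggesting the folder be renamed to Subj123/ before import.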