Bulk Import - Mapping to the Flywheel Hierarchy (19.4+)

Applicability

Version 19.4 and Later

This document applies only to Flywheel version 19.4 and later.

For Flywheel versions 19.2 to 19.3, refer to Bulk Import Mapping Rules (19.2 to 19.3) instead.
For Flywheel version 19.1 or earlier, refer to Bulk Import Mapping Rules (-19.1) instead.

Overview

By default, Flywheel will do its best to automatically detect if the files you are importing are DICOM versus some other arbitrary type of file.

If Flywheel detects that a file is DICOM, it will treat the file accordingly and read the DICOM headers to determine where to place the data.

If Flywheel does not detect that a file is DICOM, it will ignore the file content and solely use the original file path, including parent folder names, to determine where to place the data.

The Default Behavior section of this document explains in much greater detail how these default "mapping rules" are applied. For specific examples, skip ahead to the Examples section.

For a complete example combining all the information in this document, refer to Example 4: Mixed Types.

Mixed File Types

If your dataset contains a mix of both DICOM and non-DICOM files, don't worry -- you can still import it all at once!

Flywheel's automatic file type detection will identify which files are DICOM and handle each file accordingly. Additionally, with user-defined mapping rules, you can combine both DICOM header and file path information.

User-defined Behavior

If you need to change the way your data is organized into Flywheel, consider using the new (BETA) CLI, which provides extensive options for User-defined Behavior.

Default Behavior

By default, Flywheel considers the following information from your source data to determine how to organize the data within Flywheel:

File paths, including the following elements:
1. File names
2. File extensions
3. Parent folder structure
DICOM headers (DICOM files only)

At a glance, the logic generally works as follows:

flowchart LR
    dt1(Detect Attachments)
    dt2(Detect DICOM)
    dt1-->dt2
    map1(Derive destination hierarchy)
    dt2-->map1
    grp1("Package files (ZIP)")
    map1-->grp1
    up1(Upload to Flywheel)
    grp1-->up1

The following sections describe this process in much greater detail.

Note

Because of steps 1, 3, and 4, the way the source data is organized before import greatly affects how the data is imported and organized into Flywheel (a.k.a. "mapping").

1. Detecting Container Attachments

It is possible to import files to be stored as attachments to Project, Subject, and Session containers (instead of only as Acquisition contents).

To import files as attachments, simply place the files into the source folder representing the Flywheel container to which the file should be attached.

For example, see Example 1: Mappings for Container Attachments.
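
The placement rule above can be sketched as a lookup on folder depth. This is a minimal illustration, not Flywheel's implementation: the function name `attachment_target` is hypothetical, the path is assumed to be relative to the source root, and the file's parent folder is assumed to be a container folder rather than a leaf-level folder.

```python
from pathlib import PurePosixPath
from typing import Optional

# Container levels that can hold attachments, ordered by folder depth.
LEVELS = ("project", "subject", "session")


def attachment_target(relative_path: str) -> Optional[str]:
    """Return the container level a file would attach to, judged by depth only.

    Hypothetical helper: depth 0 is the source root (Project), depth 1 a
    Subject folder, depth 2 a Session folder. Deeper files are not attachments.
    """
    depth = len(PurePosixPath(relative_path).parts) - 1
    return LEVELS[depth] if depth < len(LEVELS) else None


print(attachment_target("objectives-1.csv"))
print(attachment_target("Patient123/consent-form-1.pdf"))
print(attachment_target("Patient123/Study20220423/tech-notes-1.txt"))
```

The three calls use the paths from Example 1 and yield the Project, Subject, and Session levels respectively.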

2. Detecting DICOM Files

Flywheel will read the source file path information to determine which files might be DICOM.

For the purposes of determining where to place the data, Flywheel will assume that any file which meets both of the following rules is a DICOM file and should be processed accordingly:

File name has either:
1. An extension of *.dcm, *.dicom, or *.ima; OR
2. No extension and solely consists of numerals (e.g., 98201293)
File is located at a leaf-level folder in the source data.
1. I.e., there are no folders next to this file in the source data.

Any file that fails either of these criteria is not considered a DICOM file and is placed purely according to its parent folder structure. Non-DICOM file handling is explained further in the Non-DICOM Files section.
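
As a rough sketch, the two rules can be expressed as a path-only check. The function name `looks_like_dicom` and the `folder_has_subfolders` parameter are hypothetical, and the case-insensitive extension match is an assumption that this document does not confirm.

```python
from pathlib import PurePosixPath

# Extensions listed in the detection rules (lowercase comparison is an assumption).
DICOM_EXTENSIONS = {".dcm", ".dicom", ".ima"}


def looks_like_dicom(path: str, folder_has_subfolders: bool) -> bool:
    """Path-only check that mirrors the two rules; file content is never read."""
    name = PurePosixPath(path).name
    suffix = PurePosixPath(name).suffix.lower()
    if suffix:
        # Rule 1a: only the final extension is considered, so "x.dicom.zip" is ".zip".
        name_ok = suffix in DICOM_EXTENSIONS
    else:
        # Rule 1b: no extension and the name consists solely of numerals.
        name_ok = name.isascii() and name.isdigit()
    # Rule 2: the file must sit in a leaf-level folder.
    return name_ok and not folder_has_subfolders


print(looks_like_dicom("Patient1/Study20220423/Series1/file1.dcm", False))
print(looks_like_dicom("Patient2/9572012", False))
print(looks_like_dicom("Patient2/scan.dicom.zip", False))
```

Because only the final extension is inspected, the pre-zipped archive in the last call is not treated as DICOM, which matches the limitation described in the following sections.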

DICOM file type detection is currently limited in the following ways:

File contents are not opened during type detection; only the file path information is used (names, extensions, etc.)
1. DICOMDIR index files are not used
2. "Magic Bytes" are not used
ZIP archives are not processed as DICOM (at import time)
1. Archives are not extracted during the import process
2. The *.dicom.zip extension is not detected; only the *.zip extension

Limitation: Pre-zipped DICOM

For the purposes of mapping to the Flywheel hierarchy, ZIP files will not be detected as DICOM even if the file name contains *.dicom.zip. Pre-zipped DICOM files will be handled as non-DICOM files (at import time), and only their source file path information will be used.

This also means DICOM header to Flywheel metadata extraction will not occur at import time for such files.

However, once placed into Flywheel, pre-zipped files whose names end with *.dicom.zip will be classified as DICOM type by Flywheel Core according to Flywheel Core's file type detection rules, thus allowing for DICOM-specific gear rules to be triggered accordingly.

For more information about de-identification support with Bulk Import, refer to the Bulk Import De-Identification documentation.

Limitation: Magic Bytes

Flywheel uses a different, stronger mechanism to detect DICOM files for the purposes of de-identification. In the case of de-identification, Flywheel reads the file content and looks for the File Signature (i.e., "Magic Bytes") indicating if the file is DICOM or not.

The File Signature mechanism is used for de-identification since it is a reliable indicator of file type, and it is critical to ensure all DICOM files are de-identified properly. However, this mechanism is not used for file type detection in other processes (e.g., determining file placement) for performance and scalability reasons.

Limitation: DICOMDIR Structure

The DICOMDIR index file is not read or used for DICOM type detection.

Instead, Flywheel makes assumptions about the naming conventions typically used with DICOMDIR structures (e.g., extension-less files with numeric names) to infer if a file may be DICOM.

3. Deriving Destination Hierarchy

3.a. For DICOM Files

For the files detected as DICOM (see DICOM Detection), Flywheel will open the files and read their DICOM header information.

The following sections detail how the DICOM header information will be used to determine both:

Flywheel Hierarchy Mappings (Container Labels)
Flywheel Container Metadata

For a complete example, see Example 2: Mappings for DICOM Files.

Tip

Additional documentation on DICOM header to Flywheel metadata mappings is available in the new (BETA) CLI documentation for the import run command.

DICOM Header to Flywheel Hierarchy Mappings

From DICOM files, Flywheel will generate the container labels for the desired destination Flywheel hierarchy according to the following mappings:

Flywheel Container Label	DICOM Header Tag
`subject.label`	`PatientID`
`session.label`	`StudyDescription` fallback to `session.timestamp` fallback to `StudyInstanceUID`
`acquisition.label`	`SeriesNumber - SeriesDescription` fallback to `SeriesNumber - ProtocolName` fallback to `acquisition.timestamp` (formatted as `%Y-%m-%dT%H:%M:%S`) fallback to `SeriesInstanceUID` and only prefixed if `SeriesNumber` is set
`file.name`	Copied from `acquisition.label`
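
The fallback chains in this table can be illustrated with a short sketch that takes the headers as a plain dictionary. The names `first_set` and `derive_labels` are hypothetical. The sketch assumes the `SeriesNumber` prefix applies only to the `SeriesDescription` and `ProtocolName` variants, which is one reading of the table and not confirmed behavior.

```python
from typing import Dict, Mapping, Optional


def first_set(*values: Optional[str]) -> Optional[str]:
    """Return the first value that is present and non-empty, else None."""
    return next((v for v in values if v), None)


def derive_labels(
    hdr: Mapping[str, str],
    session_timestamp: Optional[str] = None,
    acquisition_timestamp: Optional[str] = None,
) -> Dict[str, Optional[str]]:
    """Apply the label fallbacks from the table to a dict of DICOM headers."""
    number = hdr.get("SeriesNumber")
    prefix = f"{number} - " if number not in (None, "") else ""
    described = first_set(hdr.get("SeriesDescription"), hdr.get("ProtocolName"))
    if described:
        acquisition = prefix + described
    else:
        # Assumption: the timestamp and UID fallbacks are used without a prefix.
        acquisition = first_set(acquisition_timestamp, hdr.get("SeriesInstanceUID"))
    return {
        "subject.label": hdr.get("PatientID"),
        "session.label": first_set(
            hdr.get("StudyDescription"), session_timestamp, hdr.get("StudyInstanceUID")
        ),
        "acquisition.label": acquisition,
    }


labels = derive_labels(
    {
        "PatientID": "Subj123",
        "StudyDescription": "Timepoint1",
        "SeriesNumber": "1",
        "SeriesDescription": "Chest X-ray",
    }
)
print(labels)
```

The sample headers are those of file1.dcm in Example 2, and the resulting labels match the destination hierarchy shown there.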

DICOM Header to Flywheel Metadata Mappings

From DICOM files, Flywheel will generate additional metadata for the desired destination Flywheel hierarchy according to the following mappings:

Flywheel Container Type	Flywheel Metadata Field	DICOM Header Tag
Subject	`subject.firstname`	Split from `PatientName`
Subject	`subject.lastname`	Split from `PatientName`
Subject	`subject.sex`	`PatientSex`
Session	`session.uid`	`StudyInstanceUID`
Session	`session.age`	`PatientAge` (converted to seconds) fallback to difference between `acquisition.timestamp` and `PatientBirthDate`
Session	`session.weight`	`PatientWeight`
Session	`session.operator`	`OperatorsName`
Session	`session.timestamp`	Combination of `StudyDate` and `StudyTime` fallback to combination of `SeriesDate` and `SeriesTime` fallback to `AcquisitionDateTime` fallback to combination of `AcquisitionDate` and `AcquisitionTime` with respect to `TimezoneOffsetFromUTC`
Acquisition	`acquisition.uid`	`SeriesInstanceUID`
Acquisition	`acquisition.timestamp`	`AcquisitionDateTime` fallback to combination of `AcquisitionDate` and `AcquisitionTime` fallback to combination of `SeriesDate` and `SeriesTime` fallback to combination of `StudyDate` and `StudyTime` with respect to `TimezoneOffsetFromUTC`
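
The `acquisition.timestamp` fallback order can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, headers arrive as strings in the standard DICOM date (`YYYYMMDD`) and time (`HHMMSS[.ffffff]`) formats, any offset embedded in `AcquisitionDateTime` is ignored, and UTC is used when `TimezoneOffsetFromUTC` is absent. None of these choices is confirmed by this document.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional


def _combine(date: Optional[str], time: Optional[str]) -> Optional[datetime]:
    """Join a DICOM date with a DICOM time; a missing time becomes midnight."""
    if not date:
        return None
    hhmmss = (time or "").split(".")[0].ljust(6, "0")[:6]
    return datetime.strptime(date + hhmmss, "%Y%m%d%H%M%S")


def _tz(offset: Optional[str]) -> timezone:
    """Turn a +HHMM / -HHMM string into a tzinfo (UTC when absent: an assumption)."""
    if not offset:
        return timezone.utc
    sign = -1 if offset.startswith("-") else 1
    digits = offset.lstrip("+-")
    return timezone(sign * timedelta(hours=int(digits[:2]), minutes=int(digits[2:4] or 0)))


def acquisition_timestamp(hdr: Mapping[str, str]) -> Optional[datetime]:
    """Walk the fallback chain from the table and attach the UTC offset."""
    # Drop any embedded offset suffix before slicing the date and time parts.
    dt = re.split(r"[+-]", hdr.get("AcquisitionDateTime") or "")[0]
    candidates = (
        _combine(dt[:8], dt[8:14]),
        _combine(hdr.get("AcquisitionDate"), hdr.get("AcquisitionTime")),
        _combine(hdr.get("SeriesDate"), hdr.get("SeriesTime")),
        _combine(hdr.get("StudyDate"), hdr.get("StudyTime")),
    )
    naive = next((c for c in candidates if c is not None), None)
    if naive is None:
        return None
    return naive.replace(tzinfo=_tz(hdr.get("TimezoneOffsetFromUTC")))


ts = acquisition_timestamp({"SeriesDate": "20220423", "SeriesTime": "101530.25"})
print(ts.strftime("%Y-%m-%dT%H:%M:%S"))
```

The `session.timestamp` chain would use the same helpers with the candidates reordered to start from `StudyDate` and `StudyTime`.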

3.b. For Arbitrary Files (non-DICOM)

For any file that Flywheel does not detect as DICOM, Flywheel uses the source file path information, including parent folder names, to determine how to organize the files within Flywheel. In this case, the files are not opened and their content is ignored.

Since the original file path is used to determine where to place non-DICOM files in Flywheel, such non-DICOM files must be organized carefully before importing into Flywheel:

The "root" folder represents a single project
Each first-level folder (directly inside the Project root) represents a single Subject (i.e., Patient)
Each second-level folder (directly inside the Subject) represents a single Session (i.e., Study)
Each "leaf" (lowest-level) folder (directly inside a Session) represents a single Acquisition
- Each Acquisition is contained in a single "leaf" folder
- Each "leaf" folder contains a single acquisition

For a complete example, see Example 3: Mappings for Arbitrary Files (non-DICOM).
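
A minimal sketch of this path-based mapping is shown here. The function name `map_non_dicom` is hypothetical, the path is assumed to be relative to the Project root, and returning `None` for any path that does not have exactly the Subject/Session/Acquisition/file shape is an assumption of the sketch, not documented behavior.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def map_non_dicom(relative_path: str) -> Optional[Dict[str, str]]:
    """Map <subject>/<session>/<acquisition>/<file> to Flywheel container labels."""
    parts = PurePosixPath(relative_path).parts
    if len(parts) != 4:
        # Too few (or too many) folder levels to name every container.
        return None
    subject, session, acquisition, file_name = parts
    return {
        "subject.label": subject,
        "session.label": session,
        "acquisition.label": acquisition,
        "file.name": file_name,
    }


print(map_non_dicom("Patient123/Study20220423/Series1/formA.pdf"))
print(map_non_dicom("Patient2/scan-notes-1.txt"))
```

The second call returns `None` because the path lacks Session and Acquisition levels, which corresponds to the skipped file discussed in Example 4.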

Info

A "leaf-level" folder is any folder that contains only files and not any additional lower-level folders.

4. Grouping & Packaging Files into ZIP Archives (DICOM only)

If an Acquisition contains multiple DICOM files, the files will be grouped together and packaged into a ZIP archive before uploading to Flywheel.

Note the following exceptions:

Acquisitions containing single DICOM files are not packaged into ZIP archives before uploading to Flywheel.
Non-DICOM files are never packaged into ZIP archives and are always uploaded individually.
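
The packaging decision can be summarized in a few lines. The function name `plan_upload` is hypothetical, and the `<acquisition label>.dicom.zip` archive name is taken from the examples later in this document rather than from a specification.

```python
from typing import List


def plan_upload(acquisition_label: str, dicom_files: List[str]) -> List[str]:
    """Return the file names that would be uploaded for one Acquisition's DICOM files."""
    if len(dicom_files) > 1:
        # Multiple DICOM files are grouped into a single archive.
        return [f"{acquisition_label}.dicom.zip"]
    # A single DICOM file is uploaded as-is, without zipping.
    return list(dicom_files)


print(plan_upload("1 - Chest X-ray", ["file1.dcm", "file2.dcm"]))
print(plan_upload("1 - Chest X-ray", ["file3.dcm"]))
```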

Limitation: 1 DICOM Series per Input Folder

There are a few requirements on how DICOM files must be organized before upload:

Each "leaf-level" input folder must contain only a single DICOM Series.
Each DICOM Series must be fully contained in a single "leaf-level" input folder.

A "leaf-level" folder is any folder that contains only files and not any additional folders.

Multiple DICOM Series may be uploaded at the same time, so long as they are each located in their own "leaf-level" folder.
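
Before importing, these two requirements could be checked with a small pre-flight script such as the sketch below. It is not part of Flywheel: the function name `check_series_layout` is hypothetical, and it assumes you have already collected a (leaf folder, `SeriesInstanceUID`) pair for every DICOM file, for example with a DICOM reading library of your choice.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def check_series_layout(files: Iterable[Tuple[str, str]]) -> List[str]:
    """Report violations of the one-series-per-leaf-folder requirements.

    `files` holds (leaf_folder, SeriesInstanceUID) pairs, one per DICOM file.
    """
    series_by_folder: Dict[str, Set[str]] = defaultdict(set)
    folders_by_series: Dict[str, Set[str]] = defaultdict(set)
    for folder, uid in files:
        series_by_folder[folder].add(uid)
        folders_by_series[uid].add(folder)

    problems = []
    for folder, uids in sorted(series_by_folder.items()):
        if len(uids) > 1:
            problems.append(f"{folder}: contains {len(uids)} series")
    for uid, folders in sorted(folders_by_series.items()):
        if len(folders) > 1:
            problems.append(f"{uid}: split across {len(folders)} folders")
    return problems


print(check_series_layout([("Series1", "1.2.3"), ("Series1", "1.2.3"), ("Series2", "1.2.4")]))
print(check_series_layout([("Series1", "1.2.3"), ("Series1", "1.2.4")]))
```

An empty list means the layout satisfies both requirements; each reported string names the folder or series that needs to be reorganized.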

5. Uploading Files to Flywheel

Files can be uploaded to the following locations within Flywheel:

Project, Subject, or Session attachments
- Review the section on Container Attachments.
Acquisition container contents
- Review the sections on DICOM and Non-DICOM file handling.

If any conflicts or duplication scenarios are encountered while uploading the files to Flywheel, the affected files will be quarantined and flagged for manual review. Refer to the documentation on Bulk Import Conflict Handling for more details.

User-defined Behavior

Currently, user-defined mapping rules can only be used when starting an Import via the new (BETA) CLI.

More information about user-defined mapping options can be found in the new (BETA) CLI documentation for the import run command.

Examples

Example 1: Container Attachments

Consider the following source data structure:

s3://myDataBucket/
├── objectives-1.csv
├── Patient123/
    ├── consent-form-1.pdf
    ├── Study20220423/
        ├── tech-notes-1.txt
        └── Series1/
    └── Study20230122/
└── Patient456/
    └── Study20221103/

With the default behavior, this source data would be imported into Flywheel as:

fw://ACME Research/  (Project)
├── objectives-1.csv  (Project attachment)
├── Patient123/  (Subject)
    ├── consent-form-1.pdf  (Subject attachment)
    ├── Study20220423/  (Session)
        ├── tech-notes-1.txt  (Session attachment)
        └── Series1/  (Acquisition)
    └── Study20230122/  (Session)
└── Patient456/  (Subject)
    └── Study20221103/  (Session)

Note how the following files are imported based on where they were located in the source folder structure:

objectives-1.csv is imported as an attachment to the Destination Project
consent-form-1.pdf is imported as an attachment to the Subject labeled Patient123 in the Destination Project
tech-notes-1.txt is imported as an attachment to the Session labeled Study20220423 of the Subject labeled Patient123 in the Destination Project

Example 2: DICOM Files

Consider the following source data structure:

s3://myDataBucket/
├── Patient1/
    ├── Study20220423/
        └── Series1/
            ├── file1.dcm
            └── file2.dcm
        └── Series2/
            └── file3.dcm
    └── Study20230122/
└── Patient2/
    ├── 9572012
    └── 0012893

Where the five DICOM files contain the following headers:

DICOM Header	file1.dcm	file2.dcm	file3.dcm	9572012	0012893
`PatientID`	Subj123	Subj123	Subj123	Subj456	Subj456
`StudyDescription`	Timepoint1	Timepoint1	Timepoint2	Timepoint1	Timepoint1
`SeriesNumber`	1	1	1	4	4
`SeriesDescription`	Chest X-ray	Chest X-ray	Chest X-ray	PET scan	PET scan

With the default behavior, this source data would be imported into Flywheel as:

fw://ACME Research/ (Project)
├── Subj123 (Subject)
    └── Timepoint1 (Session)
        └── 1 - Chest X-ray (Acquisition)
            └── "1 - Chest X-ray.dicom.zip" (File)
    └── Timepoint2 (Session)
        └── 1 - Chest X-ray (Acquisition)
            └── "file3.dcm" (File)
└── Subj456 (Subject)
    └── Timepoint1 (Session)
        └── 4 - PET scan (Acquisition)
            └── "4 - PET scan.dicom.zip" (File)

Where the ZIP file contents are:

"1 - Chest X-ray.dicom.zip"
├── file1.dcm
└── file2.dcm
"4 - PET scan.dicom.zip"
├── 9572012
└── 0012893

Note a few nuances:

Header-based mappings: The destination Flywheel hierarchy structure is based entirely on the DICOM headers and not on any of the source file path information.
Loose organization: There is no rule on how many parent folders a DICOM series can be contained within. Compare the contents of the Patient1/ and Patient2/ folders from the source data, for example.
Single DICOM Series per Folder: The only restriction on source data organization for DICOM files is that:
- Each DICOM Series must be fully contained in exactly 1 leaf-level source folder, and
- Each leaf-level source folder must contain at most 1 DICOM Series.
Single-file Handling: DICOM series containing only 1 file are not zipped.

Example 3: Arbitrary Files (non-DICOM)

Consider the following source data structure:

s3://myDataBucket/
├── Patient123/
    ├── Study20220423/
        └── Series1/
            ├── formA.pdf
            ├── report09.csv
            └── ...
    └── Study20230122/
└── Patient456/
    └── Study20221103/

With the default behavior, this source data would be imported into Flywheel as:

fw://ACME Research/ (Project)
├── Patient123/ (Subject)
    ├── Study20220423/ (Session)
        └── Series1/ (Acquisition)
            ├── formA.pdf
            ├── report09.csv
            └── ...
    └── Study20230122/ (Session)
└── Patient456/ (Subject)
    └── Study20221103/ (Session)

Note a few nuances:

Container Labels: The source folder names are used as the container labels (e.g., "Patient123" is the Subject label).
No Zipping: The files are not grouped together and are stored in Flywheel individually as-is.
Arbitrary File Types: Any type of file can be imported (not only DICOM). The uploaded file will have its type automatically set in Flywheel according to the matching rules for File Types in Flywheel Core.

Example 4: Mixed Types

s3://myDataBucket/
├── objectives-1.csv
├── Patient1/
    ├── consent-form-1.pdf
    ├── Study20220423/
        ├── tech-notes-1.txt
        └── Series1/
            ├── file1.dcm
            └── file2.dcm
        └── Series2/
            └── file3.dcm
        └── Series3/
            ├── formA.pdf
            ├── report09.csv
            └── ...
    └── Study20230122/
└── Patient2/
    ├── scan-notes-1.txt
    ├── 9572012
    └── 0012893

Where the five DICOM files contain the following headers:

DICOM Header	file1.dcm	file2.dcm	file3.dcm	9572012	0012893
`PatientID`	Subj123	Subj123	Subj123	Subj456	Subj456
`StudyDescription`	Timepoint1	Timepoint1	Timepoint2	Timepoint1	Timepoint1
`SeriesNumber`	1	1	1	4	4
`SeriesDescription`	Chest X-ray	Chest X-ray	Chest X-ray	PET scan	PET scan

With the default behavior, this source data would be imported into Flywheel as:

fw://ACME Research/ (Project)
├── objectives-1.csv  (Project attachment)
├── Subj123/ (Subject)
    ├── consent-form-1.pdf  (Subject attachment)
    ├── Timepoint1/ (Session)
        ├── tech-notes-1.txt  (Session attachment)
        └── 1 - Chest X-ray (Acquisition)
            └── "1 - Chest X-ray.dicom.zip" (File)
    └── Timepoint2/ (Session)
        └── 1 - Chest X-ray (Acquisition)
            └── "file3.dcm" (File)
├── Patient1/ (Subject)
    └── Study20220423/ (Session)
        └── Series3/ (Acquisition)
            ├── formA.pdf
            ├── report09.csv
            └── ...
└── Subj456/ (Subject)
    └── Timepoint1/ (Session)
        └── 4 - PET scan/ (Acquisition)
            └── "4 - PET scan.dicom.zip" (File)

Note a few nuances:

Skipped Files: scan-notes-1.txt is skipped and not uploaded to Flywheel, because there is "no matching rule."
- Specifically, this file is not detected as DICOM and the parent directory structure does not contain enough levels to represent the desired Flywheel hierarchy, and so it is skipped due to not having enough information to map to Flywheel.
Additional Containers: There are 3 Subject containers in the destination Flywheel hierarchy even though the source folder structure appeared to have only 2 patient folders.
- This is because the Patient1/ source folder contained some folders with DICOM files and some folders containing non-DICOM files.
- The DICOM files mapped to Subj123/ based on the PatientID header.
- The non-DICOM files mapped to Patient1/ based on the file path information.
- To combine these containers into just Subj123/, the source folder must be renamed to Subj123/ to match the PatientID header of the contained DICOM files.
- Note that this same behavior cascades down to the Session and Acquisition containers as well.