Bulk Import - Mapping to the Flywheel Hierarchy (19.2 to 19.3)

Applicability

Versions 19.2 to 19.3

This document applies only to Flywheel versions 19.2 to 19.3.

For Flywheel version 19.4 or later, refer to Bulk Import Mapping Rules (19.4+) instead.
For Flywheel version 19.1 or earlier, refer to Bulk Import Mapping Rules (-19.1) instead.

Starting with Flywheel version 19.2, the default mapping rules are combined, enabling DICOM and non-DICOM data to be imported simultaneously. Flywheel applies the appropriate mappings automatically based on the data type it detects.

Prior to Flywheel version 19.2, DICOM and non-DICOM data were best imported separately using different default mapping rules.

Overview

The way the source data is organized within the storage bucket affects where the data is placed within the Flywheel hierarchy on import, a process referred to as "mapping".

There are several options for mapping rules:

- Default Mapping Rules
  1. DICOM header-based
  2. File path-based
- User-defined rules using file paths and/or DICOM headers

Default Mapping Rules

DICOM files

If the source data consists only of DICOM files, a simple approach is to use the default "DICOM header"-based mapping rules, which allow a simpler folder organization than the file path-based rules:

- The "root" folder represents a single Project
- Each "leaf" (lowest-level) folder represents a single Acquisition
  - Each Acquisition is contained in a single "leaf" folder
  - Each "leaf" folder contains a single Acquisition

Tip

If your source dataset consists only of DICOM files, there are no specific rules for the intermediate-level folders; the only requirement is that each Acquisition is in its own folder.

However, if your dataset contains any additional types of files other than DICOM, then your source data must be structured to match the destination Flywheel hierarchy as described in the section on Non-DICOM files.

For example, consider the following source data structure:

- s3://myDataBucket
    - /Patient123
        - /Study20220423
            - /Series1
                - file1.dcm
                - file2.dcm
        - /Study20230122
    - /Patient456
        - file3.dcm
        - file4.dcm

With the default DICOM header-based mapping rules, this source data would be imported into Flywheel as:

(Destination Project)
- Subject
  - Session
    - Acquisition
      - File: Acq1.dicom.zip
- Subject
  - Session
    - Acquisition
      - File: Acq2.dicom.zip

Where the ZIP file contents are:

- Acq1.dicom.zip
  - file1.dcm
  - file2.dcm
- Acq2.dicom.zip
  - file3.dcm
  - file4.dcm

Note a few things:

- Container Labels: The container labels are derived from the DICOM header information, not from the folder names.
- Zipping: The files are grouped together and stored in Flywheel as ZIP files.
  - ZIP File Name: The ZIP file names are derived from the DICOM header information, not from the folder names.
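To make the grouping concrete, here is a minimal Python sketch, assuming the source is available as a list of object keys relative to the bucket root. The helper name `group_by_leaf_folder` is hypothetical and is not part of Flywheel. It only applies the rule that each leaf folder holds one Acquisition (and therefore one ZIP file); it does not read DICOM headers, so it cannot produce the labels or ZIP file names that Flywheel derives from them.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_leaf_folder(paths: list[str]) -> dict[str, list[str]]:
    """Group file paths by the folder that directly contains them.

    Hypothetical helper: each group stands for one Acquisition (one ZIP file)
    under the DICOM header-based rules. No DICOM headers are read here.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        p = PurePosixPath(path)
        groups[str(p.parent)].append(p.name)
    # Sort folders and file names so the output is deterministic.
    return {folder: sorted(names) for folder, names in sorted(groups.items())}


example = [
    "Patient123/Study20220423/Series1/file1.dcm",
    "Patient123/Study20220423/Series1/file2.dcm",
    "Patient456/file3.dcm",
    "Patient456/file4.dcm",
]

for folder, names in group_by_leaf_folder(example).items():
    print(folder, names)
# Patient123/Study20220423/Series1 ['file1.dcm', 'file2.dcm']
# Patient456 ['file3.dcm', 'file4.dcm']
```

Applied to the example above, the two groups correspond to the contents of Acq1.dicom.zip and Acq2.dicom.zip.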

DICOM Header to Flywheel Metadata Mappings

Flywheel also extracts DICOM headers beyond those needed to determine where to place the data within the Flywheel hierarchy. For example, PatientAge is extracted and stored as subject.age within Flywheel.

For the full list of all default DICOM header mappings, refer to the import run - dicom documentation.
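The idea of a header-to-metadata mapping can be sketched as a lookup table. This is an illustration, not Flywheel's code: the table contains only the PatientAge to subject.age example from this page, `apply_header_mappings` is a hypothetical name, and the headers are assumed to have already been read into a plain dictionary. Values are copied unchanged here; Flywheel may convert them (for example, DICOM age strings such as 034Y) when storing them.

```python
from typing import Any

# Hypothetical mapping table. Only the PatientAge entry comes from this page;
# the full default list is in the import run - dicom documentation.
HEADER_MAPPINGS = {
    "PatientAge": "subject.age",
}


def apply_header_mappings(
    headers: dict[str, Any], mappings: dict[str, str]
) -> dict[str, dict[str, Any]]:
    """Copy mapped header values into a nested {container: {field: value}} dict.

    Values are copied as-is; no unit or format conversion is attempted.
    """
    metadata: dict[str, dict[str, Any]] = {}
    for header, target in mappings.items():
        if header not in headers:
            continue  # header absent from this file: nothing to map
        container, field = target.split(".", 1)
        metadata.setdefault(container, {})[field] = headers[header]
    return metadata


# Modality has no entry in the table above, so it is ignored.
print(apply_header_mappings({"PatientAge": "034Y", "Modality": "MR"}, HEADER_MAPPINGS))
# {'subject': {'age': '034Y'}}
```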

Non-DICOM files

If the source data contains any other types of files beyond just DICOM, then file path-based mappings must be used to handle the non-DICOM files.

Mixed file types

If your dataset contains a mix of both DICOM and non-DICOM files, don't worry -- you can still import it all at once!

In this case, your dataset will need to be structured carefully to match the desired Flywheel hierarchy so that Flywheel can determine where to place the non-DICOM files, as described in the rest of this section.

At the same time, Flywheel will still read the DICOM headers to determine where to place the DICOM files and also extract other relevant DICOM headers to store as metadata within Flywheel.
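One common way to tell DICOM files from other files is to check for the DICOM Part 10 file signature: a 128-byte preamble followed by the four bytes DICM. The Python sketch below uses that check to choose between the two kinds of default mapping. It is an assumption for illustration only: this page does not say how Flywheel detects DICOM data, and some older DICOM files omit the preamble, so a real detector needs more than this check. The function names are hypothetical.

```python
DICOM_MAGIC = b"DICM"
PREAMBLE_LENGTH = 128


def looks_like_dicom(data: bytes) -> bool:
    """Return True if the bytes carry a DICOM Part 10 preamble and DICM prefix.

    Heuristic only (an assumption, not Flywheel's detection logic).
    """
    return data[PREAMBLE_LENGTH:PREAMBLE_LENGTH + 4] == DICOM_MAGIC


def mapping_kind(data: bytes) -> str:
    """Pick which kind of default mapping would place a file with these bytes."""
    return "dicom-header" if looks_like_dicom(data) else "file-path"


fake_dicom = bytes(128) + b"DICM" + b"\x00" * 16
print(mapping_kind(fake_dicom))               # dicom-header
print(mapping_kind(b"col1,col2\n1,2\n"))      # file-path
```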

For a mixed dataset, the simplest approach is to use the default "file path"-based mapping rules, which require the data to be organized as follows:

- The "root" folder represents a single Project
- Each first-level folder (directly inside the Project root) represents a single Subject (i.e., Patient)
- Each second-level folder (directly inside a Subject) represents a single Session (i.e., Study)
- Each "leaf" (lowest-level) folder (directly inside a Session) represents a single Acquisition
  - Each Acquisition is contained in a single "leaf" folder
  - Each "leaf" folder contains a single Acquisition

For example, consider the following source data structure:

- s3://myDataBucket
    - /Patient123
        - /Study20220423
            - /Series1
                - formA.pdf
                - report09.csv
                - ...
        - /Study20230122
        - ...
    - /Patient456
        - /Study20221103
    - /...

With the default file path-based mapping rules, this source data would be imported into Flywheel as:

(Destination Project)
- Subject: "Patient123"
  - Session: "Study20220423"
    - Acquisition: "Series1"
      - File: formA.pdf
      - File: report09.csv
      - ...
  - Session: "Study20230122"
  - ...
- Subject: "Patient456"
  - Session: "Study20221103"
- ...
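The file path-based rules can be sketched in a few lines of Python, assuming the object keys are relative to the root folder (the Project). `build_hierarchy` is a hypothetical helper, not a Flywheel function: it uses each folder name directly as a container label and keeps files individually, as in the example above, and it simply skips any file that is not exactly three folders deep.

```python
from pathlib import PurePosixPath


def build_hierarchy(paths: list[str]) -> dict:
    """Nest files as {subject: {session: {acquisition: [files]}}} by folder name.

    Only files exactly three folders below the root are placed; others are skipped.
    """
    tree: dict = {}
    for path in paths:
        parts = PurePosixPath(path).parts
        if len(parts) != 4:
            continue  # not Subject/Session/Acquisition/file
        subject, session, acquisition, filename = parts
        files = tree.setdefault(subject, {}).setdefault(session, {}).setdefault(acquisition, [])
        files.append(filename)  # each file is kept individually (no zipping)
    return tree


paths = [
    "Patient123/Study20220423/Series1/formA.pdf",
    "Patient123/Study20220423/Series1/report09.csv",
]
print(build_hierarchy(paths))
# {'Patient123': {'Study20220423': {'Series1': ['formA.pdf', 'report09.csv']}}}
```

The output mirrors the Patient123 branch of the example.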

Note a few nuances:

- Container Labels: The source folder names are used as the container labels (e.g., "Patient123" is the Subject label).
- No Zipping: The files are not grouped together; they are stored in Flywheel individually, as-is.
- Arbitrary File Types: Any type of file can be imported (not only DICOM). Each uploaded file has its type set automatically in Flywheel according to the matching rules for File Types in Flywheel Core.

Container Attachments (Project, Subject, Session)

It is also possible to import additional files to be stored as attachments to Project, Subject, and Session containers.

To do this, simply place the files into the source folder representing the Flywheel container to which the file should be attached.

For example, consider the following source data structure:

- s3://myDataBucket
    - objectives-1.csv
    - /Patient123
        - consent-form-1.pdf
        - /Study20220423
            - tech-notes-1.txt
            - /Series1
                - formA.pdf
                - report09.csv
                - ...
        - /Study20230122
        - ...
    - /Patient456
        - /Study20221103
    - /...

With the default file path-based mapping rules, this source data would be imported into Flywheel as:

(Destination Project)
- objectives-1.csv (Project attachment)
- Subject: "Patient123"
  - consent-form-1.pdf (Subject attachment)
  - Session: "Study20220423"
    - tech-notes-1.txt (Session attachment)
    - Acquisition: "Series1"
      - File: formA.pdf
      - File: report09.csv
      - ...
  - Session: "Study20230122"
  - ...
- Subject: "Patient456"
  - Session: "Study20221103"
- ...

Note how the following files are imported based on where they were located in the source folder structure:

- objectives-1.csv is imported as an attachment to the Destination Project
- consent-form-1.pdf is imported as an attachment to the Subject labeled Patient123 in the Destination Project
- tech-notes-1.txt is imported as an attachment to the Session labeled Study20220423 of the Subject labeled Patient123 in the Destination Project
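The placement rule in this section depends only on how many folders sit above a file. Here is a minimal Python sketch of that rule, assuming paths relative to the root folder; `placement` and `CONTAINER_BY_DEPTH` are hypothetical names. How Flywheel treats files nested more deeply than an Acquisition folder is not covered on this page, so the sketch simply raises an error for them.

```python
from pathlib import PurePosixPath

# Number of folders above the file -> where the file ends up.
CONTAINER_BY_DEPTH = {
    0: "project attachment",
    1: "subject attachment",
    2: "session attachment",
    3: "acquisition file",
}


def placement(rel_path: str) -> str:
    """Describe where a file at rel_path (relative to the root folder) would land."""
    depth = len(PurePosixPath(rel_path).parts) - 1
    try:
        return CONTAINER_BY_DEPTH[depth]
    except KeyError:
        # Assumption of this sketch: reject paths that do not fit the hierarchy.
        raise ValueError(f"{rel_path!r} does not fit the Flywheel hierarchy") from None


print(placement("objectives-1.csv"))                          # project attachment
print(placement("Patient123/consent-form-1.pdf"))             # subject attachment
print(placement("Patient123/Study20220423/tech-notes-1.txt")) # session attachment
```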

User-defined Mapping Rules

More information about user-defined mapping options can be found in the new (BETA) CLI docs for the import run command.