How to Run a Bulk Import
Instructions
1. Prepare source data
The first step is to decide what information will be used to determine how the source data will be mapped to the Flywheel hierarchy.
If your data consists of only DICOM files, the easiest option is to use the default DICOM header-based mapping rules and let Flywheel do the rest of the work.
If your data consists of files other than DICOM, then you may only be able to use the folder structure and file names (i.e., file path information) and must carefully structure your source dataset accordingly to ensure correct mapping.
Refer to the documentation on Mapping to the Flywheel Hierarchy for more details on the various options for preparing your source data for mapping to the Flywheel Hierarchy.
2. Register external storage
Before an Import can be started, Flywheel must first be configured with information about where to find the source data and how to access it. This is done by creating a new "External Storage" within Flywheel for the bucket location where the source data is stored.
Follow the documentation explaining how to create a new "External Storage" registration in Flywheel.
3. Start the import
There are two ways to begin the import, and the decision primarily depends upon:
- Where the source data is located, and
- How complicated the filtering and mapping rules need to be.
The two options are:
- Via the Web Browser using default mapping rules
- Via the new CLI using user-defined (non-default) mapping rules
Option 1 is the easiest and quickest option, and it is currently the only option available for uploading data from your local machine.
Option 2 is best reserved for advanced use cases where complex or user-defined mapping and filtering rules are needed. However, option 2 currently only supports importing data from an external storage, and not from your local machine.
Tip
If the source data is stored on your local machine, you can use the new uploader in the Flywheel Core Web App to upload the data.
The new uploader will temporarily stage the uploaded data into a pre-configured cloud object storage location and then trigger an import from that location.
The option for uploading data from your local machine is not currently available via the new CLI.
Option 1: Via a Web Browser using default mapping rules
If you are using the default mapping rules, whether DICOM header-based or file path-based, then the easiest option is to start the import directly from the Flywheel Core Web App.
The option to start a new import is located on the Imports tab of the details page of the destination project:
1. Open the Flywheel Core Web App.
2. Navigate to the destination Project to which the data should be imported.
3. Navigate to the "Imports" tab of the destination Project (in the overflow menu).
4. On the Imports tab, select the "Add Data" button to launch the "Add Data" dialog.
5. Select whether you would like to "Import from External Storage" or "Upload and Import from my computer".
6. Select "Next".
7. If uploading from your computer, follow this step. Otherwise, skip to step 8.
    - Upload the data directly from your local machine (either by dragging-and-dropping or using the folder browser).
    - Expand the "Uploading Files" section to monitor the progress of the upload and confirm all files are uploaded successfully.
    - If any file fails to upload, you may either:
        - Use the "Retry Upload" option to retry uploading of that individual file, or
        - Use the "Cancel Upload" option to skip uploading of that individual file.
8. If importing from an external storage, follow this step. Otherwise, skip to step 9.
    - Select the External Storage containing the source data to be imported.
    Tip: If the desired storage location is not available, refer to the "Register external storage" step.
9. Choose your mapping rules, either "DICOM Headers" or "File Paths". Review the Mapping Rules documentation for more information.
10. Select "Start Import" (or "Complete Upload") to start importing the data into Flywheel.
Option 2: Via the new CLI using user-defined (non-default) mapping rules
If any sort of non-default mapping rules will be used, then the new (BETA) CLI will be needed to start the import.
The options for specifying user-defined mappings are extensive and are described in more detail in the new (BETA) CLI documentation for the `import run` command.
Prerequisites
If the new (BETA) CLI is not already installed, follow the new (BETA) CLI installation instructions to install it.
An API key will be needed to sign in to the new (BETA) CLI and run commands. This can be the exact same API key used for the Legacy CLI.
If a new API key is needed, follow the documentation for creating a user API key.
Log in using the new (BETA) CLI by running the following command (note: the command examples in this section are sketches; append `-h` to any `fw-beta` command to confirm its exact syntax and options):
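```
# Sign in to your Flywheel site
fw-beta login
```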
The CLI will then prompt for the API key. Enter your API key when prompted.
Locate the ID of the External Storage containing the source data by running the following command:
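```
# List registered External Storages (most recently created first)
fw-beta admin storages list
```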
The CLI will then print out a list of the most recently-created storages. If your storage is not listed, it may be older and on a later page.
To view more storages, take the ID of the last storage in the list, then run the following command:
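```
# Page through older storages ("--after" is an assumed flag name;
# check `fw-beta admin storages list -h` for the actual pagination option)
fw-beta admin storages list --after <id>
```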
Where `<id>` is replaced with the ID of the last storage in the list. This will display the next set of storages.
There are also other options for filtering, sorting, and changing the length of the list to help locate a particular External Storage. These options can be listed using the following command:
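```
# Show all filtering, sorting, and paging options
fw-beta admin storages list -h
```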
Once you have the ID of the External Storage containing the source data, start the import using the following command:
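```
# Start the bulk import (the argument order shown here is an assumption;
# confirm with `fw-beta import run -h`)
fw-beta import run <project> <storage>
```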
Where:

- `<project>` is replaced with the Flywheel Hierarchy path of the destination project (e.g., `fw://demo/Alzheimers`)
    - Make sure to wrap this value in double-quotation marks if it contains any spaces (e.g., `"fw://flywheel/Brain Tumor Progression"`)
- `<storage>` is replaced with the ID of the External Storage containing the source data
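For example, using a hypothetical storage ID:

```
fw-beta import run "fw://flywheel/Brain Tumor Progression" 64f9e1a2b3c4d5e6f7a8b9c0
```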
Once the import begins, the progress of the import will be displayed in the CLI output.
You can exit this progress display by pressing Ctrl+C or by simply closing the terminal window altogether. This has no effect on the import job itself -- the import will still continue to run.
4. Monitor the Import
Via a Web Browser
Once an import has been started, it can be monitored on the Imports tab of the destination Project.
1. Open the Flywheel Core Web App.
2. Navigate to the destination Project to which the data is being imported.
3. Navigate to the "Imports" tab of the destination Project (in the overflow menu).
4. On the Imports tab, locate the import job in the list.
A brief summary of the import job can be viewed in-line on the project Imports page, including the following information:
- Label (name)
- Data source (External Storage)
- User who started the import job
- How much time has passed since the import job was started
- Current status of the import job
- Overflow menu with more actions
More details about the progress of the import job are available in the "Details" dialog, which can be found from the overflow menu on the import job line.
Via the New (BETA) CLI
Alternatively, you can monitor the import via the CLI using the following procedure.
If you have closed the terminal but still want to monitor the import from it, you can reopen the progress display. To do this, you will first need to locate the import job ID by running the following command:
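```
# List recent import jobs to locate the job ID
fw-beta import list
```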
Like the `fw-beta admin storages list` command, the `fw-beta import list` command has options for filtering, sorting, skipping, and changing the length of the list to help locate a particular import job. Append the `-h` flag to the command to see these options.
Once you have the import job ID, run the following command to resume monitoring the progress:
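```
# Reattach to the progress display ("watch" is an assumed subcommand name;
# run `fw-beta import -h` to confirm)
fw-beta import watch <id>
```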
Where `<id>` is replaced with the ID of the import job to be monitored.
You can exit this progress display by pressing Ctrl+C or by simply closing the terminal window altogether. This has no effect on the import job itself -- the import will still continue to run.
5. Cancelling a Running Import
If you need to cancel an import job early for any reason, you may do so either via a web browser or via the new (BETA) CLI.
Via a Web Browser
Any in-progress import can be cancelled from the Project Imports page by selecting the "Cancel Import" option in the overflow menu on the import job line.
Cancelling an import job may take some time depending upon the size of the import job. This is primarily because the import process is highly parallelized and uses multiple concurrent processes that coordinate via a task scheduling system.
After selecting "Cancel Import", the import job will first enter the "Cancelling" state, which indicates the system is currently working to cancel all pending tasks and finish all in-progress tasks that are part of the import job.
After all tasks are either completed or cancelled and all further processing is terminated, the import job will transition to the "Cancelled" state.
Via the new (BETA) CLI
To cancel an import via the new (BETA) CLI, use the following command:
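```
# Cancel the running import ("cancel" is an assumed subcommand name;
# run `fw-beta import -h` to confirm)
fw-beta import cancel <id>
```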
Where `<id>` is replaced with the ID of the import job to be cancelled.
Troubleshooting
Viewing the audit report for an import job
After an import job terminates for any reason (whether successful, cancelled, or failed), an audit report is made available.
The audit report contains details of what action was taken for every single source file discovered in the source data location, including any errors that may have occurred. The audit report can be extremely helpful in troubleshooting issues, and it can also be useful for maintaining a record of exactly where each file came from and where it was placed in Flywheel.
The audit report can be found by navigating to the Project Imports page, locating the import job line, and then either:
- Selecting the "Download Report" option in the overflow menu on the import job line, or
- Opening the Import Job Details dialog from the overflow menu on the import job line and then selecting the "Download Report" button
The audit report is available in either CSV or JSON Lines format.
- The CSV format can be opened in any standard spreadsheet software and may be the simplest option.
- The JSON Lines format is best when reading the report programmatically.
The CSV version of the audit report contains the following columns:
- `status` -- Indicates whether the file was successfully imported, skipped, or if an error was encountered
- `reason` -- Reason why the file was not imported (empty if the file was successfully imported)
- `src_path` -- Path (including name) of the file in the Source Data Location (External Storage)
- `dst_path` -- Path to where this file should be placed in the Flywheel Hierarchy
- `file_id` -- ID of the Flywheel File object in which the source file should be placed
- `version` -- Version of the Flywheel File object in which the source file should be placed
- `session_uid` -- UID (DICOM) of the Flywheel Session in which the source file should be placed
- `acquisition_uid` -- UID (DICOM) of the Flywheel Acquisition in which the source file should be placed
The JSON Lines version of the audit report contains far more technical details about how the system processed each file.
Here is an example of just one line from an import audit report in JSON Lines format describing how the system processed a single source file:
```json
{
"_id": "653986a89515294c50db0772",
"src_path": "1.3.6.1.4.1.14519.5.2.1.72111832425535404540752357374191117693/04d4cc1b-1382-4e4b-8692-f166d993b5dd.dcm",
"src_stat": {
"type": "gs",
"path": "1.3.6.1.4.1.14519.5.2.1.72111832425535404540752357374191117693/04d4cc1b-1382-4e4b-8692-f166d993b5dd.dcm",
"size": 101292,
"hash": "CPOaofevrYEDEAE=",
"created": 1694806961.576,
"modified": 1694806961.576
},
"status": "skipped",
"dst_path": "UPENN-GBM-00001/BRAIN^ROUTINE/2 - t2_Flair_axial_ Processed_CaPTk/2 - t2_Flair_axial_ Processed_CaPTk.dicom.zip",
"dst_stat": {
"subject": {
"_id": "653978aa0473cf5cf526438e",
"label": "UPENN-GBM-00001",
"upsert": "update"
},
"session": {
"_id": "653978b00473cf5cf5264589",
"uid": "1.3.6.1.4.1.14519.5.2.1.325722981077189157104874710559665333106",
"label": "BRAIN^ROUTINE",
"upsert": "update"
},
"acquisition": {
"_id": "653978b00473cf5cf526458f",
"uid": "1.3.6.1.4.1.14519.5.2.1.72111832425535404540752357374191117693",
"label": "2 - t2_Flair_axial_ Processed_CaPTk",
"upsert": "skip"
},
"file": {
"_id": {
"file_id": "653978b0c747a769272207f2",
"version": 1
},
"name": "2 - t2_Flair_axial_ Processed_CaPTk.dicom.zip",
"size": 6077738,
"upsert": "skip"
}
},
"file_id": "653978b0c747a769272207f2",
"file_version": 1,
"reason": "File already exists"
}
```
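As a sketch of programmatic use, the following Python snippet reads a JSON Lines audit report like the one above (the file name is hypothetical) and prints every file that was not successfully imported:

```python
import json

# Hypothetical file name for a downloaded JSON Lines audit report
report_path = "import-audit-report.jsonl"

with open(report_path) as report:
    for line in report:
        record = json.loads(line)  # each line is one self-contained JSON object
        # "reason" is empty for successfully imported files, so any record
        # with a non-empty reason was skipped or failed
        if record.get("reason"):
            print(f'{record["status"]}: {record["src_path"]} ({record["reason"]})')
```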