BIDS Curation Tutorial Part 1: Preparing Data

Introduction

Our main objective in these tutorials is to curate a dataset so that it meets the BIDS specification. Flywheel provides a gear, BIDS Curation, that does this. Before we can run this gear, however, we must ensure that data are in the right format. In this article, we will use two different sample datasets to demonstrate how to prepare data for the BIDS Curation gear.

Instruction Steps

Step 1: Download example data

This tutorial will use two different datasets: one that does not require relabeling and one that does.

Dataset not requiring relabeling (Dataset 1):

  1. Download the example data from the Princeton Handbook for Reproducible Neuroimaging. (Citations: dataset, https://doi.org/10.5281/zenodo.3727775; handbook, https://doi.org/10.5281/zenodo.3688788)

  2. Unzip the folder.

  3. In a Terminal or Windows command prompt, navigate to the unzipped folder and from inside the folder run the below command to unzip the individual DICOM files:

    gunzip dcm/*.gz
    
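The gunzip step can be rehearsed safely before touching the real download. Below is a minimal sketch that runs the same command against a throwaway stand-in folder; the directory and file names here are fabricated for illustration only (the real dcm/ folder comes from the downloaded dataset):

```shell
# Build a throwaway stand-in for the dcm/ folder of gzipped DICOMs
workdir=$(mktemp -d)
mkdir "$workdir/dcm"
for i in 1 2 3; do
  printf 'placeholder %s\n' "$i" > "$workdir/dcm/IM-000$i.dcm"
  gzip "$workdir/dcm/IM-000$i.dcm"   # now IM-000N.dcm.gz
done

# Same command as in the tutorial, pointed at the stand-in folder
gunzip "$workdir"/dcm/*.gz

# After gunzip, no .gz files should remain and each .dcm is restored
remaining=$(find "$workdir/dcm" -name '*.gz' | wc -l)
restored=$(find "$workdir/dcm" -name '*.dcm' | wc -l)
echo "gz remaining: $remaining, dcm restored: $restored"
rm -rf "$workdir"
```

If `remaining` is not zero after running the real command, some files failed to decompress and the upload in Step 2 would include gzipped DICOMs that the scanner tools cannot read.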

Dataset requiring relabeling (Dataset 2):

  1. Download the example data associated with the Dcm2Bids Tutorial.

  2. Click on Code and then on Download ZIP. Unzip the folder.

Step 2: Upload data to Flywheel

  1. If you have not already done so, create one group and two projects, one for each dataset.
  2. Open a Terminal or Windows command prompt.
  3. Use the fw ingest command to upload the data. Learn more about how to use the ingest dicom command.

    fw ingest dicom [path_to_in_folder] [GROUP_ID] [PROJECT_LABEL] [--verbose]
    
  4. When prompted with Confirm upload? (yes/no), type y to continue with the upload.

    Uploading Dataset 1 with --verbose option.
    user@user-MBP BIDS-tutorial % fw ingest dicom 0219191_mystudy-0219-1114 bids BIDS-tutorial-test-no-precuration-2 --verbose
    Created             [2024-04-02 12:27:46]
    Configuring         [2024-04-02 12:27:46]
    100.0% (1s)
    Scanning            [2024-04-02 12:27:47]
    2663/2663 files, 1.1GB (5s)
    Resolving           [2024-04-02 12:27:52]
    100.0% (1s)
    In review           [2024-04-02 12:27:53]
    Hierarchy:
    Maximum 100 containers are displayed.
    `- bids (1.1GB) (using)
       `- BIDS-tutorial-test-no-precuration-2 (1.1GB) (creating)
          `- 0219191_greenEyes (1.1GB) (creating)
             `- 2019-02-19 11_14_35 (1.1GB) (creating)
                |- 1 - anat_ses-01_scout (25MB / 1 file) (creating)
                |- 10 - func_ses-01_task-story_run-02 (213MB / 1 file) (creating)
                |- 11 - func_ses-01_task-story_run-03 (213MB / 1 file) (creating)
                |- 12 - func_ses-01_task-story_run-04 (213MB / 1 file) (creating)
                |- 13 - func_ses-01_task-faces_run-01 (86MB / 1 file) (creating)
                |- 2 - anat_ses-01_scout_MPR_sag (1.0MB / 1 file) (creating)
                |- 3 - anat_ses-01_scout_MPR_cor (607KB / 1 file) (creating)
                |- 4 - anat_ses-01_scout_MPR_tra (607KB / 1 file) (creating)
                |- 5 - anat_ses-01_T1w (52MB / 1 file) (creating)
                |- 6 - func_ses-01_task-sound_run-01 (1.3MB / 1 file) (creating)
                |- 7 - func_ses-01_task-sound_run-01 (5.3MB / 1 file) (creating)
                |- 8 - func_ses-01_task-sound_run-01 (79MB / 1 file) (creating)
                `- 9 - func_ses-01_task-story_run-01 (213MB / 1 file) (creating)
      Groups: 1
      Projects: 1
      Subjects: 1
      Sessions: 1
      Acquisitions: 13
      Files: 0
      Packfiles: 13
    Confirm upload to latest.sse.flywheel.io? (yes/no): y
    Preparing           [2024-04-02 12:28:24]
    100.0% (3s)
    Uploading           [2024-04-02 12:28:27]
    100.0% - (0 failed) (19s)
    Total: 13
    Finalizing          [2024-04-02 12:28:46]
    100.0% (1s)
    Finished            [2024-04-02 12:28:47]
    Final report
    Total elapsed time: 1m 1s
    
    Uploading Dataset 2 with --verbose option.
    user@user-MBP BIDS-tutorial % fw ingest dicom dcm_qa_nih-master bids BIDS-tutorial-test-precuration-2 --verbose
    Created             [2024-04-03 09:19:39]
    Configuring         [2024-04-03 09:19:39]
    100.0% (1s)
    Scanning            [2024-04-03 09:19:40]
    610/610 files, 16MB (0s)
    Resolving           [2024-04-03 09:19:40]
    100.0% (2s)
    In review           [2024-04-03 09:19:42]
    Hierarchy:
    Maximum 100 containers are displayed.
    `- bids (16MB) (using)
       `- BIDS-tutorial-test-precuration-2 (16MB) (using)
          `- DEV (16MB) (using)
             |- 2018-09-18 11_40_23 (14MB) (creating)
             |  |- 4 - Axial EPI-FMRI (Interleaved I to S) (3.6MB / 1 file) (creating)
             |  |- 5 - Axial EPI-FMRI (Sequential I to S) (3.6MB / 1 file) (creating)
             |  |- 6 - Axial EPI-FMRI (Interleaved S to I) (3.6MB / 1 file) (creating)
             |  `- 7 - Axial EPI-FMRI (Sequential S to I) (3.6MB / 1 file) (creating)
             `- 2018-09-18 12_12_29 (1.6MB) (creating)
                |- 3 - EPI PE=AP (421KB / 1 file) (creating)
                |- 4 - EPI PE=PA (422KB / 1 file) (creating)
                |- 5 - EPI PE=RL (422KB / 1 file) (creating)
                `- 6 - EPI PE=LR (422KB / 1 file) (creating)
      Groups: 1
      Projects: 1
      Subjects: 1
      Sessions: 2
      Acquisitions: 8
      Files: 0
      Packfiles: 8
    Confirm upload to latest.sse.flywheel.io? (yes/no): y
    Preparing           [2024-04-03 09:19:54]
    100.0% (1s)
    Uploading           [2024-04-03 09:19:55]
    100.0% - (0 failed) (4s)
    Total: 8
    Finalizing          [2024-04-03 09:19:59]
    100.0% (0s)
    Finished            [2024-04-03 09:19:59]
    Final report
    Total elapsed time: 20s
    
  5. After data are uploaded, check Flywheel to verify the structure.

    • Sign in to Flywheel.
    • Navigate to the group/project.
    • Click on the Sessions tab, and then click on the Subject icon SubjectViewIcon.png
    Dataset 1 file structure

    Note: Subject "0219191_greenEyes" has one session and all acquisitions 001-Dataset1-File-Structure.png

    Dataset 2 file structure

    Note: Subject "DEV" has two sessions and all acquisitions 001-Dataset2-File-Structure.png

Step 3: Run the File Metadata Importer gear on all acquisitions

Now that the data are in Flywheel, we will continue preparing them there. The first step is to run the file-metadata-importer gear. This gear reads the DICOM header information and indexes it in Flywheel.

Dataset 1: Running File Metadata Importer gear

Since Dataset 1 only has one session, the File Metadata Importer gear needs to be run acquisition-by-acquisition. (Note. When ingesting data for a real study, it is possible to set up a pipeline to avoid having to manually run this gear acquisition-by-acquisition.)

  1. Click on the Sessions tab.
  2. Click on the blue Run Gear button in the upper right and select Utility Gear. 001-Dataset1-file-metadata-importer
  3. Search for File Metadata Importer and select the latest version. 001b-Dataset1-file-metadata-importer
  4. Click on input file box and select the *dicom.zip file for acquisition 1. 001c-Dataset1-file-metadata-importer
  5. Click Run Gear. 001d-Dataset1-file-metadata-importer
  6. Click on the Provenance tab and use the Refresh button to check the status of the jobs. 001e-Dataset1-file-metadata-importer
  7. Repeat steps 2-6 for the remaining twelve acquisitions.
  8. While the File Metadata Importer gear should run without errors, it is good practice to check the log files for any obvious errors. 001f-Dataset1-file-metadata-importer
Dataset 2: Batch running File Metadata Importer gear

As Dataset 2 has more than one session, the File Metadata Importer gear can be run using batch mode. (Note. When ingesting data for a real study, it is possible to set up a pipeline to avoid having to manually run this gear.)

  1. Click on the Sessions tab.
  2. Go to the Actions menu on the top-left, choose Batch Run Gear, and select Utility Gear. 001-Dataset2-file-metadata-importer
  3. Search for File Metadata Importer and select the latest version. 001b-Dataset2-file-metadata-importer
  4. Keep the default configurations and click on Run Gear. 001c-Dataset2-file-metadata-importer
  5. Confirm that eight jobs will run. 001d-Dataset2-file-metadata-importer
  6. Click the Provenance tab. If the jobs are pending, click Refresh. 001e-Dataset2-file-metadata-importer
  7. While the File Metadata Importer gear should run without errors, it is good practice to check the log files for any obvious errors. 001f-Dataset2-file-metadata-importer

Step 4: Run the File Classifier gear on all acquisitions

Now that the metadata are indexed in Flywheel, we can continue to the next step: running the file-classifier gear. The gear uses the metadata indexed in the previous step to determine and create classification metadata for each file.

Dataset 1: Running File Classifier gear

Since Dataset 1 only has one session, the File Classifier gear needs to be run acquisition-by-acquisition. (Note. When ingesting data for a real study, it is possible to set up a pipeline to avoid having to manually run this gear acquisition-by-acquisition.)

  1. Click on the Sessions tab.
  2. Click on the blue Run Gear button in the upper right and select Utility Gear. 001-Dataset1-file-metadata-importer
  3. Search for File Classifier and select the latest version. 002b-Dataset1-file-classifier
  4. Click on input file box and select the *dicom.zip file for acquisition 1. 001c-Dataset1-file-metadata-importer
  5. Click Run Gear. 002d-Dataset1-file-classifier
  6. Click on the Provenance tab and use the Refresh button to check the status of the jobs. 002e-Dataset1-file-classifier
  7. Repeat steps 2-6 for the remaining twelve acquisitions.
  8. While the File Classifier gear should run without errors, it is generally good practice to check the log files for any obvious errors. 002f-Dataset1-file-classifier
  9. In addition to checking the log files for errors, it is also important to check the classification results themselves. From the Acquisitions tab under Sessions, click on the Information icon to the right-hand side of the first acquisition in the list: 1 - anat_ses-01_scout. 002g-Dataset1-file-classifier
  10. Of particular importance for the Relabel Container and BIDS Curate gears is that the Intent field is set correctly. We can see for Dataset 1 that the scout images were misclassified as Structurals. This is because the File Classifier gear uses information in the DICOM header rather than acquisition labels, so there can be mismatches between the automated classification of an acquisition and your actual intent for the acquisition. 002h-Dataset1-file-classifier
  11. To correct this, click on the Intent pulldown menu, deselect Structural and select Localizer. Then click Save. 002i-Dataset1-file-classifier
  12. Repeat steps 9-11 for the remaining three acquisitions labeled scout (numbers 2-4). Since all of the functional task acquisitions were correctly labeled with Functional as the Intent, and the one 3D T1w image was correctly labeled with Structural as the Intent, no further corrections are necessary.
Dataset 2: Batch running File Classifier gear

As Dataset 2 has more than one session, the File Classifier gear can be run using batch mode. (Note. When ingesting data for a real study, it is possible to set up a pipeline to avoid having to manually run this gear.)

  1. Click on the Sessions tab.
  2. Go to the Actions menu on the top-left, choose Batch Run Gear, and select Utility Gear. 001-Dataset2-file-metadata-importer
  3. Search for File Classifier and select the latest version. 002b-Dataset2-file-classifier
  4. Set Optional Files to Flexible and click Run Gear. 002c-Dataset2-file-classifier
  5. Confirm that eight jobs will run. 002d-Dataset2-file-classifier
  6. Click the Provenance tab. If the jobs are pending, click Refresh. 002e-Dataset2-file-classifier
  7. While the File Classifier gear should run without errors, it is generally good practice to check the log files for any obvious errors. 002f-Dataset2-file-classifier
  8. In addition to checking the log files for errors, it is also important to check the classification results themselves. From the Acquisitions tab under Sessions, we can see that for the first session in the list, all of the acquisitions are labeled as both Localizer and Functional. 002g-Dataset2-file-classifier
  9. Of particular importance for the Relabel Container and BIDS Curate gears is that the Intent field is set correctly. We need to make sure for the first session that the Intent is set only as Functional. Click on the Information icon to the right of the acquisition name. 002h-Dataset2-file-classifier
  10. To correct this, click on the Intent pulldown menu, deselect Localizer and click Save. 002i-Dataset2-file-classifier
  11. Repeat steps 9-10 for the remaining three acquisitions.
  12. Navigating to the second session in the list, we can see that the classification is just 3T and 2D. 002j-Dataset2-file-classifier
  13. Clicking on the information icon for the first scan, we see that there is no Intent set. 002k-Dataset2-file-classifier
  14. Before we can set the Intent, in this case, we need to clear 2D from Features. 002l-Dataset2-file-classifier
  15. Then, we can set the Intent to Functional. For completeness, we can also set Measurement to T2* and then click Save. 002m-Dataset2-file-classifier 002n-Dataset2-file-classifier

File Classifier Intents for successful BIDS Curation

When checking the results of the File Classifier gear, make sure that acquisitions destined for each BIDS folder carry the following classification:

    • anat/: Intent set to Structural
    • func/: Intent set to Functional
    • dwi/: Intent set to Structural and Measurement set to Diffusion
    • fmap/: Intent set to Fieldmap

Step 5: Run the dcm2niix gear on all acquisitions

The last step in preparing the datasets for relabeling and BIDS curation is to run the dcm2niix gear. This gear takes the DICOM zip archives and creates NIfTI files.

Dataset 1: Running dcm2niix gear

Since Dataset 1 only has one session, the dcm2niix gear needs to be run acquisition-by-acquisition. (Note. When ingesting data for a real study, it is possible to set up a pipeline to avoid having to manually run this gear acquisition-by-acquisition.)

  1. Click on the Sessions tab.
  2. Click on the blue Run Gear button in the upper right and select Utility Gear. 001-Dataset1-file-metadata-importer
  3. Search for dcm2niix and select the latest version. 003b-Dataset1-dcm2niix
  4. Click on input file box and select the *dicom.zip file for acquisition 1. 001c-Dataset1-file-metadata-importer
  5. Click Run Gear. 003d-Dataset1-dcm2niix
  6. Click on the Provenance tab and use the Refresh button to check the status of the jobs. 003e-Dataset1-dcm2niix
  7. Repeat steps 2-6 for the remaining twelve acquisitions.
  8. While the dcm2niix gear should run without errors, it is good practice to check the log files for any obvious errors. 003f-Dataset1-dcm2niix
Dataset 2: Batch running dcm2niix gear

As Dataset 2 has more than one session, the dcm2niix gear can be run using batch mode. (Note. When ingesting data for a real study, it is possible to set up a pipeline to avoid having to manually run this gear.)

  1. Click on the Sessions tab.
  2. Go to the Actions menu on the top-left, choose Batch Run Gear, and select Utility Gear.
  3. Search for dcm2niix and select the latest version.
  4. Keep the default configurations and click Run Gear.
  5. Confirm that eight jobs will run.
  6. Click the Provenance tab. If the jobs are pending, click Refresh.
  7. While the dcm2niix gear should run without errors, it is good practice to check the log files for any obvious errors.

Finishing up with Part 1

Gear Rules

In steps 3 - 5, we manually ran three gears on all acquisitions, but you can automate this process by creating Gear Rules for your project. Learn more about how to configure Gear Rules. Most projects will be set up from the beginning with gear rules to run the file-metadata-importer, file-classifier, and dcm2niix gears so that this initial processing will automatically begin when new data appears.

If following along with Dataset 1, continue on to BIDS Curation Tutorial Part 2: Running Relabeling to learn about BIDS and the ReproIn-compliant naming scheme this dataset employs at the scanner. If you are already familiar with BIDS and ReproIn, feel free to jump directly to BIDS Curation Tutorial Part 3: Running Curation.

If following along with Dataset 2, continue on to BIDS Curation Tutorial Part 2: Running Relabeling to learn about BIDS, ReproIn, and how to use the Relabel Container gear to update acquisition labels to ReproIn-compliant labels.