Flywheel + BIDS: Getting Started
Introduction
The Brain Imaging Data Structure (BIDS) is a standard for organizing neuroimaging and behavioral data so that it can be shared easily. If you are new to the BIDS standard, we strongly recommend reviewing the Common Principles on the BIDS website or downloading a PDF of the BIDS specification.
Instruction Steps
There are two options for getting BIDS-formatted data on Flywheel: upload data that is already in the BIDS format, or curate data on Flywheel to fit the specification. This article gives an overview of how to complete BIDS curation as smoothly as possible.
Note: While the BIDS curation process is not difficult, setting it up takes time and attention to detail, and requires input from someone who understands how and why each scan was acquired (for example, someone who knows the study design).
Overview of BIDS curation on Flywheel
To begin, you'll want to take a look at the BIDS Study Design Spreadsheet. This will help you plan out how to get your data into BIDS. The Curation Tutorial on Preparing Data is also an excellent resource.
Next, you will use 4 or 5 gears for BIDS curation:
- File Metadata Importer: Interprets DICOM header information for a basic understanding of the type of scan in Flywheel
- File Classifier: Determines the classifications based on the metadata imported by the File Metadata Importer
- dcm2niix: Converts DICOM files into NIfTI files and sets Flywheel metadata with DICOM tag values
- relabel-container (optional): Two-stage gear that generates a spreadsheet of names for the user to edit and then applies the name changes to the data in Flywheel. This optional step eliminates the need to modify templates for the curate-bids Gear. It is also unnecessary if the names of the images coming from the scanner already follow the ReproIn convention (learn more about using the ReproIn convention for new images below).
- curate-bids: Uses a template to set BIDS metadata in Flywheel. This can be run after all scans have been converted into NIfTI files by dcm2niix or after the relabel-container Gear. The curate-bids Gear must be run whether or not the optional relabel-container step is used.
To successfully run the curate-bids gear, your data must be labeled in the way that the gear expects. How you update the labels depends on whether you are acquiring new data uploaded directly from the scanner or working with retrospective data.
You can create gear rules for the first two Gears, so they are executed automatically whenever new scans appear.
New data uploaded directly from the scanner
For new data that will be acquired, the best way to begin is by setting the proper names at the scanner console. For this, we highly recommend the ReproIn naming convention.
When the ReproIn naming convention is used, the BIDS Curation gear can be used with the default ReproIn template, and BIDS curation will be almost automatic.
Learn more about the ReproIn naming convention or take a look at a scanner walkthrough.
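To make the idea of a ReproIn-style label concrete, here is a minimal sketch that splits a label into its parts. It assumes the general ReproIn layout of a leading `<datatype>-<suffix>` field followed by underscore-separated `key-value` entities and an optional custom part after a double underscore. The function name `parse_reproin_label` is hypothetical and is not part of Flywheel or of the curate-bids gear, which uses its own template logic.

```python
import re

# Pattern for one "key-value" entity such as "task-rest" or "run-01".
# This is a simplification; the real convention covers more cases.
_ENTITY = re.compile(r"^(?P<key>[a-z]+)-(?P<value>[A-Za-z0-9]+)$")


def parse_reproin_label(label: str) -> dict:
    """Split a ReproIn-style acquisition label into its parts (illustrative only)."""
    # Anything after a double underscore is treated as a free-form custom suffix.
    main, _, custom = label.partition("__")
    head, *rest = main.split("_")
    # The first field is "<datatype>" or "<datatype>-<suffix>", e.g. "func-bold".
    datatype, _, suffix = head.partition("-")
    entities = {}
    for field in rest:
        match = _ENTITY.match(field)
        if match:
            entities[match.group("key")] = match.group("value")
    return {
        "datatype": datatype,
        "suffix": suffix or None,
        "entities": entities,
        "custom": custom or None,
    }


parsed = parse_reproin_label("func-bold_task-rest_run-01__testing")
print(parsed)
```

A label that cannot be split this way (for example, a vendor default such as a localizer name) is a good candidate for renaming before curation.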
Retrospective Data
For retrospective data that has already been acquired, use the relabel-container gear to rename Flywheel acquisition labels, session labels, and subject codes/labels so that they match the ReproIn convention. Then you can run the BIDS Curation gear with the ReproIn template (learn more about the ReproIn naming convention).
There is often hesitation about relabeling data because it might obscure provenance, but relabeling is safe on Flywheel because the raw information is not changed. The original acquisition name can be retrieved from the SeriesDescription DICOM tag. The session label can be retrieved from the StudyDescription, and the subject label/code is typically the Patient ID or Additional Info field (check with the Flywheel Site Admin at your institution for the specific fields configured for your site). The relabel-container gear also allows you to mark specific files to be ignored by adding “_ignore-BIDS” to the end of an acquisition label. Flywheel will then skip all files in that acquisition when exporting data in BIDS format.
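The `_ignore-BIDS` suffix described above amounts to a simple string check on the acquisition label. The following sketch illustrates that rule on a plain list of labels; the helper names `is_ignored` and `mark_ignored` are hypothetical and do not correspond to any Flywheel SDK function.

```python
IGNORE_SUFFIX = "_ignore-BIDS"


def is_ignored(acquisition_label: str) -> bool:
    """Return True if the label carries the ignore suffix."""
    return acquisition_label.endswith(IGNORE_SUFFIX)


def mark_ignored(acquisition_label: str) -> str:
    """Append the ignore suffix unless the label already has it."""
    if is_ignored(acquisition_label):
        return acquisition_label
    return acquisition_label + IGNORE_SUFFIX


# Example labels (made up for illustration).
labels = ["anat-T1w", "localizer_ignore-BIDS", "func-bold_task-rest"]
exported = [label for label in labels if not is_ignored(label)]
print(exported)
```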
Note
If you only want to change a small number of labels, you can do this manually in Flywheel. If a large number of changes are needed, use the relabel-container gear.
BIDS Curation
When your acquisitions have the proper labels, whether they were set at the scanner or were relabeled, you will run the curate-bids gear to align your files with the BIDS specification. The BIDS Curation gear walks the Flywheel hierarchy (session, subject, or entire project) and matches files with specific parts of the BIDS specification using rules and definitions in a project curation template. The definitions in this template establish the structure of the BIDS path and file names, while the rules are used to recognize files and then extract parts of names to determine each file's complete BIDS path and name. You can learn more about Flywheel's project curation template in our article.
After running the BIDS Curation gear, you must check the results. A successful gear run does not mean that the BIDS curation is complete and correct. The gear produces curation reports in the form of spreadsheets that summarize the mapping between the original names and the final BIDS paths and filenames.
BIDS Curation Report
The spreadsheets produced by the curate-bids Gear are:
- {group}_{project}_niftis.csv: A list of the original information (acquisition name, file name, series number, etc.) and the final BIDS path/filename, along with an indication of whether the path/filename is duplicated. Duplicates cause an error in the BIDS Validator, and because BIDS Apps run the BIDS Validator first, a BIDS App gear will exit with an error rather than run. Check this spreadsheet to confirm that every file has been properly recognized or ignored, even if there is a green check mark by the curation gear run.
- {group}_{project}_acquisitions.csv: Useful when there are multiple subjects because it shows the “usual count” of the acquisitions across all subjects and then lists the subjects that have these expected acquisitions and the ones that do not. It lists warnings for unexpected numbers of specific acquisitions and errors for subjects that do not have the expected number of the usual acquisitions.
- {group}_{project}_acquisitions_details_1.csv (_2.csv): Lists all of the unique acquisition labels along with the number of times they have been seen. It also provides additional details that should help you understand which subjects have missing or additional acquisitions.
- {group}_{project}_intendedfors.csv: Lists the field maps and the paths to the files that those maps will be used to correct. If IntendedFor regular expression pairs are provided, it lists the mapping produced by the project curation template as the “before” results and the mapping trimmed down by the regexes as the “after” results.
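When a project has many files, it can help to scan the niftis report programmatically for duplicated BIDS paths. The sketch below shows one way to do that with Python's standard `csv` module. The column names (`acquisition`, `file`, `bids_path`) and the sample rows are assumptions made for illustration; check the header row of your own report and adjust `path_column` accordingly.

```python
import csv
import io
from collections import defaultdict


def find_duplicate_bids_paths(rows, path_column="bids_path"):
    """Group report rows by BIDS path and keep only paths that occur more than once."""
    seen = defaultdict(list)
    for row in rows:
        path = (row.get(path_column) or "").strip()
        if path:  # rows without a BIDS path (e.g. ignored files) are skipped
            seen[path].append(row)
    return {path: hits for path, hits in seen.items() if len(hits) > 1}


# Made-up sample data standing in for a {group}_{project}_niftis.csv file.
SAMPLE = """acquisition,file,bids_path
func-bold_task-rest,a.nii.gz,sub-01/func/sub-01_task-rest_bold.nii.gz
func-bold_task-rest,b.nii.gz,sub-01/func/sub-01_task-rest_bold.nii.gz
anat-T1w,c.nii.gz,sub-01/anat/sub-01_T1w.nii.gz
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
duplicates = find_duplicate_bids_paths(rows)
for path, hits in duplicates.items():
    print(path, [hit["file"] for hit in hits])
```

To run this against a real report, replace `io.StringIO(SAMPLE)` with an opened file handle, for example `open(report_path, newline="")`.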
Problems
If all acquisitions have been labeled correctly and each subject and session has all of the necessary acquisitions then your data is ready to be processed by a BIDS application. However, it is not likely that your data will be perfectly curated on the first try. Subjects move, scans need to be re-started, the wrong MRI sequence is chosen: many common problems will occur, especially in large projects, and this leads to repeated or missing acquisitions. How to interpret the curation report spreadsheets and deal with errors is described in detail in part 4 of this tutorial.
You can also run a script to generate these reports. The gear (and script) can take a list of pairs of regular expressions for fine-tuning the mapping between field maps and the files that they will modify. Initial processing using the project curation template in the Gear produces a list of all possible files for each field map to modify. The regular expressions provide a more specific correspondence between the field map (matching the first regex) and the scans to modify (matching the second regex). This is described in more detail in part 6 of this tutorial.
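The effect of the regular expression pairs can be pictured as a filter on the "before" list. The following sketch is an illustration of that idea, not the gear's actual implementation: the function name `trim_intended_for`, the use of `re.search`, and the example paths are all assumptions.

```python
import re


def trim_intended_for(candidates, regex_pairs):
    """Narrow each field map's candidate targets using (fieldmap_regex, target_regex) pairs.

    candidates: dict mapping a field map name to the list of files it could modify.
    regex_pairs: list of (regex matching the field map, regex matching allowed targets).
    """
    trimmed = {}
    for fmap, targets in candidates.items():
        # Collect the target patterns of every pair whose first regex matches this field map.
        patterns = [re.compile(tgt) for fm, tgt in regex_pairs if re.search(fm, fmap)]
        if not patterns:
            # No pair applies, so the "before" list is kept unchanged.
            trimmed[fmap] = list(targets)
        else:
            trimmed[fmap] = [t for t in targets if any(p.search(t) for p in patterns)]
    return trimmed


# Made-up "before" mapping: every field map initially points at every scan.
all_scans = ["func/sub-01_task-rest_bold.nii.gz", "dwi/sub-01_dwi.nii.gz"]
before = {
    "fmap/sub-01_acq-func_epi.nii.gz": all_scans,
    "fmap/sub-01_acq-dwi_epi.nii.gz": all_scans,
}
pairs = [(r"acq-func", r"^func/"), (r"acq-dwi", r"^dwi/")]
after = trim_intended_for(before, pairs)
print(after)
```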
Summary
BIDS Curation is an iterative process of editing names, running the BIDS Curation gear, and then checking the report spreadsheets. Part 3 of this tutorial gives a step-by-step description of running the BIDS Curation gear.
Once a session, subject or project has been curated, the data is ready to be formatted as per the BIDS Specification. That's right, after all of this effort, the data has not yet been written in BIDS format. This is because the curate-bids gear's job is only to set Flywheel metadata which will then be used to put data into the BIDS format. Files are kept in a database on the Flywheel platform and are only actually written out in BIDS format when data is exported using the CLI or as the first step of running a BIDS App gear.
For instance, when you run BIDS fMRIPrep, a virtual machine is spun up with the container that runs fMRIPrep and the gear job is launched. The gear, running at the session, subject, or project level, looks at all of the metadata appropriate for that level and copies files from the database into the proper BIDS directory structure using the proper BIDS file names. Next, the gear calls the fMRIPrep algorithm and provides it with the path to the BIDS data and all configuration parameters. When fMRIPrep completes, the gear packages the output and all results are saved to the database. An "analysis" appears on the Flywheel platform that has the results and the job log. This analysis is attached in the hierarchy at the level at which it was run (session, subject, or project).
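The step in which metadata is turned into a BIDS directory layout can be sketched as follows. This is a simplified illustration, not the export code used by Flywheel: the record structure and the metadata keys `Path`, `Filename`, and `ignore` are assumptions made for the example, and the function `plan_bids_export` is hypothetical. It only computes destination paths and does not copy any files.

```python
from pathlib import PurePosixPath


def plan_bids_export(files, root="bids_dataset"):
    """Return (source name, destination path) pairs for files that carry BIDS metadata."""
    plan = []
    for record in files:
        bids = record.get("bids") or {}
        # Skip files that were never curated or that are explicitly ignored.
        if bids.get("ignore") or not bids.get("Filename"):
            continue
        destination = PurePosixPath(root) / bids.get("Path", "") / bids["Filename"]
        plan.append((record["name"], str(destination)))
    return plan


# Made-up file records standing in for files stored on the platform.
files = [
    {"name": "1_T1.nii.gz",
     "bids": {"Path": "sub-01/anat", "Filename": "sub-01_T1w.nii.gz", "ignore": False}},
    {"name": "2_localizer.nii.gz",
     "bids": {"Path": "", "Filename": "", "ignore": True}},
    {"name": "3_notes.txt"},  # no BIDS metadata at all
]
plan = plan_bids_export(files)
print(plan)
```

The point of the sketch is that the original file names never need to change: only the curated metadata decides where each file lands in the exported dataset.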
Best Practice
Flywheel highly recommends using the ReproIn naming convention starting at the scanner console or for retrospective data by renaming acquisitions to match ReproIn. This makes BIDS curation easier because it makes the link between each acquisition and where it fits into the BIDS Specification explicit from the beginning. Once curated, you will know your data is ready for the next step and if it is not, you are in a better position to fix problems because they have been detected early. Putting your data into this commonly used convention makes the structure of your project clear to you and anyone familiar with the standard.
Alternative method: Edit the project curation template
If you do not want to rename your acquisition labels to match the ReproIn convention, you can modify a BIDS project curation template so that it can recognize and process arbitrary file names. This is typically not recommended because the template is a large, complicated JSON file, and the processing performed by its “where” and “initialization” sections is best understood by stepping through it in a debugger. See an example of these sections in the BIDS template file article.
Crafting regular expressions in the template to recognize and extract the proper strings from arbitrary file names makes the template processing brittle. In contrast, renaming acquisitions, subjects, and sessions to follow the ReproIn convention expected by the existing template not only allows the template to work, but also makes the purpose of each acquisition clear on the platform.
Another method that will let you use BIDS App gears on Flywheel is to put the data into BIDS format outside the platform and then upload it using the CLI.
Next Steps
Take a look at our webinar on how the CMU-Pitt BRIDGE Center standardizes on BIDS using Flywheel. You will also learn more about how Flywheel's centralized platform streamlines BIDS in an end-to-end solution for data management and collaboration.
Once you have an understanding of BIDS in Flywheel, start planning your curation by using the BIDS Study Design Spreadsheet or take a look at our BIDS curation tutorial to start curating your own data.