Filtering and Mapping Guide

This guide provides practical examples for using Flywheel's filtering and mapping patterns in Bulk Import and Export operations. Learn how to precisely control which files are processed and how they're organized in your Flywheel project.

Getting Started

Understanding Metadata

Filtering and mapping patterns use Flywheel metadata fields extensively. For a comprehensive overview of metadata types, structure, and usage in Flywheel, see Understanding Metadata in Flywheel.

Complex Scenarios: Use Rule Files

When command-line options become unwieldy or you need multiple sets of rules with first-match-wins logic, consider using Rule Files to define your import/export configuration in a reusable YAML format.

Understanding Filter Logic

Flywheel uses two types of filters:

  • Include filters (-i/--include): At least one must match for a file to be processed
  • Exclude filters (-e/--exclude): If any match, the file is skipped

If no include filters are specified, all files are included by default (unless excluded).
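
For example, here is a minimal sketch combining both filter types, using only filters that appear elsewhere in this guide (the nii extension value is assumed by analogy with dcm):

# Keep files matching either include filter, unless the exclude filter matches
--include 'ext=dcm' --include 'ext=nii' --exclude 'path=~*backup*'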

Basic Syntax

Each filter follows this pattern:

field_name operator value

Examples:

# Include only DICOM files by extension
--include 'ext=dcm'

# Exclude temporary files
--exclude 'name=~*temp*'

# Files larger than 10MB
--include 'size>10MB'

Common Use Cases

File Type Filtering

Include Specific File Types

# DICOM files only
--include 'path=~*.dcm'

# Multiple file types
--include 'path=~*.dcm' --include 'path=~*.nii'

# By file extension
--include 'ext=dcm'

Exclude File Types

# Exclude system files
--exclude 'name=~.*' --exclude 'name=~*~'

# Exclude common temp files
--exclude 'name=~*.tmp' --exclude 'name=~*.temp'

# Exclude by pattern
--exclude 'path=~*backup*'

Directory Structure Filtering

By Directory Depth

# Files at exactly 4 levels deep (project/subject/session/acquisition/file)
--include 'depth=4'

# Files 3 or 4 levels deep
--include 'depth>=3' --include 'depth<=4'

By Directory Pattern

# Only from patient directories
--include 'path=~patient*/*'

# Exclude test directories
--exclude 'path=~*test*'

# Specific study pattern
--include 'path=~study_*/subject_*/session_*/*'

Size-Based Filtering

File Size Ranges

# Large files only
--include 'size>100MB'

# Medium-sized files
--include 'size>=1MB' --exclude 'size>1GB'

# Exclude empty files
--exclude 'size=0'

Using Human-Readable Sizes

# Various size units
--include 'size>1.5GB'
--include 'size<=500KB'
--include 'size=50MB'

DICOM-Specific Filtering (Export Only)

Export Operations Only

DICOM metadata filtering is only available for exports, not imports. During imports, filters can only use file system information (path, name, size, etc.).

To use DICOM metadata in export filters, you must first extract the DICOM headers into Flywheel metadata by running the File Metadata Importer gear.

By Modality

# MRI only
--include 'file.info.header.dicom.Modality=MR'

# Multiple modalities
--include 'file.info.header.dicom.Modality=MR' --include 'file.info.header.dicom.Modality=CT'

# Exclude specific modality
--exclude 'file.info.header.dicom.Modality=PR'  # Exclude presentation state

By Series Description

# Include only T1 sequences
--include 'file.info.header.dicom.SeriesDescription=~*T1*'

# Exclude localizers and scouts
--exclude 'file.info.header.dicom.SeriesDescription=~*localizer*'
--exclude 'file.info.header.dicom.SeriesDescription=~*scout*'

By Patient Information

# Specific patient
--include 'file.info.header.dicom.PatientID=PAT001'

# Research subjects only
--include 'file.info.header.dicom.PatientID=~RESEARCH*'

# Exclude test patients
--exclude 'file.info.header.dicom.PatientID=~TEST*'

Mapping Patterns

Mapping patterns define how source paths are transformed into Flywheel's hierarchy structure.

Basic Mapping Syntax

--mapping 'source_pattern=flywheel_pattern'

Standard Variables

  • {subject} or {sub} - Subject label
  • {session} or {ses} - Session label
  • {acquisition} or {acq} - Acquisition label
  • {file} - File name (shorthand for {file.name})
  • DICOM tags like {PatientID}, {StudyDate}, {SeriesDescription}, etc.

Common Mapping Examples

Simple Directory Mapping

# Map: patient01/study_20231201/series_t1/file.dcm
# To: project/patient01/study_20231201/series_t1/file.dcm
--mapping 'path={sub}/{ses}/{acq}/*'

Skip Directory Levels

# Skip initial directory level
# Map: data/patient01/scan/t1_weighted/file.dcm
--mapping 'path=data/{sub}/{ses}/{acq}/*'

# Skip multiple levels at start
# Map: project/data/raw/patient01/study01/series01/file.dcm
--mapping 'path=project/data/raw/{sub}/{ses}/{acq}/*'

# Skip intermediate directories with wildcards
# Map: patient01/2024/scan/t1/file.dcm
--mapping 'path={sub}/*/{ses}/{acq}/*'

# Skip multiple intermediate directories
# Map: patient01/data/raw/QC/study01/scan/series01/file.dcm
--mapping 'path={sub}/*/*/{ses}/*/{acq}/*'

Extract from File Names

# Map files like: PATIENT01_SESSION01_T1_001.dcm
--mapping 'name={sub}_{ses}_{acq}_*'
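# For the example file above, this would yield labels like subject=PATIENT01, session=SESSION01,
# acquisition=T1 (exact behavior depends on how the trailing wildcard absorbs _001.dcm)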

Container Label Derivation

This is the most common mapping use case: deriving Flywheel container labels from either file paths or DICOM metadata.

From File Path Structure

# Derive labels directly from directory structure
--mapping 'path={sub}/{ses}/{acq}/*'

# With prefixes in path
--mapping 'path=data/{sub}/{ses}/{acq}/*'

From DICOM Metadata

# Basic DICOM tag mapping
--mapping 'subject.label={PatientID}'
--mapping 'session.label={StudyDescription}'
--mapping 'acquisition.label={SeriesDescription}'

# Composite labels from multiple DICOM tags
--mapping 'subject.label={PatientID}'
--mapping 'session.label={StudyDescription}_{StudyDate}'
--mapping 'acquisition.label={Modality}_{SeriesNumber}_{SeriesDescription}'

Advanced Mapping Patterns

Date-Based Organization

# Use DICOM StudyDate for session label
--mapping 'subject.label={PatientID}'
--mapping 'session.label={StudyDate}'
--mapping 'acquisition.label={SeriesDescription}'

# Use DICOM AcquisitionDate or SeriesDate
--mapping 'session.label={AcquisitionDate}'
# or
--mapping 'session.label={SeriesDate}'

# Use file system creation time
--mapping 'session.label={ctime}'

# Use file system modification time
--mapping 'session.label={mtime}'

# Combine date with other metadata
--mapping 'session.label={StudyDescription}_{StudyDate}'
--mapping 'acquisition.label={SeriesDescription}_{AcquisitionDate}'

DICOM File Naming

# Set ZIP archive name when grouping DICOM files (requires --type dicom)
--dicom-instance-name '{SeriesDescription}_{SeriesNumber}.dicom.zip'

# Rename individual files during import
--mapping 'file.name={PatientID}_{SeriesNumber}_{file.name}'

Note: The --dicom-instance-name option (available in version 20.5+) controls the naming of ZIP archives created when grouping DICOM files during import.

Multiple Path Components

# Complex source structure
--mapping 'path=study/{sub}/visit_{ses}/scan_{acq}/series_*/file_*'

Real-World Examples

Example 1: Custom DICOM Import with Date-Based Labels

Scenario: Import DICOM data with custom labels incorporating dates and custom ZIP file naming

Source structure:

/data/research/
├── SUBJ001/
│   ├── visit1/
│   │   ├── t1_mprage/
│   │   └── fmri_rest/
│   └── visit2/
└── SUBJ002/

Import command:

fw-beta import run \
    --project "fw://mygroup/Research Study" \
    --storage /data/research \
    --type dicom \
    --mapping 'session.label={StudyDate}_{StudyDescription}' \
    --mapping 'acquisition.label={SeriesNumber}_{Modality}_{SeriesDescription}' \
    --dicom-instance-name '{SeriesNumber}_{SeriesDescription}.dicom.zip'

Custom Mapping with Dates

This example demonstrates:

  • Session labels: Combine study date with description (e.g., 20240315_Brain_Protocol)
  • Acquisition labels: Include series number, modality, and description (e.g., 001_MR_T1_MPRAGE)
  • ZIP file naming: Custom format using series metadata instead of default naming
  • Important: Subject labels fall back to the default {PatientID} mapping because no subject.label mapping is given. Custom mappings override the defaults, so --type dicom is what re-enables the DICOM-specific behaviors (grouping, zipping) before these custom rules are applied. An explicit subject mapping is sketched below.
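
If you prefer to make the subject mapping explicit rather than relying on the default, the same command could also include the mapping shown earlier in this guide:

--mapping 'subject.label={PatientID}'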

Example 2: Multi-Modal Export

Scenario: Export T1 and fMRI data for analysis

Export command:

fw-beta export run \
    --project "fw://mygroup/Analysis Project" \
    --storage /analysis/exported_data \
    --include 'file.info.header.dicom.Modality=MR' \
    --include 'file.info.header.dicom.SeriesDescription=~*T1*' \
    --include 'file.info.header.dicom.SeriesDescription=~*fMRI*' \
    --exclude 'file.info.header.dicom.SeriesDescription=~*localizer*' \
    --path '{subject.label}/{session.label}/{acquisition.label}/{file.name}'

DICOM Metadata Extraction Required

This export example requires that DICOM headers have been extracted into Flywheel metadata by running the File Metadata Importer gear before the export.

Custom Export Path

The --path value used here starts with {subject.label}/, which avoids creating a project-level folder at the top of the export destination. The default path is {project.label}/{subject.label}/{session.label}/{acquisition.label}/{file.name}, which includes the project folder.
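
For comparison, a sketch of the same flag spelled out with the default path (which adds the project-level folder at the top of the destination):

--path '{project.label}/{subject.label}/{session.label}/{acquisition.label}/{file.name}'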

Example 3: Quality Control Filter

Scenario: Import only high-quality, complete scans

Import command:

fw-beta import run \
    --project "fw://mygroup/QC Dataset" \
    --storage my-dicom-storage \
    --type dicom \
    --include 'path=~*.dcm' \
    --include 'size>1MB' \
    --exclude 'name=~*localizer*' \
    --exclude 'name=~*scout*' \
    --exclude 'name=~*SECONDARY*'

Import Filtering Limitations

This import example filters by filename patterns (e.g., excluding files with "localizer" or "scout" in the name). Import filters cannot use DICOM metadata tags. To filter by actual DICOM SeriesDescription values, you would need to import all files first, then use an export operation with DICOM metadata filters.
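
As a sketch of that two-step approach, once the files are imported and the File Metadata Importer gear has run, a follow-up export could filter on the actual DICOM values (the destination path here is a placeholder):

fw-beta export run \
    --project "fw://mygroup/QC Dataset" \
    --storage /analysis/qc_filtered \
    --exclude 'file.info.header.dicom.SeriesDescription=~*localizer*' \
    --exclude 'file.info.header.dicom.SeriesDescription=~*scout*'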

Testing Your Patterns

Before running full imports or exports, test your patterns using the test or dry-run modes.

Import Test Command

For imports, the import test command shows exactly how patterns will be applied and what metadata will be extracted from a specific file:

# Test a single file
fw-beta import test /path/to/sample/file.dcm \
    --type dicom \
    --include 'path=~*.dcm' \
    --exclude 'name=~*temp*' \
    --mapping 'subject.label={PatientID}' \
    --mapping 'session.label={StudyDate}'

This shows:

  • Whether the file matches your filters
  • What metadata would be extracted
  • How the file would be organized in Flywheel

Dry-Run Mode

For both imports and exports, the --dry-run option performs a simulated operation without making any actual changes:

# Dry-run import
fw-beta import run \
    --project "fw://mygroup/project" \
    --storage storage-id \
    --type dicom \
    --include 'path=~*.dcm' \
    --dry-run

# Dry-run export
fw-beta export run \
    --project "fw://mygroup/project" \
    --storage /export/path \
    --include 'file.info.header.dicom.Modality=MR' \
    --dry-run

Dry-run mode:

  • Processes only a small subset of the data (not the full dataset)
  • Shows a preview of how patterns will be applied
  • Presents a summary report of how data would be processed
  • Makes no actual changes to files or Flywheel

Troubleshooting Common Issues

Pattern Not Matching

  1. Check case sensitivity: String matching is case-insensitive, so double-check that your pattern text otherwise matches the actual file names or metadata values
  2. Test incrementally: Start with simple patterns and add complexity
  3. Use the test command: Test patterns on sample files first

Incorrect Mapping

  1. Verify source structure: Ensure your mapping pattern matches the actual file paths
  2. Check for missing components: All required fields (sub, ses, etc.) must have values
  3. Test with representative files: Use files from different parts of your dataset

Performance Issues

  1. Be specific: More specific patterns reduce processing time
  2. Use exclude filters: Exclude unwanted files early in the process
  3. Optimize depth filtering: Use depth filters to avoid scanning unnecessary directories (see the sketch below)
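
For instance, a sketch combining a depth limit with an early exclude (both filters appear earlier in this guide) keeps the scan from descending into irrelevant directories:

--include 'depth<=4' --exclude 'path=~*backup*'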

Best Practices

Filter Design

  1. Start broad, then narrow: Begin with inclusive patterns and add exclusions
  2. Test on sample data: Always test on a small subset first
  3. Document your patterns: Keep track of complex filter combinations (one lightweight approach is sketched after this list)
  4. Use meaningful names: Choose descriptive variable names in mappings
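
One lightweight way to document and reuse a pattern set, short of a full rule file, is to keep the command in a commented shell script (project name and paths here are placeholders):

#!/bin/bash
# Import DICOM files only, skipping temp and backup data.
# Subject/session/acquisition labels come from the directory structure.
fw-beta import run \
    --project "fw://mygroup/My Project" \
    --storage /data/source \
    --type dicom \
    --include 'ext=dcm' \
    --exclude 'name=~*temp*' \
    --exclude 'path=~*backup*' \
    --mapping 'path={sub}/{ses}/{acq}/*'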

Maintainability

  1. Use rule files: Store complex patterns in YAML files for reuse
  2. Comment your patterns: Document the purpose of complex filters
  3. Version your rules: Keep track of pattern changes over time
  4. Test after changes: Verify patterns still work after modifications