DICOM Connector Metadata Extraction Guide

Introduction

This document provides technical specifications for how the DICOM Connector processes DICOM data and extracts metadata fields.

For information about how DICOM maps to the Flywheel hierarchy, see the Hierarchy Mapping Guide. For advanced configuration options, see the Advanced Configuration Guide.

Data Processing Workflow

The DICOM Connector follows this processing sequence:

  1. Discovery: Queries PACS or scanner at regular intervals for new data items
  2. Readiness Assessment: Determines if series are complete and ready for upload
  3. Anonymization: Removes or obfuscates sensitive metadata following rules configured via de-identification profiles
  4. Metadata Extraction: Parses DICOM headers for organizational and descriptive information
  5. Routing Resolution: Uses routing strings to determine target Flywheel containers
  6. Upload Process: Transfers complete series to appropriate locations
  7. Monitoring: Continues monitoring for changes to uploaded items

After upload completes, data may be further processed by Gears based on configured Gear Rules, such as automated classification via the Classifier Gear.

Workflow Diagram:

flowchart TD
    start([Start]) --> discovery[Discovery: Query for new data]
    discovery --> readiness{Readiness: Series complete?}
    readiness -->|No| discovery
    readiness -->|Yes| anon[Anonymization: Apply de-ID profile]
    anon --> metadata[Metadata Extraction: Parse DICOM headers]
    metadata --> routing[Routing Resolution: Determine target containers]
    routing --> upload[Upload Process: Transfer to Flywheel]
    upload --> monitoring[Monitoring: Watch for changes]
    monitoring -.-> discovery
    upload -.->|Optional| gears[Gear Rules Processing]

    style start fill:#e8f5e9
    style discovery fill:#fff
    style readiness fill:#fff9c4
    style anon fill:#fff
    style metadata fill:#fff
    style routing fill:#fff
    style upload fill:#fff
    style monitoring fill:#fff
    style gears fill:#f3e5f5

The workflow shows the continuous monitoring loop and the optional Gear Rules processing that occurs after upload.

Series Completion Logic

The Connector uses the following logic to determine when to upload data:

  1. Change Detection: Has the number of images in the series (NumberOfSeriesRelatedInstances header) changed since the last check?
     • If yes: Assumes the scan is still in progress and waits for the next check cycle
     • If no: Proceeds to the next check
  2. Upload Status Check: Has this item already been uploaded?
     • If yes: Continues monitoring for changes
     • If no: Proceeds to upload
  3. Upload Execution: Uploads the complete series to Flywheel
  4. Continued Monitoring: Monitors for state changes that might trigger re-upload

The definitions of "item", "state", and "change" vary across data types and modalities.
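The decision made on each check cycle can be sketched as a small function. This is a minimal illustration, not the Connector's actual code: `SeriesState`, `Action`, and `next_action` are hypothetical names, the first sighting of a series is assumed to count as a change, and re-upload handling is not modeled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    WAIT = "wait"        # instance count changed; scan may still be in progress
    UPLOAD = "upload"    # count is stable and the item has not been uploaded
    MONITOR = "monitor"  # count is stable and the item was already uploaded


@dataclass
class SeriesState:
    """Hypothetical per-series record kept between check cycles."""
    last_instance_count: Optional[int] = None  # None until the first check
    uploaded: bool = False


def next_action(state: SeriesState, current_count: int) -> Action:
    """Apply change detection, then the upload status check, for one cycle."""
    # Assumption: a series seen for the first time counts as "changed".
    changed = state.last_instance_count != current_count
    state.last_instance_count = current_count
    if changed:
        return Action.WAIT
    if state.uploaded:
        return Action.MONITOR
    return Action.UPLOAD
```

In this sketch, `current_count` stands in for the NumberOfSeriesRelatedInstances value returned by the latest query, and the caller is expected to set `uploaded = True` once the upload succeeds.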

PatientName Parsing

The PatientName field is parsed to extract subject.firstname and subject.lastname metadata (not used for subject.label):

  1. Caret Delimiter: If ^ character is present, split as {lastname}^{firstname}
  2. Space Delimiter: If no ^, split on space as {firstname} {lastname}
  3. No Delimiter: Entire string goes to subject.lastname, subject.firstname remains empty
  4. Capitalization: Both fields are automatically capitalized

PatientName Examples

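| Input PatientName | Parsed subject.firstname | Parsed subject.lastname |
|-------------------|--------------------------|-------------------------|
| Doe^John          | John                     | Doe                     |
| Doe^John^Middle   | John^Middle              | Doe                     |
| John Doe          | John                     | Doe                     |
| John Middle Doe   | John Middle              | Doe                     |
| john doe          | John                     | Doe                     |
| JohnDoe           | (empty)                  | JohnDoe                 |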
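The parsing rules can be approximated in a few lines of Python. This is a sketch under stated assumptions rather than the Connector's implementation: `parse_patient_name` is a hypothetical name, names with several spaces are assumed to split at the last space, and capitalization is assumed to uppercase only the first letter of each word (so `JohnDoe` is left unchanged).

```python
from typing import Tuple


def _capitalize_words(text: str) -> str:
    """Uppercase the first letter of each space-separated word; keep the rest."""
    return " ".join(word[:1].upper() + word[1:] for word in text.split(" "))


def parse_patient_name(patient_name: str) -> Tuple[str, str]:
    """Return (firstname, lastname) following the delimiter rules above."""
    name = patient_name.strip()
    if "^" in name:
        # Caret delimiter: text before the first ^ is the last name,
        # everything after it (including further carets) is the first name.
        lastname, _, firstname = name.partition("^")
    else:
        # Space delimiter (assumption: split at the last space).
        # With no space at all, rpartition leaves firstname empty.
        firstname, _, lastname = name.rpartition(" ")
    return _capitalize_words(firstname.strip()), _capitalize_words(lastname.strip())
```

For example, `parse_patient_name("Doe^John^Middle")` returns `("John^Middle", "Doe")`, and `parse_patient_name("JohnDoe")` returns `("", "JohnDoe")`.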

Timestamp Parsing

Timestamp fields combine date and time DICOM tags:

  • session.timestamp: Parsed from StudyDate + StudyTime
  • acquisition.timestamp: Parsed from AcquisitionDate + AcquisitionTime
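As an illustration of combining a date tag with a time tag, the sketch below joins a DICOM DA value (`YYYYMMDD`) and a TM value (`HHMMSS.FFFFFF`, where trailing components may be omitted) into one `datetime`. The function name `combine_dicom_datetime` is hypothetical, and how the Connector itself handles truncated or malformed values is not documented here; this version simply pads missing time components with zeros and raises `ValueError` on anything else.

```python
from datetime import datetime


def combine_dicom_datetime(date_value: str, time_value: str) -> datetime:
    """Combine a DICOM date (YYYYMMDD) and time (HHMMSS[.FFFFFF]) value."""
    date_part = date_value.strip()
    whole, _, fraction = time_value.strip().partition(".")
    # TM may omit minutes and seconds; pad the missing components with zeros.
    whole = whole.ljust(6, "0")
    # Fractional seconds may have 1 to 6 digits; pad to microseconds.
    fraction = fraction.ljust(6, "0")[:6]
    # int() and datetime() raise ValueError for empty or out-of-range parts.
    return datetime(
        int(date_part[0:4]), int(date_part[4:6]), int(date_part[6:8]),
        int(whole[0:2]), int(whole[2:4]), int(whole[4:6]), int(fraction),
    )
```

Under these assumptions, session.timestamp would correspond to `combine_dicom_datetime(StudyDate, StudyTime)` and acquisition.timestamp to `combine_dicom_datetime(AcquisitionDate, AcquisitionTime)`.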

Siemens-Specific Behavior

For Siemens scanners (Manufacturer = Siemens):

  • acquisition.timestamp uses SeriesDate + SeriesTime instead of AcquisitionDate + AcquisitionTime
  • If SeriesDate/SeriesTime cannot be parsed, defaults to session.timestamp
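The Siemens fallback can be sketched as follows. The names `acquisition_timestamp` and `_parse` are hypothetical, headers are modeled as a plain dict, the manufacturer check is assumed to be a case-insensitive match on "siemens", and returning `None` when a non-Siemens acquisition time cannot be parsed is also an assumption, since the source does not describe that case.

```python
from datetime import datetime
from typing import Optional


def _parse(date_value: Optional[str], time_value: Optional[str]) -> Optional[datetime]:
    """Parse YYYYMMDD + HHMMSS (fractional seconds dropped); None if unparsable."""
    try:
        return datetime.strptime(f"{date_value}{str(time_value)[:6]}", "%Y%m%d%H%M%S")
    except ValueError:
        return None


def acquisition_timestamp(headers: dict, session_timestamp: datetime) -> Optional[datetime]:
    """Pick the acquisition timestamp source based on the Manufacturer header."""
    # Assumption: any Manufacturer value containing "siemens" counts as Siemens.
    manufacturer = str(headers.get("Manufacturer", "")).lower()
    if "siemens" in manufacturer:
        parsed = _parse(headers.get("SeriesDate"), headers.get("SeriesTime"))
        # Fall back to the session timestamp when SeriesDate/SeriesTime fail.
        return parsed if parsed is not None else session_timestamp
    # Assumption: non-Siemens data with unparsable values yields None.
    return _parse(headers.get("AcquisitionDate"), headers.get("AcquisitionTime"))
```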

Timezone Configuration

Timestamps use the timezone specified during connector configuration (defaults to UTC). This affects all session and acquisition timestamps.
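One way to picture the effect of this setting: DICOM date and time values carry no timezone, so a parsed timestamp is naive until a zone is attached. The sketch below assumes the configured timezone is an IANA name such as `America/Chicago` and that the naive value is interpreted as wall-clock time in that zone; `localize` is a hypothetical helper, not a Connector API.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def localize(naive: datetime, tz_name: str = "UTC") -> datetime:
    """Attach the configured timezone to a naive timestamp (default UTC)."""
    # Assumption: tz_name is an IANA zone name; "UTC" is handled without
    # needing the system timezone database.
    tz = timezone.utc if tz_name == "UTC" else ZoneInfo(tz_name)
    return naive.replace(tzinfo=tz)
```

With the default, `localize(datetime(2024, 1, 31, 10, 15, 30)).isoformat()` gives `2024-01-31T10:15:30+00:00`; with `tz_name="America/Chicago"` the same wall-clock time carries a `-06:00` offset instead.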

For timezone configuration options and multi-site considerations, see the Advanced Configuration Guide.