Release Notes

0.1.9 [2025-12-09]

Enhancements:

  • Removed apikey_path config option; the gear now always uses the api-key input for authentication, simplifying the authentication flow

Fixes:

  • Fixed CVE-2025-47907 by installing jq and removing the yq Go binary from the container
  • Pinned urllib3 to 2.6 and requests to 2.32.4 to address security vulnerabilities

Maintenance:

  • Migrated base image from flywheel/python:3.12-debian to flywheel/python:3.12-wolfi-build
  • Switched system package management from apt-get to apk for Wolfi-based image compatibility
  • Removed pinned spacy 3.8.3 dependency constraint
  • Removed bot_key parameter from ReaderTaskCreator, parse_config, and run throughout the codebase
  • Removed deprecated local_dir_use_symlinks=False argument from snapshot_download call

Breaking Changes:

  • Removed apikey_path gear config option; gears previously configured to use a bot API key via apikey_path must be reconfigured to use the api-key input instead

0.1.8 [2025-08-12]

Fixes:

  • Changed exhaustive query parameter from true to false in /api/read_task_protocols API calls to correct protocol fetching behavior

Maintenance:

  • Pinned pillow to 11.3.0

0.1.7 [2025-08-07]

Enhancements:

  • Added workdir parameter to FwScanRedactEngine to allow configuring a working directory for temporary files, defaulting to a system temp directory when not provided.
  • Added work_path parameter to EasyOCR to support pre-downloaded model weights, with download_enabled=False to avoid runtime downloads.
  • Added tmpdir parameter to detect_and_unpack_zip to allow configuring where zip files are extracted.
  • Refactored create_reader_protocol to use server-side filtering and handle multiple existing protocols by selecting the most recently modified one.
  • Updated create_reader_task to accept a list of assignees and randomly assign tasks to one of them.
  • Added InstanceNumber assignment when separating multiframe DICOM arrays into individual files.
  • Updated create_single_annotation to use slice_number (as integer) in annotation["data"] and format updated_image_path using the slice number instead of the raw frame index string.
  • Added HPC support.
  • Use a filter when polling for Reader Task Protocols, rather than grabbing all existing protocols in the instance and filtering the response.
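
The protocol selection and assignee behavior described above can be sketched as follows. This is a minimal illustration, not the gear's actual code: the helper names pick_latest_protocol and pick_assignee, the "_id" and "modified" keys, and the ISO 8601 timestamp format are all assumptions made for the example.

```python
import random
from datetime import datetime


def pick_latest_protocol(protocols):
    """Return the protocol with the newest 'modified' timestamp, or None if empty.

    Hypothetical helper: assumes each protocol is a dict carrying an
    ISO 8601 'modified' string.
    """
    if not protocols:
        return None
    return max(protocols, key=lambda p: datetime.fromisoformat(p["modified"]))


def pick_assignee(assignees, rng=random):
    """Choose one assignee uniformly at random from a non-empty list."""
    if not assignees:
        raise ValueError("at least one assignee is required")
    return rng.choice(assignees)


# Two protocols sharing a label; the second was modified more recently.
protocols = [
    {"_id": "a", "label": "default_image_pii_detector_protocol",
     "modified": "2025-06-01T10:00:00+00:00"},
    {"_id": "b", "label": "default_image_pii_detector_protocol",
     "modified": "2025-07-15T08:30:00+00:00"},
]
assignees = ["reader1@example.com", "reader2@example.com"]

latest = pick_latest_protocol(protocols)
reader = pick_assignee(assignees)
```

Passing a seeded random.Random instance as rng makes the assignment reproducible in tests.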

Fixes:

  • Fixed scan_dicoms_for_phi and redact_dicom_phi to fall back to self.workdir when no output_path is provided.
  • Fixed detect_multiframe to use self.workdir as the base for the separated_us_images directory instead of the current working directory.
  • Fixed parse_config to safely access job.id using .get("job", {}).get("id") to avoid AttributeError when the key is missing.
  • Fixed FWClient instantiation to use the unified timeout parameter instead of the removed read_timeout and connect_timeout parameters.
  • Fixed handling of new Reader Task Protocol creation: the gear now retrieves all protocols even if the user is a Site Admin without project permissions, and uses the most recent protocol, based on date of last modification.
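
The chained .get() pattern named in the parse_config fix above can be illustrated with a small sketch. The helper name get_job_id and the shape of the config dictionary are assumptions for the example; only the .get("job", {}).get("id") expression comes from the release note.

```python
def get_job_id(config):
    """Return config['job']['id'] if present, otherwise None.

    Hypothetical helper. The empty-dict default lets the second .get()
    run safely when the 'job' key is absent.
    """
    return config.get("job", {}).get("id")


with_job = get_job_id({"job": {"id": "abc123"}})
without_job = get_job_id({})
without_id = get_job_id({"job": {}})
```

This pattern only covers a missing key. A config in which "job" is present but explicitly set to None would still raise AttributeError, because the default is used only when the key is absent.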

Maintenance:

  • Upgraded base image from flywheel/python:3.10-debian to flywheel/python:3.12-debian with a multi-stage Dockerfile build (base, build, dev, and final production stages).
  • Updated PYTHON_VERSION from 3.10.16 to 3.12.10 in manifest.json.
  • Upgraded flywheel-sdk from 16.16.7 to ^20.0.
  • Upgraded pydicom from ^2.3.1 to ^3.0.
  • Upgraded fw-file from ^3.4.0 to ^4.1.
  • Upgraded fw-client from ^0.8.5 to ^2.1.
  • Upgraded torch from ^2.1.1 to ^2.7.
  • Upgraded urllib3 from >=1.25.4,<1.27 to ^2.0.
  • Upgraded pytest from ^6.1.2 to ^8.3 and moved it to tool.poetry.group.dev.dependencies.
  • Added requirements-dev.txt to Dockerfile and .dockerignore for the dev build stage.
  • Increased CI test coverage threshold from 0 to 45% and set PYVER to 3.12.
  • Added timeout: 3h and build-cluster-external tag to the test:gear CI job.
  • Updated .gitignore with standard Python, macOS, editor, and pre-commit entries.
  • Added new test modules test_easy_ocr.py, test_scan_and_redact.py, and test_utils_reader_tasks.py with comprehensive unit test coverage.

Documentation:

  • Updated README.md to clarify that reader tasks are assigned at random to one of the readers in the assignee list.
  • Updated manifest.json descriptions for Assignees and Baseline Operating Mode config fields with clearer wording and examples.
  • Updated CONTRIBUTING.md link text for Poetry configuration documentation.

0.1.6 [2025-04-09]

Fixes:

  • Fixed apply_common_bboxes being incorrectly called for multi-frame images, where it had already been applied earlier in the processing pipeline

0.1.5 [2025-03-27]

Fixes:

  • Fixed apply_common_bboxes to be called once after processing all files rather than per-file, correcting bounding box application for DICOM image redaction

0.1.4 [2025-03-17]

Enhancements:

  • Added apikey_path configuration option to support running the gear as a Gear Rule using an externally stored API key.
  • Added bot_key flag to ReaderTaskCreator to handle Flywheel bot API key authentication when running as a Gear Rule.
  • Moved obi/deid_roberta_i2b2 model files to be bundled within the package at fw_image_pii_detector/nlp_configs/obi_deid_roberta_i2b2, pinning to a specific revision for reproducibility.
  • Applied common bounding boxes per-image during processing loop rather than after all images are processed, improving annotation accuracy.
  • Added default_image_pii_detector_protocol as the default Reader Protocol name, replacing presidio_default_protocol.
  • Updated viewer configuration to restrict toolbar to Rectangle annotation tool only and disable segmentation panels.

Fixes:

  • Fixed validate_assignees to only be called when operating mode is Detection+ReaderTasks, avoiding unnecessary API calls in other modes.

Maintenance:

  • Renamed package from fw-presidio-image-redactor / fw_presidio_image_redactor to fw-image-pii-detector / fw_image_pii_detector throughout the codebase.
  • Upgraded base Docker image from python:3.9.19-slim-bookworm to flywheel/python:3.10-debian-build.
  • Updated python version constraint from ^3.9 to >=3.10, <3.13.
  • Pinned spacy to 3.8.3.
  • Added dependencies: botocore ^1.35.98, ssm-parameter-store ^19.11.0, poetry-plugin-export ^1.9.0, and urllib3 >=1.25.4,<1.27.
  • Updated gear name from presidio-image-redactor to image-pii-detector in manifest.json.

Documentation:

  • Updated README.md to reflect new gear name image-pii-detector throughout, including output file names, metadata tags, and operating mode descriptions.
  • Added documentation for the new apikey_path configuration option in README.md.
  • Removed outdated "active development / Release Candidate" warning from README.md.

Breaking Changes:

  • Gear tag applied to processed files changed from presidio-image-redactor to image-pii-detector; workflows filtering on the old tag will need to be updated.
  • Redacted file tag changed from presidio-redacted to image-pii-detector-redacted.
  • Default Reader Protocol name changed from presidio_default_protocol to default_image_pii_detector_protocol.

0.1.3 [2024-11-22]

Enhancements:

  • Added Detection+ReaderTasks operating mode, enabling creation of ReaderProtocol and ReaderTask annotations for human-in-the-loop PHI review workflows
  • Added RedactAllText operating mode that redacts all detected text regardless of PHI classification
  • Replaced Scanning Only boolean toggle with a Baseline Operating Mode string configuration option supporting four distinct modes: Detection Only, Detection+ReaderTasks, Dynamic PHI Redaction, and RedactAllText
  • Added Assignees configuration option for specifying Flywheel user emails to assign ReaderTask reviews
  • Added PHI-Not-Found tag output to label input files where no PHI was detected
  • Added presidio-image-redactor gear tag applied to all processed input files
  • Installed en_core_web_lg-3.7.1 spaCy model directly in Dockerfile for improved NER capabilities

Documentation:

  • Rewrote README.md with updated table of contents, expanded operating mode descriptions, three new use case workflows, and revised configuration settings reflecting the new Baseline Operating Mode options
  • Updated workflow Mermaid diagram to reflect four operating modes and downstream gear integrations

Maintenance:

  • Migrated CI pipeline from sse-qa-ci to flywheel-io/tools/etc/qa-ci with updated ci/gear.yml template and large runner override
  • Removed PYVER variable and changed PUBLISH_POETRY from "false" to an empty string in .gitlab-ci.yml
  • Updated qa-ci pre-commit hooks reference to 3218fd46 and added hadolint, jsonlint, linkcheck hooks with configured ignore rules

Breaking Changes:

  • Removed Toggle Scanning Only boolean config option; replaced by Baseline Operating Mode string option — existing gear configurations using the boolean toggle must be updated
  • Removed Complete Redaction boolean config option; functionality now accessed via RedactAllText operating mode selection

0.1.2 [2024-10-08]

Enhancements:

  • Added Transformer Score Threshold config option (0–100) to set the minimum confidence score for transformer-identified PHI entities
  • Added Entity Frequency Threshold config option (0–100) for multi-frame DICOM files, specifying the minimum percentage of frames an entity must appear across to be included
  • Added Complete Redaction config option to redact all burned-in text regardless of whether it is PHI
  • Added Entities to Find config option as a comma-separated string listing which entity types the gear should detect
  • Added Use DICOM Metadata config option to create a regex recognizer from DICOM metadata to improve PHI detection in pixel data
  • Added bundled HuggingFace model (obi/deid_roberta_i2b2) download step in Dockerfile to pre-cache the transformer model at image build time
  • Added example PHI info CSV output document under docs/Example Documents/
  • Added preprocessing Jupyter notebook under docs/notebooks/
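
As a rough sketch of how a frame-percentage threshold such as Entity Frequency Threshold could work: count the frames in which each entity appears and keep those that reach the required share. The function name, the per-frame input format, and the inclusive comparison are assumptions for illustration; the gear's actual implementation may differ.

```python
from collections import Counter


def filter_entities_by_frequency(frames, threshold_pct):
    """Keep entities found in at least threshold_pct percent of frames.

    Hypothetical helper: 'frames' is a list with one collection of
    detected entity strings per frame; threshold_pct ranges from 0 to 100.
    """
    if not frames:
        return set()
    counts = Counter()
    for entities in frames:
        # Count each entity at most once per frame.
        counts.update(set(entities))
    min_frames = threshold_pct / 100 * len(frames)
    return {entity for entity, n in counts.items() if n >= min_frames}


frames = [
    {"JOHN DOE", "2024-01-01"},
    {"JOHN DOE"},
    {"JOHN DOE", "noise"},
    {"JOHN DOE"},
]
kept = filter_entities_by_frequency(frames, 50)
```

With four frames and a threshold of 50, an entity must appear in at least two frames, so only the entity present in every frame is kept here.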

Fixes:

  • Fixed typo "Mircrosoft" → "Microsoft" in README.md
  • Fixed typo "pre-exisiting" → "pre-existing" and "idetification" → "identification" in README.md
  • Fixed typo "faciliate" → "facilitate" in README.md
  • Fixed typo "obscurred" → "obscured" in README.md

Maintenance:

  • Upgraded base image from python:3.8-slim to python:3.9.19-slim-bookworm (pinned by digest)
  • Added cv2 system dependencies (ffmpeg, libsm6, libxext6) to Dockerfile
  • Added apt-get autoclean and apt-get autoremove steps to Dockerfile to reduce image size
  • Added .gitignore for local development artifacts (input/, presidio-image-redactor*, config.json, run.sh)
  • Updated .dockerignore to include .vscode directory
  • Migrated linting from black/isort to ruff and ruff_format in pre-commit hooks; added gearcheck hook
  • Added PYVER, DEBUG, PUBLISH_POETRY, and CACHE_CLEAR CI variables to .gitlab-ci.yml
  • Switched CI reference from default.yml to large-default.yml
  • Removed merge request template default.md

Documentation:

  • Rewrote README.md config settings section, replacing individual entity toggle flags with consolidated Entities to Find, Transformer Score Threshold, Entity Frequency Threshold, Use DICOM Metadata, and Complete Redaction entries
  • Updated README.md scanning and redaction mode descriptions to reflect Toggle Scanning Only config option
  • Updated README.md input file classifications and modalities (US, CT, MR, XRay) for the image_file input
  • Updated README.md workflow diagram and step descriptions
  • Updated README.md output file naming conventions and descriptions
  • Added recommendation in README.md to run dicom-fixer prior to the gear

0.1.1 [2023-09-20]

Fixes:

  • Fixed typo in gear description ("opensource" → "open source")
  • Pinned presidio-image-redactor to a specific upstream branch to resolve a greyscale bug in DICOM image processing

0.1.0 [2023-09-08]

Enhancements:

  • Initial release of the Presidio Image Redactor gear, which scans DICOM images for PII using Microsoft's Presidio SDK with OCR, NER, and regex detection.
  • Added PII detection-only mode that tags acquisitions and files with PHI-Found when PII is identified.
  • Added PII redaction mode that masks identified PII in DICOM pixel data.
  • Added support for bounding box image generation to visualize detected PII regions.
  • Added support for optional prior scan inputs (bbox_coords) to reuse bounding box coordinates from a previous run.
  • Added configurable PII entity detection for PERSON, DATE_TIME, LOCATION, PHONE_NUMBER, MEDICAL_LICENSE, URL, NRP, EMAIL_ADDRESS, CRYPTO, IBAN_CODE, IP_ADDRESS, CREDIT_CARD, US_SSN, US_DRIVER_LICENSE, US_PASSPORT, US_BANK_NUMBER, and US_ITIN.
  • Added DICOM metadata integration to improve PII detection accuracy via the use_dicom_metadata config option.
  • Added support for both single DICOM files and zipped DICOM series as input.

Maintenance:

  • Added Dockerfile using python:3.8-slim with tesseract-ocr for OCR support.
  • Added CI configuration via .gitlab-ci.yml and pre-commit hooks including black, isort, pytest, markdownlint, yamllint, and manifest validation.
  • Added pyproject.toml with poetry for dependency management, including presidio-image-redactor, presidio-analyzer, flywheel-gear-toolkit, fw-file, pydicom, pandas, and opencv-python.

Documentation:

  • Added README.md with gear overview, inputs, config settings, outputs, and workflow documentation.
  • Added CONTRIBUTING.md with setup, dependency management, and release instructions.
  • Added FAQ.md as a placeholder for frequently asked questions.