Release Notes

0.1.9 [2025-12-09]

Enhancements:

Removed apikey_path config option; the gear now always uses the api-key input for authentication, simplifying the authentication flow

Fixes:

Fixed CVE-2025-47907 by installing jq and removing the yq Go binary from the container
Pinned urllib3 to 2.6 and requests to 2.32.4 to address security vulnerabilities

Maintenance:

Migrated base image from flywheel/python:3.12-debian to flywheel/python:3.12-wolfi-build
Switched system package management from apt-get to apk for Wolfi-based image compatibility
Removed pinned spacy 3.8.3 dependency constraint
Removed bot_key parameter from ReaderTaskCreator, parse_config, and run throughout the codebase
Removed deprecated local_dir_use_symlinks=False argument from snapshot_download call

Breaking Changes:

Removed apikey_path gear config option; gears previously configured to use a bot API key via apikey_path must be reconfigured to use the api-key input instead

0.1.8 [2025-08-12]

Fixes:

Changed exhaustive query parameter from true to false in /api/read_task_protocols API calls to correct protocol fetching behavior
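
As a rough illustration of the fix above, the sketch below builds a request path for /api/read_task_protocols with exhaustive set to false. Only the endpoint path and the exhaustive parameter come from these notes; the helper name build_protocols_url and the extra_params argument are hypothetical and are not part of the gear's code.

```python
from urllib.parse import urlencode


def build_protocols_url(base_path="/api/read_task_protocols", extra_params=None):
    """Return the endpoint path with exhaustive=false in the query string.

    `extra_params` is a hypothetical hook for additional query parameters.
    """
    params = {"exhaustive": "false"}
    params.update(extra_params or {})
    return f"{base_path}?{urlencode(params)}"


print(build_protocols_url())
```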

Maintenance:

Pinned pillow to 11.3.0

0.1.7 [2025-08-07]

Enhancements:

Added workdir parameter to FwScanRedactEngine to allow configuring a working directory for temporary files, defaulting to a system temp directory when not provided.
Added work_path parameter to EasyOCR to support pre-downloaded model weights, with download_enabled=False to avoid runtime downloads.
Added tmpdir parameter to detect_and_unpack_zip to allow configuring where zip files are extracted.
Refactored create_reader_protocol to use server-side filtering and handle multiple existing protocols by selecting the most recently modified one.
Updated create_reader_task to accept a list of assignees and randomly assign tasks to one of them.
Added InstanceNumber assignment when separating multiframe DICOM arrays into individual files.
Updated create_single_annotation to use slice_number (as integer) in annotation["data"] and format updated_image_path using the slice number instead of the raw frame index string.
Added HPC support.
Changed Reader Task Protocol polling to use a filter, rather than fetching all existing protocols in the instance and filtering the response.
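
Two of the enhancements above (choosing the most recently modified protocol and assigning a task to a random reader) can be sketched as follows. This is a minimal illustration, not the gear's implementation: the function names, the dict shape, and the "modified" field holding an ISO 8601 timestamp are all assumptions.

```python
import random
from datetime import datetime


def pick_latest_protocol(protocols):
    """Return the protocol with the newest `modified` timestamp, or None.

    Assumes each protocol is a dict with an ISO 8601 `modified` string.
    """
    if not protocols:
        return None
    return max(protocols, key=lambda p: datetime.fromisoformat(p["modified"]))


def pick_assignee(assignees, rng=random):
    """Randomly choose one reader from a non-empty list of assignees."""
    if not assignees:
        raise ValueError("at least one assignee is required")
    return rng.choice(assignees)


protocols = [
    {"label": "default_image_pii_detector_protocol", "modified": "2025-07-01T10:00:00"},
    {"label": "default_image_pii_detector_protocol", "modified": "2025-08-01T09:30:00"},
]
latest = pick_latest_protocol(protocols)
reader = pick_assignee(["reader1@example.com", "reader2@example.com"])
```

Passing a seeded random.Random instance as rng makes the assignment reproducible in tests.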

Fixes:

Fixed scan_dicoms_for_phi and redact_dicom_phi to fall back to self.workdir when no output_path is provided.
Fixed detect_multiframe to use self.workdir as the base for the separated_us_images directory instead of the current working directory.
Fixed parse_config to safely access job.id using .get("job", {}).get("id") to avoid AttributeError when the key is missing.
Fixed FWClient instantiation to use the unified timeout parameter instead of the removed read_timeout and connect_timeout parameters.
Fixed handling of new Reader Task Protocol creation. Specifically:
Made sure the gear retrieves all protocols even if the user is a Site Admin without project permissions.
Used the most recent protocol, based on date of last modification.
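
The nested .get pattern and the workdir fallback described in the fixes above might look roughly like this. The function names and the shape of the config dict are illustrative assumptions; only the .get("job", {}).get("id") expression and the fallback order (output path, then workdir, then a system temp directory) are taken from these notes.

```python
import tempfile
from pathlib import Path


def get_job_id(config):
    """Return config["job"]["id"] if present, otherwise None."""
    return config.get("job", {}).get("id")


def resolve_output_path(output_path=None, workdir=None):
    """Prefer output_path, then workdir, then a fresh system temp directory."""
    if output_path is not None:
        return Path(output_path)
    if workdir is not None:
        return Path(workdir)
    return Path(tempfile.mkdtemp())


print(get_job_id({"job": {"id": "abc123"}}))
print(get_job_id({}))
```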

Maintenance:

Upgraded base image from flywheel/python:3.10-debian to flywheel/python:3.12-debian with a multi-stage Dockerfile build (base, build, dev, and final production stages).
Updated PYTHON_VERSION from 3.10.16 to 3.12.10 in manifest.json.
Upgraded flywheel-sdk from 16.16.7 to ^20.0.
Upgraded pydicom from ^2.3.1 to ^3.0.
Upgraded fw-file from ^3.4.0 to ^4.1.
Upgraded fw-client from ^0.8.5 to ^2.1.
Upgraded torch from ^2.1.1 to ^2.7.
Upgraded urllib3 from >=1.25.4,<1.27 to ^2.0.
Upgraded pytest from ^6.1.2 to ^8.3 and moved it to tool.poetry.group.dev.dependencies.
Added requirements-dev.txt to Dockerfile and .dockerignore for the dev build stage.
Increased CI test coverage threshold from 0 to 45% and set PYVER to 3.12.
Added timeout: 3h and build-cluster-external tag to the test:gear CI job.
Updated .gitignore with standard Python, macOS, editor, and pre-commit entries.
Added new test modules test_easy_ocr.py, test_scan_and_redact.py, and test_utils_reader_tasks.py with comprehensive unit test coverage.

Documentation:

Updated README.md to clarify that reader tasks are assigned at random to one of the readers in the assignee list.
Updated manifest.json descriptions for Assignees and Baseline Operating Mode config fields with clearer wording and examples.
Updated CONTRIBUTING.md link text for Poetry configuration documentation.

0.1.6 [2025-04-09]

Fixes:

Fixed apply_common_bboxes being incorrectly called for multi-frame images, where it had already been applied earlier in the processing pipeline

0.1.5 [2025-03-27]

Fixes:

Fixed apply_common_bboxes to be called once after processing all files rather than per-file, correcting bounding box application for DICOM image redaction

0.1.4 [2025-03-17]

Enhancements:

Added apikey_path configuration option to support running the gear as a Gear Rule using an externally stored API key.
Added bot_key flag to ReaderTaskCreator to handle Flywheel bot API key authentication when running as a Gear Rule.
Moved obi/deid_roberta_i2b2 model files to be bundled within the package at fw_image_pii_detector/nlp_configs/obi_deid_roberta_i2b2, pinning to a specific revision for reproducibility.
Applied common bounding boxes per-image during processing loop rather than after all images are processed, improving annotation accuracy.
Added default_image_pii_detector_protocol as the default Reader Protocol name, replacing presidio_default_protocol.
Updated viewer configuration to restrict toolbar to Rectangle annotation tool only and disable segmentation panels.

Fixes:

Fixed validate_assignees to only be called when operating mode is Detection+ReaderTasks, avoiding unnecessary API calls in other modes.

Maintenance:

Renamed package from fw-presidio-image-redactor / fw_presidio_image_redactor to fw-image-pii-detector / fw_image_pii_detector throughout the codebase.
Upgraded base Docker image from python:3.9.19-slim-bookworm to flywheel/python:3.10-debian-build.
Updated python version constraint from ^3.9 to >=3.10, <3.13.
Pinned spacy to 3.8.3.
Added dependencies: botocore ^1.35.98, ssm-parameter-store ^19.11.0, poetry-plugin-export ^1.9.0, and urllib3 >=1.25.4,<1.27.
Updated gear name from presidio-image-redactor to image-pii-detector in manifest.json.

Documentation:

Updated README.md to reflect new gear name image-pii-detector throughout, including output file names, metadata tags, and operating mode descriptions.
Added documentation for the new apikey_path configuration option in README.md.
Removed outdated "active development / Release Candidate" warning from README.md.

Breaking Changes:

Gear tag applied to processed files changed from presidio-image-redactor to image-pii-detector; workflows filtering on the old tag will need to be updated.
Redacted file tag changed from presidio-redacted to image-pii-detector-redacted.
Default Reader Protocol name changed from presidio_default_protocol to default_image_pii_detector_protocol.
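
For workflows that filter on the old tag names, a small mapping like the one below could help during migration. The two renames come from the breaking changes above; the helper name migrate_tags is hypothetical.

```python
# Old-to-new tag names, taken from the breaking changes listed above.
TAG_RENAMES = {
    "presidio-image-redactor": "image-pii-detector",
    "presidio-redacted": "image-pii-detector-redacted",
}


def migrate_tags(tags):
    """Return the tag list with any pre-0.1.4 tag names replaced."""
    return [TAG_RENAMES.get(tag, tag) for tag in tags]


print(migrate_tags(["presidio-redacted", "PHI-Found"]))
# prints ['image-pii-detector-redacted', 'PHI-Found']
```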

0.1.3 [2024-11-22]

Enhancements:

Added Detection+ReaderTasks operating mode, enabling creation of ReaderProtocol and ReaderTask annotations for human-in-the-loop PHI review workflows
Added RedactAllText operating mode that redacts all detected text regardless of PHI classification
Replaced Scanning Only boolean toggle with a Baseline Operating Mode string configuration option supporting four distinct modes: Detection Only, Detection+ReaderTasks, Dynamic PHI Redaction, and RedactAllText
Added Assignees configuration option for specifying Flywheel user emails to assign ReaderTask reviews
Added PHI-Not-Found tag output to label input files where no PHI was detected
Added presidio-image-redactor gear tag applied to all processed input files
Installed en_core_web_lg-3.7.1 spaCy model directly in Dockerfile for improved NER capabilities

Documentation:

Rewrote README.md with updated table of contents, expanded operating mode descriptions, three new use case workflows, and revised configuration settings reflecting the new Baseline Operating Mode options
Updated workflow Mermaid diagram to reflect four operating modes and downstream gear integrations

Maintenance:

Migrated CI pipeline from sse-qa-ci to flywheel-io/tools/etc/qa-ci with updated ci/gear.yml template and large runner override
Removed PYVER variable and changed PUBLISH_POETRY from "false" to an empty string in .gitlab-ci.yml
Updated qa-ci pre-commit hooks reference to 3218fd46 and added hadolint, jsonlint, linkcheck hooks with configured ignore rules

Breaking Changes:

Removed Toggle Scanning Only boolean config option; replaced by Baseline Operating Mode string option. Existing gear configurations using the boolean toggle must be updated
Removed Complete Redaction boolean config option; functionality now accessed via RedactAllText operating mode selection

0.1.2 [2024-10-08]

Enhancements:

Added Transformer Score Threshold config option (0–100) to set the minimum confidence score for transformer-identified PHI entities
Added Entity Frequency Threshold config option (0–100) for multi-frame DICOM files, specifying the minimum percentage of frames an entity must appear across to be included
Added Complete Redaction config option to redact all burned-in text regardless of whether it is PHI
Added Entities to Find config option as a comma-separated string listing which entity types the gear should detect
Added Use DICOM Metadata config option to create a regex recognizer from DICOM metadata to improve PHI detection in pixel data
Added bundled HuggingFace model (obi/deid_roberta_i2b2) download step in Dockerfile to pre-cache the transformer model at image build time
Added example PHI info CSV output document under docs/Example Documents/
Added preprocessing Jupyter notebook under docs/notebooks/
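
To make the Entities to Find and Entity Frequency Threshold options more concrete, here is a minimal sketch of how a comma-separated entity string could be parsed and how a per-frame frequency cutoff could be applied. The function names and data shapes are assumptions, as is treating the threshold as inclusive; the gear's actual logic may differ.

```python
from collections import Counter


def parse_entities(entities_to_find):
    """Split a comma-separated string into a list of trimmed entity names."""
    return [name.strip() for name in entities_to_find.split(",") if name.strip()]


def filter_by_frequency(frames, threshold_percent):
    """Keep entities found in at least `threshold_percent` of the frames.

    `frames` is a list with one set of detected entity strings per frame.
    """
    if not frames:
        return set()
    counts = Counter(entity for frame in frames for entity in set(frame))
    return {
        entity
        for entity, count in counts.items()
        if 100 * count / len(frames) >= threshold_percent
    }


frames = [{"JOHN DOE", "2024-01-01"}, {"JOHN DOE"}, {"JOHN DOE", "noise"}, set()]
kept = filter_by_frequency(frames, threshold_percent=50)
print(kept)  # prints {'JOHN DOE'}
```

In this example "JOHN DOE" appears in 3 of 4 frames (75%) and is kept, while the other two strings appear in 1 of 4 frames (25%) and are dropped.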

Fixes:

Fixed typo "Mircrosoft" → "Microsoft" in README.md
Fixed typo "pre-exisiting" → "pre-existing" and "idetification" → "identification" in README.md
Fixed typo "faciliate" → "facilitate" in README.md
Fixed typo "obscurred" → "obscured" in README.md

Maintenance:

Upgraded base image from python:3.8-slim to python:3.9.19-slim-bookworm (pinned by digest)
Added cv2 system dependencies (ffmpeg, libsm6, libxext6) to Dockerfile
Added apt-get autoclean and apt-get autoremove steps to Dockerfile to reduce image size
Added .gitignore for local development artifacts (input/, presidio-image-redactor*, config.json, run.sh)
Updated .dockerignore to include .vscode directory
Migrated linting from black/isort to ruff and ruff_format in pre-commit hooks; added gearcheck hook
Added PYVER, DEBUG, PUBLISH_POETRY, and CACHE_CLEAR CI variables to .gitlab-ci.yml
Switched CI reference from default.yml to large-default.yml
Removed merge request template default.md

Documentation:

Rewrote README.md config settings section, replacing individual entity toggle flags with consolidated Entities to Find, Transformer Score Threshold, Entity Frequency Threshold, Use DICOM Metadata, and Complete Redaction entries
Updated README.md scanning and redaction mode descriptions to reflect Toggle Scanning Only config option
Updated README.md input file classifications and modalities (US, CT, MR, XRay) for the image_file input
Updated README.md workflow diagram and step descriptions
Updated README.md output file naming conventions and descriptions
Added recommendation in README.md to run dicom-fixer prior to the gear

0.1.1 [2023-09-20]

Fixes:

Fixed typo in gear description ("opensource" → "open source")
Pinned presidio-image-redactor to a specific upstream branch to resolve a greyscale bug in DICOM image processing

0.1.0 [2023-09-08]

Enhancements:

Initial release of the Presidio Image Redactor gear, which scans DICOM images for PII using Microsoft's presidio SDK with OCR, NER, and regex detection.
Added PII detection-only mode that tags acquisitions and files with PHI-Found when PII is identified.
Added PII redaction mode that masks identified PII in DICOM pixel data.
Added support for bounding box image generation to visualize detected PII regions.
Added support for optional prior scan inputs (bbox_coords) to reuse bounding box coordinates from a previous run.
Added configurable PII entity detection for PERSON, DATE_TIME, LOCATION, PHONE_NUMBER, MEDICAL_LICENSE, URL, NRP, EMAIL_ADDRESS, CRYPTO, IBAN_CODE, IP_ADDRESS, CREDIT_CARD, US_SSN, US_DRIVER_LICENSE, US_PASSPORT, US_BANK_NUMBER, and US_ITIN.
Added DICOM metadata integration to improve PII detection accuracy via the use_dicom_metadata config option.
Added support for both single DICOM files and zipped DICOM series as input.

Maintenance:

Added Dockerfile using python:3.8-slim with tesseract-ocr for OCR support.
Added CI configuration via .gitlab-ci.yml and pre-commit hooks including black, isort, pytest, markdownlint, yamllint, and manifest validation.
Added pyproject.toml with poetry for dependency management, including presidio-image-redactor, presidio-analyzer, flywheel-gear-toolkit, fw-file, pydicom, pandas, and opencv-python.

Documentation:

Added README.md with gear overview, inputs, config settings, outputs, and workflow documentation.
Added CONTRIBUTING.md with setup, dependency management, and release instructions.
Added FAQ.md as a placeholder for frequently asked questions.