Duplicate Handling with Bulk Imports

The Bulk Import system performs duplicate detection automatically and surfaces duplicate scenarios to a Data Manager for review and resolution before placing the affected data into Flywheel Core.

Detection of Duplication

The following data duplication scenarios are automatically detected when importing files in bulk.

File Path Collision

  • A file already exists with the same name at the intended destination path.
  • Multiple files with the same name are being imported into the same destination path either within the same import or across multiple simultaneous imports.
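
The two collision conditions above can be sketched in a few lines of Python. This is only an illustration of the rule, not Flywheel's implementation: the function name `find_path_collisions` and the plain string paths are assumptions made for the example.

```python
from collections import Counter


def find_path_collisions(existing_paths, incoming_paths):
    """Return the destination paths that would collide (hypothetical helper).

    A path collides if a file already exists there, or if more than one
    incoming file targets it (within one import or across imports that are
    processed together).
    """
    existing = set(existing_paths)
    incoming_counts = Counter(incoming_paths)
    return {
        path
        for path, count in incoming_counts.items()
        if path in existing or count > 1
    }


# Illustrative data only: one clash with an existing file, one clash
# between two incoming files, and one clean path.
collisions = find_path_collisions(
    existing_paths=["subj01/ses01/acq01/scan.dcm"],
    incoming_paths=[
        "subj01/ses01/acq01/scan.dcm",
        "subj01/ses01/acq02/scan.dcm",
        "subj01/ses01/acq03/scan.dcm",
        "subj01/ses01/acq03/scan.dcm",
    ],
)
```

Only the first and the repeated third path end up in `collisions`; the file destined for `acq02` is unaffected.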

Container UID Duplication (Session or Acquisition)

When locating the intended destination container path for an incoming file by matching on a container UID (Session or Acquisition):

  • Multiple containers of the same type exist with the same UID in any location (pre-existing UID duplication), or
  • Only one container of the same type already exists with the same UID, but
    • It is located under a different parent container than the incoming data specifies (new UID duplication), or
    • It has a different label than the incoming data specifies (label mismatch)
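
As a rough sketch of the decision tree above, the following Python classifies an incoming container against existing containers with the same UID. The `Container` type, the `classify_uid_match` name, and the order in which the parent and label checks run are assumptions for illustration; the list passed in is assumed to hold only containers of the same type within the configured search scope.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Container:
    """Hypothetical, simplified view of a Session or Acquisition."""
    uid: str
    label: str
    parent: str  # path of the parent container


def classify_uid_match(incoming: Container, existing: List[Container]) -> Optional[str]:
    """Return the duplication scenario, or None when there is no conflict."""
    matches = [c for c in existing if c.uid == incoming.uid]
    if len(matches) > 1:
        return "pre-existing UID duplication"
    if len(matches) == 1:
        match = matches[0]
        # Assumption: the parent check takes precedence over the label check.
        if match.parent != incoming.parent:
            return "new UID duplication"
        if match.label != incoming.label:
            return "label mismatch"
    return None


existing = [Container(uid="1.2.3", label="ses-01", parent="group/project/subj01")]
moved = Container(uid="1.2.3", label="ses-01", parent="group/project/subj02")
relabeled = Container(uid="1.2.3", label="baseline", parent="group/project/subj01")
same = Container(uid="1.2.3", label="ses-01", parent="group/project/subj01")
```

Here `moved` would be reported as a new UID duplication, `relabeled` as a label mismatch, and `same` resolves cleanly to the existing container.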

Container Metadata Duplication (Subject, Session, or Acquisition)

When locating the intended destination container path for an incoming file by matching on a container label (Subject, Session, or Acquisition):

  • Multiple containers of the same type exist in the same location with the same label (pre-existing label duplication), or
  • Only one container of the same type already exists in the same location with the same label, but
    • It has a different UID than the incoming data specifies (UID mismatch)
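
Label-based matching can be sketched the same way. In this hypothetical example, `siblings` is assumed to be the list of `(label, uid)` pairs for containers of the same type at the same location as the incoming container; the function name is likewise an assumption.

```python
from typing import Iterable, Optional, Tuple


def classify_label_match(
    incoming_label: str,
    incoming_uid: str,
    siblings: Iterable[Tuple[str, str]],
) -> Optional[str]:
    """Return the duplication scenario for a label match, or None."""
    matching_uids = [uid for label, uid in siblings if label == incoming_label]
    if len(matching_uids) > 1:
        return "pre-existing label duplication"
    if len(matching_uids) == 1 and matching_uids[0] != incoming_uid:
        return "UID mismatch"
    return None


# Illustrative sibling sessions under one subject.
siblings = [("ses-01", "1.2.3"), ("ses-02", "1.2.4")]
```

With these siblings, an incoming `ses-01` carrying UID `1.2.3` matches the existing session, while the same label with UID `9.9.9` would be flagged as a UID mismatch.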

Configurable Scope for UID Duplication

When determining Container UID duplication, the search scope is configurable to be any one of the following three options:

  • Site: The same UID is used on more than one container of the same type anywhere in the entire Flywheel site
  • Group: The same UID is used on more than one container of the same type anywhere within the same Group
  • Project: The same UID is used on more than one container of the same type anywhere within the same Project

By default, the UID duplication scope is set to Project.

To change the scope for UID duplication, contact Flywheel support.
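
To illustrate how the scope changes what counts as a duplicate, the sketch below counts UIDs per scope. It is not how Flywheel is configured (the scope is changed through Flywheel support); the `(group, project, uid)` tuples, the `duplicated_uids` name, and the lowercase scope strings are assumptions for the example.

```python
from collections import defaultdict

# Number of leading hierarchy levels that take part in the comparison key.
SCOPE_DEPTH = {"site": 0, "group": 1, "project": 2}


def duplicated_uids(containers, scope="project"):
    """Return the (scope..., uid) keys used by more than one container.

    `containers` is an iterable of (group, project, uid) tuples describing
    containers of a single type (hypothetical representation).
    """
    depth = SCOPE_DEPTH[scope]
    counts = defaultdict(int)
    for group, project, uid in containers:
        key = (group, project)[:depth] + (uid,)
        counts[key] += 1
    return {key for key, count in counts.items() if count > 1}


# The same UID appears in three different projects across two groups.
containers = [
    ("g1", "p1", "1.2.3"),
    ("g1", "p2", "1.2.3"),
    ("g2", "p3", "1.2.3"),
]
```

Under the default Project scope none of these containers conflict, because each project uses the UID only once. Widening the scope to Group flags the two containers in `g1`, and Site scope flags all three.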

Quarantining of Duplicates

By default, imports do not fail when duplication is detected. Instead,

  • Files or containers that do not cause duplication are imported normally even if other files within the same import job do cause duplication, and
  • Files or containers that do cause duplication are quarantined within Flywheel but outside of the destination project, so that the pre-existing data is unaffected until a Data Manager is able to resolve the duplication.
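
The partial-import behavior described above amounts to partitioning each import job rather than failing it. A minimal sketch, assuming a hypothetical `is_duplicate` predicate that stands in for the detection rules:

```python
def partition_import(items, is_duplicate):
    """Split an import job into items to import and items to quarantine.

    `is_duplicate` is a caller-supplied predicate (hypothetical); the job
    itself never fails because of a duplicate.
    """
    imported, quarantined = [], []
    for item in items:
        if is_duplicate(item):
            quarantined.append(item)
        else:
            imported.append(item)
    return imported, quarantined


# Illustrative job where only one file collides with existing data.
already_present = {"acq01/scan.dcm"}
imported, quarantined = partition_import(
    ["acq01/scan.dcm", "acq02/scan.dcm", "acq03/scan.dcm"],
    is_duplicate=lambda path: path in already_present,
)
```

The two clean files are imported as usual, and only `acq01/scan.dcm` is held back for review.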

Tools for Resolving Duplication

Tools are provided for data managers to resolve the duplication.

For each duplication occurrence, the system explains what caused the duplication, including the source and destination paths:

  • For File Path Collisions: Source path, destination path, and existing file name
  • For Container UID or Metadata duplication: Source file path, source UID, destination path, and, for each conflicting file with a matching UID (there may be more than one):
    • Path to existing file in Flywheel hierarchy
    • UID of existing file

For each duplication occurrence, the system gives data managers the following options for resolving the duplication and clearing the occurrence from the quarantine area:

  • Update Existing: Update existing item with new information
  • Keep All: Keep both the existing and the incoming data without overwriting the existing data (i.e., either allow the duplication or rename the incoming data item to avoid the conflict):
    • For File Path Collisions, automatically append a suffix to the incoming file name to avoid the collision
    • For Container Label Duplication, automatically append a suffix to the incoming container label to avoid the duplication
    • For Container UID Duplication, allow the UID duplication to exist by creating the additional container with the same UID
  • Skip: Reject new item (i.e., ignore incoming item, leave existing item unaffected), or
  • Retry: Run the duplicate detection logic again for this item to determine whether the duplication still exists (i.e., check again).
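
The four resolution options can be sketched for the File Path Collision case as follows. This is an illustration only: the `Resolution` enum, the function names, and in particular the `_1`, `_2`, ... suffix format are assumptions, since the actual suffix Flywheel appends is not specified here.

```python
import os
from enum import Enum


class Resolution(Enum):
    UPDATE_EXISTING = "update_existing"
    KEEP_ALL = "keep_all"
    SKIP = "skip"
    RETRY = "retry"


def suffixed_name(name, taken):
    """Append the first free numeric suffix (assumed format: name_N.ext)."""
    stem, ext = os.path.splitext(name)
    n = 1
    while f"{stem}_{n}{ext}" in taken:
        n += 1
    return f"{stem}_{n}{ext}"


def resolve_file_collision(incoming_name, taken, resolution):
    """Return the name the incoming file is stored under, or None.

    None means the incoming file is not written: it was skipped, or a
    retry found that the collision still exists.
    """
    if resolution is Resolution.UPDATE_EXISTING:
        return incoming_name  # the existing file is updated in place
    if resolution is Resolution.KEEP_ALL:
        return suffixed_name(incoming_name, taken)
    if resolution is Resolution.SKIP:
        return None
    # RETRY: run the collision check again against the current state.
    return None if incoming_name in taken else incoming_name


taken = {"scan.dcm", "scan_1.dcm"}
kept = resolve_file_collision("scan.dcm", taken, Resolution.KEEP_ALL)
```

In this example `kept` is `scan_2.dcm`, because `scan_1.dcm` is already in use. A retry against the same `taken` set still reports the collision, whereas a retry after the existing file has been removed would let `scan.dcm` through unchanged.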