
Use the Flywheel Cluster to Ingest Data Faster

Introduction

A cluster is a group of interconnected computers that work together to perform computationally intensive tasks.

Flywheel sites using V3 infrastructure can be deployed with an ingest cluster for uploading data into Flywheel from an Amazon S3 bucket, a Google Cloud Storage bucket, or Azure Blob Storage.

Running ingest jobs on your Flywheel cluster moves data from your cloud storage into Flywheel without routing it through your own computer, so uploads are faster and more reliable.

This article describes what to consider before using the cluster, how to give it access to your data, and how to find your cluster URL.

Instruction Steps

Considerations

  • The site must use V3 infrastructure. To check your site, open the Help menu and look for (V3) after the version number, for example: Version 16.4.4 (V3)
  • The ingest cluster is not enabled by default and must be configured by Flywheel support. Ask your site admin or contact support to find out whether the ingest cluster is configured on your site.
  • The Flywheel cluster supports moving data into Flywheel from Amazon S3, Google Cloud Storage, or Azure Blob Storage. The cluster must be able to access the bucket or container that holds your data.
  • To use the cluster, you must upload data with the Flywheel command-line interface (CLI). Learn more about how to install the CLI.
  • When you use the cluster to run ingest commands, you may incur data ingress and egress fees, because data is moved in or out of cloud storage.

    If you are unsure whether these fees apply to your Flywheel configuration, contact support.

Configure Credentials for Access

The Flywheel ingest service must have access to your data bucket before you begin. Contact Flywheel support to get the name of the service account that the ingest feature uses to access your data bucket.

AWS

AWS Bucket policy

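Flywheel support will provide the ARN of the ingest IAM role. Add it as the principal in a bucket policy like the one below, and replace the placeholder bucket name with the name of your bucket: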

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowFlywheelIngestRead",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::1234:role/FLYWHEEL-xxxxxx-ingest-role"
        ]
      },
      "Action": [
        "s3:ListBucket",
        "s3:GetObjectAttributes",
        "s3:GetObject",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::your-s3-data-bucket-xxxx/*",
        "arn:aws:s3:::your-s3-data-bucket-xxxx"
      ]
    }
  ]
}
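If you manage the bucket with the AWS CLI, you could attach the policy above roughly as follows. This is a minimal sketch, assuming the AWS CLI is installed and authenticated with permission to change the bucket policy, and that you saved the JSON above to a local file; the function name and the file name are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: attaches the Flywheel ingest bucket policy using the AWS CLI.
# Assumes the policy JSON is saved locally with its placeholders
# (role ARN, bucket name) replaced by real values.
apply_ingest_bucket_policy() {
  local bucket="$1"       # your bucket name, e.g. your-s3-data-bucket-xxxx
  local policy_file="$2"  # path to the saved policy JSON
  # put-bucket-policy replaces the bucket's entire existing policy.
  aws s3api put-bucket-policy --bucket "$bucket" --policy "file://${policy_file}"
}

# Example (placeholder values):
#   apply_ingest_bucket_policy your-s3-data-bucket-xxxx flywheel-ingest-policy.json
```

Because put-bucket-policy overwrites the whole policy, check for an existing one first with aws s3api get-bucket-policy --bucket your-s3-data-bucket-xxxx and, if one exists, add the Flywheel statement to it rather than replacing it.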

If your bucket is encrypted with an AWS KMS key, you must also allow the ingest IAM role to decrypt with that key, for example with a KMS grant like this one:

{
  "KeyId": "arn:aws:kms:us-west-2:12345678:key/123456677-1234-1234-1234-a47a62f0da7b",
  "Name": "FLYWHEEL-ingest-decrypt-access-1",
  "GranteePrincipal": "arn:aws:iam::12345678:role/FLYWHEEL-xxxxxx-ingest-role",
  "Operations": [
    "Decrypt"
  ]
}    
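The JSON above describes a KMS grant. One way to create an equivalent grant is the AWS CLI create-grant command, sketched below under the assumption that you run it with credentials allowed to create grants on the key. The key ARN and role ARN are the placeholders from the example, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: creates a KMS grant that lets the Flywheel ingest role
# decrypt objects encrypted with your key. Replace the placeholder arguments.
grant_ingest_decrypt() {
  local key_arn="$1"   # e.g. arn:aws:kms:us-west-2:12345678:key/...
  local role_arn="$2"  # ingest role ARN provided by Flywheel support
  aws kms create-grant \
    --key-id "$key_arn" \
    --grantee-principal "$role_arn" \
    --operations Decrypt \
    --name FLYWHEEL-ingest-decrypt-access-1
}
```

The command prints the new grant's ID; running aws kms list-grants --key-id with the same key ARN lists the key's grants so you can confirm it was created.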

GCP

Grant the Flywheel ingest service account the Storage Object Viewer role (roles/storage.objectViewer) on your data bucket.
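With the gcloud CLI, the role binding might look like the sketch below. It assumes gcloud is installed and authenticated with permission to change the bucket's IAM policy, and that the service account email is the one Flywheel support gives you; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: grants the Flywheel ingest service account read access
# (Storage Object Viewer) on a Cloud Storage bucket using the gcloud CLI.
grant_ingest_object_viewer() {
  local bucket="$1"           # bucket name without gs://, e.g. my-data-bucket
  local service_account="$2"  # service account email from Flywheel support
  gcloud storage buckets add-iam-policy-binding "gs://${bucket}" \
    --member="serviceAccount:${service_account}" \
    --role="roles/storage.objectViewer"
}
```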

Azure

Contact Flywheel support to integrate with Azure Blob Storage.

How to Use the Cluster when Importing Data

Add the optional --cluster flag, followed by your cluster URL, to your CLI command. For the SRC argument, give the URL of the bucket that stores your data. The --follow flag is also useful because it lets you track the progress of your import.

Your Flywheel cluster URL is the same URL you use to sign in to Flywheel, followed by /ingest. For example, if you sign in to Flywheel at https://universityABC.flywheel.io, the cluster URL is https://universityABC.flywheel.io/ingest.
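If you script your imports, the rule above can be written as a small shell function. This is a sketch; the function name is hypothetical, and it only strips a single trailing slash from the sign-in URL before appending /ingest.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derives the cluster URL from the URL you sign in with
# by appending /ingest (a trailing slash is dropped first to avoid "//").
cluster_url() {
  local site="${1%/}"
  printf '%s/ingest\n' "$site"
}

# Example:
#   cluster_url https://universityABC.flywheel.io   # prints https://universityABC.flywheel.io/ingest
```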

Below is an example of importing DICOM data from an Amazon S3 bucket into the project My Study in the group lab612:

fw ingest dicom --cluster https://universityABC.flywheel.io/ingest --follow s3://bucket-name/key-name lab612 "My Study"
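To reuse the example command for other sites or buckets, you could wrap it in a function like the sketch below. It uses only the flags shown above and assumes, as in the example, that the last two arguments are the group ID and the project label; the function name is hypothetical. Quoting each argument keeps a label with spaces, such as "My Study", as a single argument.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the example fw ingest dicom command.
# Builds the cluster URL from the sign-in URL and passes each argument quoted.
ingest_dicom_on_cluster() {
  local site="${1%/}"  # sign-in URL, e.g. https://universityABC.flywheel.io
  local src="$2"       # bucket URL, e.g. s3://bucket-name/key-name
  local group="$3"     # group ID, e.g. lab612
  local project="$4"   # project label, e.g. "My Study"
  fw ingest dicom --cluster "${site}/ingest" --follow "$src" "$group" "$project"
}

# Example:
#   ingest_dicom_on_cluster https://universityABC.flywheel.io s3://bucket-name/key-name lab612 "My Study"
```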

Learn more about how to use the ingest dicom command.

Commands That Can Leverage a Cluster

Not every CLI command includes the cluster feature. Below are the commands that can leverage a cluster for faster data import.