A cluster is a group of interconnected computers that work together to perform computationally intensive tasks. Flywheel sites using V3 infrastructure can be deployed with an ingest cluster that uploads data from an Amazon S3 bucket, Google Cloud Storage bucket, or Azure Blob Storage container into Flywheel. Running your ingest jobs on your Flywheel cluster means your data uploads faster and more reliably than if you use your own computer.
This article gives an overview of the considerations for using a cluster as well as how to find the URL for your cluster.
- The site must use V3 infrastructure. To check your site, click the help menu and look for (V3) after the version. Example: Version 16.4.4 (V3)
- The ingest cluster is not enabled by default and must be configured by Flywheel support. Contact support to see if the ingest cluster is configured on your site.
- The Flywheel cluster supports moving data from Amazon S3, Google Cloud Storage, or Azure Blob Storage into Flywheel. The cluster must be able to access these storage buckets.
- To leverage the cluster, you must use the Flywheel command-line interface (CLI) to upload data. Learn more about how to install the CLI.
- When using the cluster for ingest project commands, you may incur data ingress and egress fees because you are moving data into or out of cloud storage. If you are unsure whether these fees apply to your Flywheel configuration, contact Flywheel support.
You must configure the Flywheel ingest service to have access to your data bucket before beginning. Contact Flywheel support to get the name of the service account that is associated with the ingest feature and will be used to reach out to your data bucket.
AWS
Apply the following bucket policy to your data bucket. Flywheel support will provide the AWS IAM role to use as the principal.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "allow-uat-to-read-prod-s3",
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::1234:role/FLYWHEEL-xxxxxx-ingest-role"
        ]
      },
      "Action": [
        "s3:ListBucket",
        "s3:GetObjectAttributes",
        "s3:GetObject",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::your-s3-data-bucket-xxxx/*",
        "arn:aws:s3:::your-s3-data-bucket-xxxx"
      ]
    }
  ]
}
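As a rough sketch, the policy can be generated from a template so that the role ARN and bucket name are filled in consistently in both Resource entries. The function name bucket_policy and the bucket name my-data-bucket are hypothetical; the role ARN is the placeholder from the policy above and must be replaced with the one Flywheel support provides.

```shell
# Print a read-only bucket policy for the Flywheel ingest role.
# bucket_policy is a hypothetical helper, not part of any Flywheel or AWS tool.
# Usage: bucket_policy ROLE_ARN BUCKET_NAME
bucket_policy() {
  local role_arn="$1" bucket="$2"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "allow-uat-to-read-prod-s3",
      "Effect": "Allow",
      "Principal": { "AWS": ["${role_arn}"] },
      "Action": [
        "s3:ListBucket",
        "s3:GetObjectAttributes",
        "s3:GetObject",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::${bucket}/*",
        "arn:aws:s3:::${bucket}"
      ]
    }
  ]
}
EOF
}

# Example with placeholder values (not executed here):
#   bucket_policy "arn:aws:iam::1234:role/FLYWHEEL-xxxxxx-ingest-role" "my-data-bucket" > policy.json
#   aws s3api put-bucket-policy --bucket "my-data-bucket" --policy file://policy.json
```

Note that aws s3api put-bucket-policy replaces any policy already attached to the bucket, so if the bucket has an existing policy, merge this statement into it rather than overwriting it.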
If your bucket is encrypted with a KMS key, you must also grant the AWS ingest IAM role permission to decrypt with that key.
{
  "KeyId": "arn:aws:kms:us-west-2:12345678:key/123456677-1234-1234-1234-a47a62f0da7b",
  "Name": "FLYWHEEL-ingest-decrypt-access-1",
  "GranteePrincipal": "arn:aws:iam::12345678:role/FLYWHEEL-xxxxxx-ingest-role",
  "Operations": [
    "Decrypt"
  ]
}
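One possible way to create this grant is with the AWS CLI's kms create-grant command, sketched below. The function name grant_ingest_decrypt and the AWS_CMD override are assumptions added for illustration; the key and role ARNs are placeholders to replace with your own key and the role Flywheel support provides.

```shell
# Create a decrypt-only KMS grant for the Flywheel ingest role.
# grant_ingest_decrypt is a hypothetical helper. Setting AWS_CMD=echo
# prints the arguments instead of calling AWS, which is useful as a preview.
# Usage: grant_ingest_decrypt KEY_ARN ROLE_ARN
grant_ingest_decrypt() {
  local key_arn="$1" role_arn="$2"
  "${AWS_CMD:-aws}" kms create-grant \
    --key-id "$key_arn" \
    --grantee-principal "$role_arn" \
    --operations Decrypt \
    --name "FLYWHEEL-ingest-decrypt-access-1"
}

# Example with placeholder values (not executed here):
#   grant_ingest_decrypt \
#     "arn:aws:kms:us-west-2:12345678:key/123456677-1234-1234-1234-a47a62f0da7b" \
#     "arn:aws:iam::12345678:role/FLYWHEEL-xxxxxx-ingest-role"
```

The grant is limited to the Decrypt operation, matching the JSON above, so the ingest role can read encrypted objects but cannot encrypt or manage the key.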
GCP
Grant the Flywheel ingest service account the Storage Object Viewer role on your data bucket.
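A minimal sketch of granting that role with the gcloud CLI follows. It assumes the service account name from Flywheel support is a service account email; the function name grant_ingest_viewer, the GCLOUD_CMD override, and the example values are hypothetical.

```shell
# Grant Storage Object Viewer (roles/storage.objectViewer) on one bucket.
# grant_ingest_viewer is a hypothetical helper. Setting GCLOUD_CMD=echo
# prints the arguments instead of calling Google Cloud.
# Usage: grant_ingest_viewer BUCKET_NAME SERVICE_ACCOUNT_EMAIL
grant_ingest_viewer() {
  local bucket="$1" sa_email="$2"
  "${GCLOUD_CMD:-gcloud}" storage buckets add-iam-policy-binding "gs://${bucket}" \
    --member="serviceAccount:${sa_email}" \
    --role="roles/storage.objectViewer"
}

# Example with placeholder values (not executed here):
#   grant_ingest_viewer "my-data-bucket" "flywheel-ingest@example-project.iam.gserviceaccount.com"
```

Binding the role on the bucket rather than on the whole project keeps the ingest service's read access scoped to the data you intend to import.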
Azure
Reach out to Flywheel support to help integrate with Azure Blob Storage.
Use the optional --cluster flag along with the cluster URL in your CLI command. Then, for the SRC, add the URL for the bucket storing your data. The --follow flag is also helpful for tracking your import.
Your Flywheel cluster URL is the same URL you use to sign in to Flywheel followed by /ingest. For example, if you sign in to Flywheel at https://universityABC.flywheel.io, the cluster URL is https://universityABC.flywheel.io/ingest.
Below is an example of importing DICOMs from an AWS S3 bucket:
fw ingest dicom --cluster https://universityABC.flywheel.io/ingest --follow S3://bucket-name/key-name lab612 "My Study"
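The same command can be assembled from the sign-in URL, roughly as sketched here. The helper names cluster_url and ingest_dicom and the FW_CMD override are assumptions for illustration. Only the flags shown in the example above are used, and the two trailing arguments are assumed to be the destination group and project.

```shell
# Derive the cluster URL by appending /ingest to the sign-in URL.
# A single trailing slash on the sign-in URL is dropped first.
cluster_url() {
  printf '%s/ingest\n' "${1%/}"
}

# Run a DICOM import on the cluster and follow its progress.
# ingest_dicom is a hypothetical wrapper. Setting FW_CMD=echo prints the
# arguments instead of starting an import.
# Usage: ingest_dicom SITE_URL SRC GROUP PROJECT
ingest_dicom() {
  local site_url="$1" src="$2" group="$3" project="$4"
  "${FW_CMD:-fw}" ingest dicom \
    --cluster "$(cluster_url "$site_url")" \
    --follow \
    "$src" "$group" "$project"
}

# Example matching the command above (not executed here):
#   ingest_dicom "https://universityABC.flywheel.io" "S3://bucket-name/key-name" lab612 "My Study"
```

Quoting each argument keeps a project label that contains spaces, such as "My Study", as a single argument to the CLI.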
Not every CLI command includes the cluster feature. Below are the commands where you can leverage a cluster for faster data import.