How to Request Gear Compute Changes
Introduction
Flywheel will partner with you to configure Gear compute to meet the individual needs of your site and ensure stable, timely, and cost-effective processing. We need to collaborate with you because success depends on tailoring the compute to the characteristics of your data and files, the number and types of gears run, your compute requirements, and the level of custom gear development and usage. Because site usage and compute demands typically grow over time, it's important to stay in contact so that we can continue to meet your needs.
Typical Symptoms of Compute Mismatch
Under-specified Compute
- Jobs that fail without logs are a sign that the compute OS crashed due to lack of RAM.
- Sporadic failures without logs can occur when multiple jobs on the same VM together use more resources than are available.
- Gears fail due to storage constraints caused by large or numerous output files.
- Gears spend over 30 minutes in the pending state and are then automatically cancelled.
Over-specified Compute
- Higher than expected costs
- Jobs that finish very quickly (in under 10 minutes) are a sign that most of the time is spent obtaining and starting a Virtual Machine (VM) rather than on usable compute.
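The symptoms above can be turned into a rough triage check over your own job history. The sketch below is only an illustration: `JobRecord` and `triage` are hypothetical names, not part of the Flywheel SDK, and the two thresholds are the 30-minute and 10-minute figures from the lists above.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the symptom lists above.
PENDING_LIMIT_MIN = 30
SHORT_RUN_LIMIT_MIN = 10


@dataclass
class JobRecord:
    """Hypothetical summary of one gear job (not a Flywheel SDK type)."""
    state: str          # "complete", "failed", or "cancelled"
    pending_min: float  # minutes spent in the pending state
    run_min: float      # minutes spent running
    has_logs: bool      # whether the job produced any logs


def triage(job: JobRecord) -> Optional[str]:
    """Return a hint about a possible compute mismatch, or None."""
    if job.state == "failed" and not job.has_logs:
        return "under-specified: possible crash from lack of RAM"
    if job.state == "cancelled" and job.pending_min > PENDING_LIMIT_MIN:
        return "under-specified: job waited too long for compute"
    if job.state == "complete" and job.run_min < SHORT_RUN_LIMIT_MIN:
        return "over-specified: VM startup may dominate run time"
    return None
```

A hint from a check like this is a starting point for a conversation with Flywheel Support, not a diagnosis.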
Site Scenarios and Requests
Below are some common scenarios that may occur and how we might approach them.
Handling Increased Demand for Compute
Site compute needs change over time. If your site's compute has not been reviewed in the past year, or if the reliability or performance of gear compute is not what it once was, it may be time for Flywheel to review your compute resources and suggest changes to meet the new demand.
Flywheel gear compute dynamically scales the number of compute Virtual Machines (VMs) with the size of the job queue. Scaling is capped by a maximum, which Flywheel Support can adjust. Raising this maximum is one way to meet greater demand for compute resources with the existing compute configurations.
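To see how the maximum limits scaling, consider the simplified model below. It is a sketch under stated assumptions, not Flywheel's actual scaling policy: the function name and the `jobs_per_vm` parameter are hypothetical, and the only behavior taken from this article is that the VM count follows the queue size up to a configured maximum.

```python
import math


def target_vm_count(queued_jobs: int, jobs_per_vm: int, max_vms: int) -> int:
    """Simplified model: enough VMs to cover the queue, capped at max_vms.

    jobs_per_vm is an assumed number of jobs one VM handles at a time.
    """
    if queued_jobs < 0 or jobs_per_vm < 1 or max_vms < 0:
        raise ValueError("queued_jobs and max_vms must be >= 0, jobs_per_vm >= 1")
    needed = math.ceil(queued_jobs / jobs_per_vm)
    return min(max_vms, needed)
```

In this model, once the queue needs more VMs than the maximum allows, additional jobs wait in the pending state, which is why raising the maximum can relieve long pending times.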
Below are some options for adding compute configurations:
Modifying Static Compute
A static compute engine can be added that remains up and available for users running gears and for gear rules. This avoids the delay of obtaining cloud compute, but it incurs ongoing costs even when idle. A static engine can give users the least delay for job runs, especially when there is continuous demand for these runs.
A static compute engine has a parameter that controls the maximum number of compute workers that will be available.
One option is to configure a gear tag, for example 'static', that tells the job scheduler to run the gear on a specific engine. Users can then add this tag to a gear run to direct it to the static engine. The user docs provide instructions for this.
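Conceptually, tag-based routing is a lookup from a job's tags to an engine. The sketch below illustrates the idea only; `TAG_TO_ENGINE`, `DEFAULT_ENGINE`, and `select_engine` are hypothetical names, and the real mapping is configured on your site by Flywheel Support rather than in user code.

```python
from typing import Dict, Iterable

# Hypothetical site configuration: which tag routes to which engine.
TAG_TO_ENGINE: Dict[str, str] = {"static": "static"}
# Assumed fallback when no routing tag is present.
DEFAULT_ENGINE = "dynamic"


def select_engine(
    job_tags: Iterable[str],
    tag_to_engine: Dict[str, str] = TAG_TO_ENGINE,
    default: str = DEFAULT_ENGINE,
) -> str:
    """Return the engine for the first routing tag found, else the default."""
    for tag in job_tags:
        if tag in tag_to_engine:
            return tag_to_engine[tag]
    return default
```

Under this model, a run tagged 'static' goes to the static engine, and a run without a routing tag falls back to the default compute.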
Flywheel can provide a specific compute SKU from the site cloud vendor's catalog, as long as the site account has quota available. This allows you to request specific CPU, RAM, and storage for the engine. If you have a specific compute in mind, we can review that for feasibility. If you don't, that is fine too, and we can work with you on sizing.
Info
A typical static configuration has 2 CPUs, 8 GB of RAM, and 200 GB of storage. However, it's possible to set up multiple static computes, each configured with different VM specifications, to handle certain gear runs, groups, or tags.
Compute for Custom Gear Development and Usage
Custom gear development and usage have different compute needs because gears in development and testing are often unique algorithms whose compute needs on the data at hand have not yet been characterized. In this case, Flywheel recommends creating specialized profiles for gear development and testing that let gear developers send a gear job to one of a range of compute sizes. Flywheel Support can set up 'small', 'medium', and 'large' compute profiles, each with its own gear tag, on your site. Developers select a profile by adding the 'small', 'medium', or 'large' gear tag to the gear run, which lets them test the gear on realistic data at different compute sizes and characterize the compute needed for long-term use. These profiles are also useful for troubleshooting other gears that may be failing on a particular data input due to low resources.
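Once test runs have shown how much memory a gear actually uses, choosing a profile can be as simple as picking the smallest one that fits with some headroom. The sketch below assumes illustrative RAM sizes for the three profiles; the actual sizes are whatever Flywheel Support configures for your site, and `PROFILES` and `smallest_fitting_profile` are hypothetical names.

```python
from typing import List, Optional, Tuple

# Hypothetical profile sizes in GB of RAM, ordered smallest to largest.
# Real values depend on how the profiles are configured for your site.
PROFILES: List[Tuple[str, float]] = [
    ("small", 8),
    ("medium", 32),
    ("large", 128),
]


def smallest_fitting_profile(
    peak_ram_gb: float, headroom: float = 1.25
) -> Optional[str]:
    """Return the smallest profile tag whose RAM covers the observed peak.

    headroom is an assumed safety factor applied to the observed peak.
    Returns None if no profile is large enough.
    """
    needed = peak_ram_gb * headroom
    for tag, ram_gb in PROFILES:
        if ram_gb >= needed:
            return tag
    return None
```

If no profile fits, that is a signal to discuss a larger or more specialized compute configuration with Flywheel Support.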
Modifying Dynamic Compute
Existing dynamic compute profiles can be modified, or additional profiles can be added, to handle different workloads. For example, dynamic compute can be modified to provide higher throughput for gear rules that are triggered by incoming files. It is also useful to add a dynamic compute profile for users who require specialized compute for a particular gear or data set. In this case, a gear tag is set up to route the job to the specialized compute.
Contact Support
Site Admins should contact Flywheel Support by submitting a ticket or emailing us at support@flywheel.io.