Gear Building Tutorial Part 2d: The Dockerfile

Introduction

The Dockerfile

The Dockerfile creates a self-contained environment. The documentation for Dockerfiles will be helpful as you develop more advanced gears, but keep in mind that the Flywheel engine behaves slightly differently from plain Docker; those differences are discussed as we go along. Here, we provide abbreviated "Docker for Flywheel" instructions to get you on your feet with Docker in the context of a Flywheel gear.

This section is designed only to familiarize you with the components of Dockerhub and Dockerfiles, and with how to use them in a Flywheel gear. It will NOT help you build a functioning Dockerfile. If you are unfamiliar with Docker, we recommend you read both this section and the following section in their entirety.

Instruction Steps

Dockerfile Anatomy

Creating the Dockerfile will by nature include some (or all) of the following steps:

Select the OS
Install any additional packages necessary
Set environment variables
Copy in any necessary files/folders for the program to run
Set an entrypoint

When a Dockerfile is built (i.e. the commands in the Dockerfile are run to create a self-contained environment), it generates an image. This image can be run with Docker, almost like a virtual machine, with the self-contained environment you made. Any image can be uploaded to Dockerhub and stored. This image can then be used in any other Dockerfile as a starting point (instead of starting with an OS).

Example Use Case

Let's assume that your lab uses a specific version of CentOS, along with specific versions of FSL, FreeSurfer, Matlab, and Python. You, the researcher, may wish to create one "base" Dockerfile that starts with a CentOS image and installs FSL, FreeSurfer, Matlab, and Python. If you've ever done this before, you know that it can take a very long time. Fortunately, you'll only need to do it once. After you build the image, you can upload it to your Dockerhub account. In the future, if you ever need to add an additional piece of software, you don't need to start from scratch! You can pull your image straight from Dockerhub and add to it directly.

Now, let's walk through each of the steps listed above, and build our Dockerfile.

Select the OS

Don't worry, you don't have to install a fresh OS yourself. Docker and the Docker community have already generated Docker images you can start from. These images are stored at Dockerhub, which lists the official Docker image releases. Scrolling through, you may see some familiar operating systems like Ubuntu and CentOS. To determine which OS to use, consider the following:

Is any of the software you need to install incompatible with certain operating systems?
Is there a legacy you wish to follow (e.g. previously, all analyses were done in CentOS 6.0)?
How can you make the image as lightweight as possible (different OSs have different sizes)?

For our example, we're only running a simple bash script. We could choose Ubuntu, but that OS comes with a lot of additional software that we simply don't need, and its images range between 20 and 40 MB. Let's look for something more lightweight.

Here's the page for the Docker image alpine. You can search for it by typing alpine in the search bar up top:

We'll cover the basic pieces of information on this page:

Red: A brief description of the docker image. At the bottom of the red section, you see the various keywords associated with this image. These include things like the processors it's compatible with (x86-64), information about the image itself (Linux, Operating System), and maybe some description about the release itself (Official image). Clicking any of these brings up a search for all other images with the same keyword.
Green: Tags. A tag in Docker can be any annotation the author wishes to assign to their Docker image. A tag is written after the image name, separated by a colon. For example, the alpine image shown above with the tag 3.9.4 is referenced as alpine:3.9.4.

Tags can also be used to indicate the release version of your software. For example, if your image is designed to run some specific code you're developing, you may create an image with the most recent version (v1.0), and then later update your code and recompile a new image. Rather than overwrite the old version, you may simply increment the version (v1.1) as a new tag. Typically, any operating system images (such as this one) use the tag to correspond directly to the version of the operating system. In the alpine reference above, the tag 3.9.4 refers to alpine version 3.9.4. This can help you quickly identify what software versions are available for you to use for this particular docker image.
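The name:tag convention described above can be illustrated with a small helper. This is only a sketch: the function name split_image_reference is hypothetical, and it relies on the fact that Docker falls back to the tag latest when none is given. It does not handle digest references (name@sha256:...).

```python
from typing import Tuple


def split_image_reference(reference: str) -> Tuple[str, str]:
    """Split 'name:tag' into (name, tag), defaulting the tag to 'latest'."""
    # Only a colon in the last path segment separates the tag; an earlier
    # colon belongs to a registry host and port (e.g. localhost:5000/alpine).
    last_segment = reference.rsplit("/", 1)[-1]
    if ":" in last_segment:
        name, tag = reference.rsplit(":", 1)
        return name, tag
    return reference, "latest"


print(split_image_reference("alpine:3.9.4"))  # ('alpine', '3.9.4')
print(split_image_reference("alpine"))        # ('alpine', 'latest')
```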

Blue: Quick Reference. This section is optional, and if the developers are doing a good job, they'll provide links to wikis, tutorials, and documentation on this release.

Between the red and the green sections, you'll notice some tabs. Since a great deal of time was spent covering tags, let's discuss the tab labeled tag and examine the information on that page.

In the tags section, we now see a detailed description of the tags we saw on the description tab. Here we can see the actual file size of each tag, circled in red. This is typically where you go to find the size of a given docker image, and the rest of the information won't really be relevant for basic gear development.

Downloading an Image

The Docker image sets the OS and environment to run the gear on. In our case, we only need basic bash functionality and Python, and so we choose an extremely lightweight image. For our purposes, we will simply take the alpine:latest image. To download the image, use the following steps:

Ensure docker is installed and running
Open a terminal window
Enter the following command:

docker pull alpine:latest

You should see a message saying that docker is pulling the image. For example:

latest: Pulling from library/alpine  
89d9c30c1d48: Pull complete   
Digest: sha256:c19173c5ada610a5989151111163d28a67368362762534d8a8121ce95cf2bd5a  
Status: Downloaded newer image for alpine:latest  
docker.io/library/alpine:latest

Docker will do this step automatically if you give it the correct instructions when you go to build your file, but sometimes it's helpful to download images (especially large ones) ahead of time, so you don't have to wait later on.

Viewing your Docker Image

You can view your docker images on your machine by typing the following:

docker image ls

This will give you a list of your Docker images, along with their sizes. Some images can become very large (multiple GB), so it's recommended to occasionally prune your image list if you run into space issues on your machine. Images can be removed manually using the friendly name (alpine:3.9.4) or the IMAGE ID of the image you wish to remove. The IMAGE ID is a 12-character hexadecimal code shown in the output of docker image ls.

The following commands will remove the desired image:

docker image rm <IMAGE_NAME>

or

docker image rm <IMAGE_ID>

For now, let's pull our alpine:latest image and continue our gear development.

What to Place in Your Dockerfile

At this point, we have a run file, a manifest.json, and a Docker image. Our run.py file is written in Python, so we'll need to write our Dockerfile to install Python into the image for us. There are a few main commands we can use in Dockerfiles to set up our environment:

RUN: This executes a command as if in a regular shell. Typically, the only RUN commands in a Dockerfile involve downloading, installing, and configuring software, and setting up your directory structure. Anything you use this software for is done later: once you've created the image, you can start it as a container and run your code in it then.
ENV: This sets an environment variable in the resulting Docker image.
COPY: This copies files from your computer to the Docker image, so that those files are available within the image when it is run.

For example, if we wanted to install Python we could put one line in our docker file as:

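RUN apt-get update && apt-get install -y python3

This would tell Docker to execute the command apt-get install -y python3 in the image (after refreshing the package index with apt-get update), which would download and install Python. The -y flag answers the confirmation prompt automatically, since nobody can respond to it during a build. If we needed to copy a file into the image, and we wanted it to be in its own directory, we could do the following: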

RUN mkdir -p /home/my_code
COPY Code/My_Algorithm.py /home/my_code/

This creates the directory /home/my_code in your Docker image, and then copies the file Code/My_Algorithm.py from your computer into that directory. The source path is resolved relative to the build context (the directory you pass to docker build), so the file must live inside that directory. Whenever you run this image, that code will always be there, at the same location.

In our case, we only need to install python. As you may or may not be aware, different operating systems have different ways of installing packages. For example, CentOS uses the command yum install, Ubuntu uses apt-get install, and alpine uses apk add. While it helps to be familiar with the environment you’re using, usually the necessary information needed to successfully install a package can be found online.
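The per-OS install commands mentioned above can be summarized as a small lookup table. This is an illustrative sketch only: the names INSTALL_COMMANDS and run_instruction are hypothetical, and the -y flags (which skip the confirmation prompt in yum and apt-get) are an addition not taken from the text above.

```python
from typing import List

# Package-install command for each base OS named in this tutorial.
INSTALL_COMMANDS = {
    "centos": "yum install -y",
    "ubuntu": "apt-get update && apt-get install -y",
    "alpine": "apk add --update",
}


def run_instruction(os_name: str, packages: List[str]) -> str:
    """Build the Dockerfile RUN line that installs the packages on the given OS."""
    command = INSTALL_COMMANDS[os_name.lower()]
    return f"RUN {command} {' '.join(packages)}"


print(run_instruction("alpine", ["python3", "py-pip"]))
# RUN apk add --update python3 py-pip
```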

If you don't know how to set up python in alpine, a simple google search for "how to install python in alpine" can help you. In fact, the first result is even about setting up python on alpine in a Docker image. Perfect! Here's what that result suggests we use to install python:

RUN apk add --update \
    python3 \
    python-dev \
    py-pip \
    build-base \
  && pip install virtualenv \
  && rm -rf /var/cache/apk/*

Line by line, this command is performing the following actions:

Call apk (alpine's native package manager/installer, similar to apt/yum) and refresh its package index before installing (--update)
Install python
Install the Python development headers, which are needed to compile Python packages that contain C extensions
Install pip
Install build-base
Use pip to install a python package
Clean up any install files no longer needed

You can simply copy and paste this into your Dockerfile, but some of what it installs would go unused. For our purposes, we only need Python and pip, so we can ignore lines 3 and 5. We also don't need pip to install virtualenv, but we DO need the Flywheel SDK package, which is installed with the command:

pip install flywheel-sdk

With this in mind, create a Dockerfile (named Dockerfile) in your gear directory and enter the following lines:

FROM alpine:latest              # This sets the image we start building from  
RUN apk add --update \          # first update  
   python3 \                    # Install Python  
   py-pip \                    # Install pip  
 && pip install flywheel-sdk \ # Use pip to install the flywheel SDK  
 && rm -rf /var/cache/apk/*    # Cleanup install files

ENV FLYWHEEL=/flywheel/v0       # Setup default flywheel/v0 directory  

RUN mkdir -p ${FLYWHEEL}        # Create that directory  
COPY run.py ${FLYWHEEL}/run.py  # Copy in our runscript into the docker image  

ENTRYPOINT ["python run.py"]    # Set an entrypoint

NOTE that the comments in this code snippet are not valid docker comments, and must be removed if you attempt to copy and paste this code.
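If you would rather not delete the trailing comments by hand, a short script can do it. The following is a minimal sketch with a hypothetical helper name, strip_trailing_comments. It assumes that no instruction in the file contains a literal # preceded by whitespace (for example inside a quoted string), because such text would be removed as well.

```python
import re

# Whitespace followed by '#' and the rest of the line. A '#' at the very start
# of a line (a valid Dockerfile comment) is not matched and is left alone.
TRAILING_COMMENT = re.compile(r"\s+#.*$")


def strip_trailing_comments(dockerfile_text: str) -> str:
    """Remove trailing '# ...' annotations so that each line ends cleanly."""
    cleaned = [
        TRAILING_COMMENT.sub("", line.rstrip())
        for line in dockerfile_text.splitlines()
    ]
    return "\n".join(cleaned) + "\n"


snippet = "FROM alpine:latest              # This sets the image we start building from\n"
print(strip_trailing_comments(snippet), end="")  # FROM alpine:latest
```

Removing the annotations matters most on the continued RUN lines, where the backslash must be the last character on the line for the continuation to work.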

FROM means we start with this pre-made image.

RUN means we run this command on the image.

COPY means we copy a file from our local directory on our machine INTO the docker image itself. Be careful with this, as copying in large files will make your docker image large.

Adding this RUN command creates what Docker calls a layer on top of the alpine:latest image. In the future, any other image that also needs Python on alpine can reuse this layer directly, rather than running the installation command again. Essentially, each line starting with a Docker instruction (RUN, COPY, ENV) creates a new layer. This increases the complexity of your Docker image and the time it takes to build. Because of this, it's a good idea to use as few instructions as possible. You can read more about images and layers in Docker's documentation.

As an aside on this issue, this is not to suggest that you combine your entire setup into one single RUN command with dozens of && symbols. Instead, it's better to break up each Layer into functional groups. For example, this Layer installs python and python libraries. If you also wanted MATLAB, it would be useful to perform that installation in a separate layer.

Compiling a new Docker Image

NOTE: This Docker image isn't ready to be run yet. Please continue to the next section for instructions on how to run it.

Because we're not using a prebuilt Docker image, we need to generate our own. This is done with the docker build command. For our purposes, a simple invocation is enough. First, navigate to the directory in which your gear is stored and open a terminal window there. We will then build this image with a name and a tag.

To tag the Docker image, we use the option -t, followed by a name and a tag. To be able to upload the image to your Dockerhub account later, add your account name in front of the image name, followed by a slash.

To build the image under your Dockerhub account name, use the following command:

docker build -t <dockerhub_Accountname>/<gear_name>:<gear_tag> ./

Make sure to replace everything surrounded by "<>" with desired names and tags. The "./" indicates that we're building the Dockerfile that's present in our current directory.

For example:

docker build -t homer/alpine-python:0.1.0 ./

It's considered best practice to match your docker image tag with the gear version. Since we've versioned our gear 0.1.0, we should give our docker image the tag 0.1.0 as well. If we increment our manifest version, we'll increment our docker image version as well, even if we made no changes to the docker image. Likewise, if we make changes to our docker image, but not to the run code, we increment both the manifest version and the docker image version.
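The version-matching convention above is easy to check automatically. Here is a minimal sketch; the function name tag_matches_gear_version is hypothetical, and the gear version is passed in as a plain string rather than read from the manifest.

```python
def tag_matches_gear_version(image_reference: str, gear_version: str) -> bool:
    """Return True when the image tag equals the gear version."""
    # The tag is whatever follows the colon in the last path segment.
    last_segment = image_reference.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        # No explicit tag: Docker would use 'latest', which is not a version.
        return False
    tag = last_segment.rsplit(":", 1)[1]
    return tag == gear_version


print(tag_matches_gear_version("homer/alpine-python:0.1.0", "0.1.0"))  # True
print(tag_matches_gear_version("homer/alpine-python:0.1.0", "0.1.1"))  # False
```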

This image now exists on your computer, labeled homer/alpine-python:0.1.0. This allows it to run locally on your machine, but to run it on Flywheel, you'll need to push the image to your Dockerhub account. Once you are logged in (docker login), this is done with the call:

docker push homer/alpine-python:0.1.0

To view in Dockerhub, click Repositories.

These steps will need to be repeated any time a change is made to the Dockerfile. When you rebuild, you can either overwrite the previous image by using the same name and tag, or keep the previous image and increment the version in the tag.
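Since the build and push steps are repeated after every Dockerfile change, it can be convenient to script them. The sketch below wraps the two commands shown above using Python's subprocess module. All function names are hypothetical, and build_and_push assumes Docker is installed and that you are already logged in to Dockerhub.

```python
import subprocess
from typing import List


def image_reference(account: str, gear_name: str, gear_tag: str) -> str:
    """Compose <dockerhub_Accountname>/<gear_name>:<gear_tag>."""
    return f"{account}/{gear_name}:{gear_tag}"


def build_command(reference: str, context_dir: str = "./") -> List[str]:
    """Argument list equivalent to: docker build -t <reference> ./"""
    return ["docker", "build", "-t", reference, context_dir]


def push_command(reference: str) -> List[str]:
    """Argument list equivalent to: docker push <reference>"""
    return ["docker", "push", reference]


def build_and_push(reference: str, context_dir: str = "./") -> None:
    # check=True raises CalledProcessError when docker exits with a non-zero
    # status, so an image whose build failed is never pushed.
    subprocess.run(build_command(reference, context_dir), check=True)
    subprocess.run(push_command(reference), check=True)


print(build_command(image_reference("homer", "alpine-python", "0.1.0")))
# ['docker', 'build', '-t', 'homer/alpine-python:0.1.0', './']
```

Calling build_and_push("homer/alpine-python:0.1.0") from the gear directory would then run both steps in order.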

The Image in the Manifest

There's one more thing we need to do. Remember that image key under the custom key in the manifest? We now need to set it to the image we just built. Set custom -> gear-builder -> image to homer/alpine-python:0.1.0:

"custom": {
   "gear-builder": {
      "category": "analysis",
      "image": "homer/alpine-python:0.1.0"
   }
}

You're now ready to test the gear, and your directory structure should now look like this:

GearTutorial  
|- run.py  
|- message.txt  
|- manifest.json  
|- Dockerfile
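The manifest edit described above (setting the image key under custom and gear-builder) can also be scripted, which helps when the image tag changes with every version bump. This is a sketch under the assumption that manifest.json is plain JSON; the function name set_gear_image is hypothetical.

```python
import json
from pathlib import Path


def set_gear_image(manifest_path: str, image: str) -> dict:
    """Set custom -> gear-builder -> image in a manifest file and save it."""
    path = Path(manifest_path)
    manifest = json.loads(path.read_text())
    # setdefault creates the nested objects if the manifest lacks them,
    # and leaves existing keys such as "category" untouched.
    gear_builder = manifest.setdefault("custom", {}).setdefault("gear-builder", {})
    gear_builder["image"] = image
    path.write_text(json.dumps(manifest, indent=2) + "\n")
    return manifest
```

For example, running set_gear_image("manifest.json", "homer/alpine-python:0.1.0") from the GearTutorial directory would update the file in place. Note that the file is rewritten with two-space indentation.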

In the next section, we'll go through some gear running/debugging techniques for you to use.

Additional Resources

Part 1: Environment Setup

Part 2: Gear Intro

Part 2a: The Flywheel Environment

Part 2b: The Run Script and Logging

Part 2c: The Manifest

Part 2d: The Dockerfile

Part 2e: Testing/Debugging