Docker Tricks And Tips
Please see an adapted version of this article published on Opensource.com here.
In a previous post, I covered an approach for dockerizing a software build system. In this article, I discuss some techniques I've found useful while iterating on a Dockerfile to get it just right. For example, if the Dockerfile involves downloading and installing a 5GB file, each iteration of "docker image build" could take a lot of time even with good network speeds.
On the surface, creating a Dockerfile for a build system seems like a straightforward exercise: simply implement the same steps in a Dockerfile that you'd perform if you were installing the items directly. Unfortunately, I've found that it usually doesn't quite work that way, and a few "tricks" are handy for such DevOps exercises.
In the tutorial repository from the previous post, I've added a folder with an example covering some of these tricks, which I'll walk through in this post.
Organize Build Tool I/O
The build inputs and outputs, and the scripts that configure and invoke the tools, should live outside the image and the eventual running container. These inputs and outputs are best accessed by setting up Docker volumes. I covered this extensively in a previous post but want to emphasize it here, as it's been a useful convention for my work.
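For instance, a run script following this convention might look like the sketch below. Note that the image name my_build_image:v1 and the /workdir mount point are hypothetical placeholders for illustration, not names from the example repository:

```shell
#!/bin/bash
# Sketch: keep build inputs/outputs and invocation scripts outside
# the image by mounting a host folder as a volume in the container.
# "my_build_image:v1" is a hypothetical image name.
docker run --rm -it \
    -v "$(pwd)/workdir:/workdir" \
    my_build_image:v1 /bin/bash
```

With this layout, the image stays generic while the host-side workdir/ holds everything that changes between runs.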
Saving Time On Docker Image Build Iterations
Using a local HTTP server is useful to avoid downloading large files from the internet multiple times during "docker image build" iterations. To illustrate this by example, let's say we need to create a Docker image with Anaconda 3 under Ubuntu 18.04. The Anaconda 3 installer is a ~0.5GB file, so I'll use this as our "large" file for this example.
Note that I don't want to use the Docker COPY instruction, as it creates a new layer. I want to delete the large installer after using it to minimize the Docker image size. One could use multi-stage builds, but I've found this approach sufficient and quite effective.
The basic idea is to use a Python-based HTTP server locally to serve the large file(s) and have the Dockerfile wget the large file(s) from this local server. Let's explore the details of how to set this up effectively. The full example is provided here.
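The moving parts can be tried in isolation, outside of Docker. In the sketch below, the folder name, port, and file contents are made up for illustration, and a Python one-liner stands in for the wget call that the Dockerfile would make:

```shell
# Sketch: serve a "large" file from a folder over a local HTTP server,
# then fetch it, mimicking what the Dockerfile's wget would do.
mkdir -p demo_installer
echo "pretend this is a large installer" > demo_installer/big_file.bin

cd demo_installer
python3 -m http.server 8899 --bind 127.0.0.1 &
server_pid=$!
cd ..
sleep 1  # give the server a moment to start

# fetch the file; this stands in for the Dockerfile's wget
python3 -c "import urllib.request; print(urllib.request.urlopen('http://127.0.0.1:8899/big_file.bin').read().decode(), end='')"

kill "$server_pid"
```

During an actual image build, the same fetch happens inside the build container, so nothing large ever enters the build context.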
The necessary contents of the folder tutorial2_docker_tricks/ in this example repository are outlined here:
tutorial2_docker_tricks/
├── build_docker_image.sh # builds the docker image
├── run_container.sh # instantiates a container from the image
├── install_anaconda.dockerfile # Dockerfile for creating our target docker image
├── .dockerignore # used to ignore contents of the installer/ folder from the docker context
├── installer # folder with all our large files required for creating the docker image
│ └── Anaconda3-2019.10-Linux-x86_64.sh # from https://repo.anaconda.com/archive/Anaconda3-2019.10-Linux-x86_64.sh
└── workdir # example folder used as a volume in the running container
The key steps of the approach are:
- Place the large file(s) in the installer/ folder. In this example, I have the large Anaconda installer file Anaconda3-2019.10-Linux-x86_64.sh. Note that you won't get this file if you clone my git repository, but you can download the installer from here to follow along with the example. Note that only you, as the docker image creator, need this source file; the end users of the docker image don't.
- Create the .dockerignore file and have it ignore the installer/ folder to prevent Docker from copying all the large files into the build context.
- In a terminal, cd into the tutorial2_docker_tricks/ folder and execute the build script as "./build_docker_image.sh".
- In build_docker_image.sh, we start the Python HTTP server to serve any files from the installer/ folder:

cd installer
python3 -m http.server --bind 10.0.2.15 8888 &
cd ..

- If you're wondering about the strange IP address, I'm working with a VirtualBox Linux VM, and 10.0.2.15 shows up as the address of the Ethernet adapter when I run ifconfig. This IP seems to be the convention used by VirtualBox. If your setup is different, you'll need to update this IP address in build_docker_image.sh and install_anaconda.dockerfile appropriately. The server's port number is set to 8888 for this example.
- As the HTTP server is set to run in the background, I stop the server near the end of the script with a "kill -9" command, using a cool approach I found here:

kill -9 `ps -ef | grep http.server | grep 8888 | awk '{print $2}'`

- You'll note that I also have this same "kill -9" earlier in the script, before starting the HTTP server. In general, when I iterate on a build script that I might deliberately interrupt, this ensures a clean start of the HTTP server each time.
- In the Dockerfile, there is a "RUN wget" instruction that downloads the Anaconda installer from the local HTTP server. It also deletes the installer file and cleans up after the installation, all within the same layer, to keep the image size to a minimum:

# install Anaconda by downloading the installer via the local http server
ARG ANACONDA
RUN wget --no-proxy http://10.0.2.15:8888/${ANACONDA} -O ~/anaconda.sh \
    && /bin/bash ~/anaconda.sh -b -p /opt/conda \
    && rm ~/anaconda.sh \
    && rm -fr /var/lib/apt/lists/{apt,dpkg,cache,log} /tmp/* /var/tmp/*

- After the build is complete, you should see the Docker image anaconda_ubuntu1804:v1 present. (You can list the images with "docker image ls".)
- You can instantiate a container from this image by running ./run_container.sh at the terminal while in the folder tutorial2_docker_tricks/. You can verify that Anaconda is installed as follows:

$ ./run_container.sh
$ python --version
Python 3.7.5
$ conda --version
conda 4.8.0
$ anaconda --version
anaconda Command line client (version 1.7.2)

- You'll note that run_container.sh sets up a volume workdir. In this example repository, the folder workdir/ is empty. This is a convention I use to set up a volume where I can keep my Python and other scripts independent of the docker image.
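The grep/awk chain used in the kill command can be checked on its own against a synthetic ps line (the process details below are made up for illustration):

```shell
# Sketch: extract the PID field from a ps -ef style line, the same way
# the build script locates the background http.server process to kill.
# The ps output line is fabricated for illustration.
ps_line="user     12345      1  0 10:00 pts/0    00:00:00 python3 -m http.server --bind 10.0.2.15 8888"
pid=$(echo "$ps_line" | grep http.server | grep 8888 | awk '{print $2}')
echo "$pid"  # → 12345
```

The second grep on the port number keeps the match narrow, so other http.server instances you may have running are left alone.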
Non-Root User
An important aspect of I/O concerns the ownership of the build tool's output files. By default, since Docker runs as root, the output files would be owned by root, which is unpleasant. We typically want to work as a non-root user. Changing the ownership after the build output is generated can be done with scripts, but it's an additional, unnecessary step. It's best to set the USER instruction in the Dockerfile at the earliest point possible.
ARG USERNAME
# other commands...
USER ${USERNAME}
The USERNAME can be passed in as a build argument (--build-arg) when executing the "docker image build". You can see an example of this in the example Dockerfile and the corresponding build script.
Some portions of the tools may also need to be installed as a non-root user, so the sequence of installations in the Dockerfile may need to differ from the way you'd do it when installing manually and directly under Linux.
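As an illustrative sketch of that ordering, something like the following could work; the user creation details and the pip package are assumptions for illustration, not taken from the example repository:

```dockerfile
ARG USERNAME

# root-level system installs come first
RUN apt-get update \
    && apt-get -y --no-install-recommends install python3-pip \
    && apt-get clean \
    && rm -fr /var/lib/apt/lists/*

# create the user, then switch to it as early as possible
RUN useradd -ms /bin/bash ${USERNAME}
USER ${USERNAME}

# user-level installs happen after the switch; these land under the user's home
RUN pip3 install --user numpy
```

Anything installed after the USER instruction is owned by that user, which also keeps the eventual build outputs from being root-owned.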
Minimizing Image Size
Each RUN instruction is executed in a new shell, and each RUN instruction creates a layer. The naive approach of mimicking installation instructions with separate RUN instructions may eventually break at one or more steps, and will also result in a larger image. Chaining multiple installation steps in one RUN instruction, and including the autoremove, autoclean, and rm commands as in the example below, is useful to minimize the size of each layer.
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get -y --quiet --no-install-recommends install \
# list of packages being installed go here \
&& apt-get -y autoremove \
&& apt-get clean autoclean \
&& rm -fr /var/lib/apt/lists/{apt,dpkg,cache,log} /tmp/* /var/tmp/*
Besides this, ensure that you have a .dockerignore file in place to ignore items that don't need to be sent to the Docker build context.
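For this example, the .dockerignore file only needs to exclude the installer folder:

```
# keep the large installer files out of the Docker build context
installer/
```

Without this entry, Docker would copy the ~0.5GB installer into the build context on every build iteration.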
Non-Interactive Installation
I've found the DEBIAN_FRONTEND=noninteractive apt-get -y --quiet --no-install-recommends options for the apt-get install instruction (as in the example above) necessary to prevent the installer from opening dialog boxes. Note that these options should be used as part of the RUN instruction. DEBIAN_FRONTEND=noninteractive should not be set as an environment variable (ENV) in the Dockerfile, as explained here, because it will be inherited by the containers.
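If repeating the variable on every apt-get line feels verbose, a common alternative is a build-time ARG, which, unlike ENV, is not persisted into containers created from the image. The curl package below is just an illustrative example:

```dockerfile
# build-time only; not inherited by running containers, unlike ENV
ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update \
    && apt-get -y --quiet --no-install-recommends install curl \
    && apt-get clean \
    && rm -fr /var/lib/apt/lists/*
```

The ARG value is in scope for all subsequent RUN instructions during the build, then disappears from the final image's environment.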
Logging Build And Run Output
Save a typescript of everything that happened during the Docker image build or container run session using a simple tee in the bash scripts. In other words, just add "|& tee $BASH_SOURCE.log" to the end of the "docker image build" and the "docker image run" commands in your scripts. See the examples in the image build and container run scripts.
What this tee-ing technique does is generate a file with the same name as the bash script, but with a ".log" extension appended so that you know which script it originated from. Everything you see printed to the terminal when running the script gets logged to this file.
This is especially valuable for users of your Docker images to report issues to you when something doesn't work. You can ask them to send you the log file to help diagnose the issue. Many tools generate so much output as to easily overwhelm the default size of the terminal's buffer. Relying only on the terminal's buffer capacity to copy-paste error messages may not be sufficient for diagnosing issues if the cause of the issue can only be seen much earlier in the output.
I've also found this to be useful even in the Docker image-building scripts especially when using the Python-based HTTP server discussed above. The server generates so many lines during a download that it typically overwhelms the terminal's buffer.
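The mechanics can be seen with any command in place of the docker invocations. Note that "|&" is bash shorthand for "2>&1 |", and the script name and echoed text below are simulated, since this snippet isn't itself a script file:

```shell
# Sketch: duplicate a command's stdout+stderr into a log file named after
# the script. In a real script the name comes from $BASH_SOURCE; here it
# is simulated for illustration.
script_name="build_docker_image.sh"
echo "Sending build context to Docker daemon" 2>&1 | tee "${script_name}.log"

# the same text is now also captured in the log file
cat "${script_name}.log"
```

Because tee copies the stream rather than redirecting it, you still see all output live in the terminal while the full transcript lands in the log.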
Default Shell Selection
The default shell assumed by Docker on Linux is sh. I'm more familiar with bash, so I tend to override the default with bash early on in the Dockerfile like this:
SHELL ["/bin/bash", "-c"]
I don't think there is a significant impact for the types of commands I use in Dockerfiles, so this is just a precaution. The script invoked at the Dockerfile entrypoint should have the appropriate shebang anyway.