Smaller Docker Image using Multi-Stage Build

Example: CUDA-enabled PyTorch + Apex Image

Jun 21, 2019 · 793 words · 4 minute read deep_learning pytorch docker

Photo Credit

Photo Credit

Why Use Mutli-Stage Build?

Starting from Docker 17.05, users can utilize this new “multi-stage build” feature [1] to simplify their workflow and make the final Docker images smaller. It basically streamlines the “Builder pattern”, which means using a “builder” image to build the binary files, and copying those binary files to another runtime/production image.

Despite being an interpreted programming language, many of Python libraries, especially the ones doing scientific computing and machine learning, are built upon pieces written in compiled languages (mostly C/C++). Therefore, the “Builder pattern” can still be applied.

(The following sections assume that you’ve already read the multi-stage build documentation [1].)

CUDA-enabled Docker images

We can build CUDA-enabled Docker images using nvidia-docker[2]. These types of images can potentially benefit hugely from multi-stage build, because (a) their sizes are quite big (usually multiple GBs) and (b) many packages that use CUDA requires CUDA library to build, but it is only useful in build time.

We are going demonstrate the power of multi-stage build by building a Docker image that has PyTorch 1.1 and NVIDIA Apex[3] installed. Here’s a sneak peek of the build image and the final runtime image:

The sizes of the “build” image and the final “runtime” image.

The sizes of the “build” image and the final “runtime” image.

The image size has shrunk from 6.88 GB to 3.82 GB (~45% reduction).

Building an PyTorch + NVIDIA Apex Image

The complete Dockerfile used can be found here on Github.


We are going to use nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04[4] as the base image. Note that the devel part is required for the optimal NVIDIA Apex build settings. This image already takes 3.1 GB of space.

I personally prefer to use miniconda in Docker images because it’s the easiest way to install PyTorch and you get to choose the version of your Python interpreter easily. The following will install miniconda into /opt/conda:

FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04 AS build

ARG CONDA_DIR=/opt/conda

# Instal basic utilities
RUN apt-get update && \
    apt-get install -y --no-install-recommends git wget unzip bzip2 build-essential ca-certificates && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Install miniconda
RUN wget --quiet$ -O /tmp/ && \
    echo 'export PATH=$CONDA_DIR/bin:$PATH' > /etc/profile.d/ && \
    /bin/bash /tmp/ -b -p $CONDA_DIR && \
    rm -rf /tmp/* && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

Then we install the desired version of Python interpreter, PyTorch from the official instructions, and some other essential scientific computing libraries.

RUN conda install -y python=$PYTHON_VERSION && \
    conda install -y pytorch torchvision cudatoolkit=10.0 -c pytorch && \
    conda install -y h5py scikit-learn matplotlib seaborn \
    pandas mkl-service cython && \
    conda clean -tipsy

Then we install NVIDIA Apex:

# Install apex
RUN git clone && \
    cd apex && pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" .

Finally, you can install some packages that are not available on conda via pip command.


When you install PyTorch via conda, an independent CUDA binaries are installed. That enables us to use nvidia/cuda:10.0-base as the base image. This bare-bone image only uses 115 MB.

Here we create a default user, which can improve security, and copy the whole /opt/conda directory into the new runtime image.

FROM nvidia/cuda:10.0-base

ARG CONDA_DIR=/opt/conda


# Create the user
RUN useradd --create-home -s /bin/bash --no-user-group -u $USERID $USERNAME && \
    chown $USERNAME $CONDA_DIR -R && \
    adduser $USERNAME sudo && \
    echo "$USERNAME ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers

COPY --chown=1000 --from=build /opt/conda/. $CONDA_DIR

The new FROM marks the start of the second stage.

Unfortunately, it seems the --chown=1000 has to be hard-coded. Docker cannot handle --chown=$USERID. This is a minor inconvenience.

Building the Dockerfile

Just like any other Dockerfile, build it with something like docker build -t apex --force-rm ..

You can use --target to control the target stage. This is how I obtain the size of the builder image: docker build --target build -t build-tmp --force-rm ..

Side notes

Copying /opt/conda and setting the $PATH environment variable might not be the perfect way to migrate the conda installation, but it works well enough for me.

If you intend to have multiple conda environment in one Docker image, this image will fail to execute conda activate. You may need to run conda init bash for it to work.

Most, if not all, of the reduction in image size come from switching base image. I’m not sure if there are other ways to squeeze more space from the /opt/conda folder. If you do, please let me know in the comment section.


  1. Use multi-stage builds | Docker documentation.
  2. Docker + NVIDIA GPU = nvidia-docker | The Artificial Impostor
  3. NVIDIA/apex - A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
  4. nvidia/cuda | Docker Hub
tweet Share