anibali/docker-pytorch

Size too big: use multi-stage Dockerfiles

Closed this issue · 4 comments

I was able to squeeze Python 3.10, torch 2.0.0 and CUDA 11.8 into just 2.5GB.

https://github.com/michaelfeil/infinity/blob/main/libs/infinity_emb/Dockerfile

Is this approach something you would consider?

I'm glad that you've found a setup that works well for you. I'm not sure how much benefit you are getting from having a multi-stage build vs. combining the poetry install and poetry cache clear into one RUN step, since it does not look like your builder has any additional package requirements.
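
For illustration, a minimal sketch of the single-step alternative mentioned above, where the install and the cache cleanup share one RUN instruction so the cache never ends up in a committed layer. The base image, the /tmp/poetry_cache path and the project files are assumptions made for this example, not taken from the linked Dockerfile, and removing the cache directory stands in for the poetry cache clear step.

```dockerfile
# Hypothetical sketch: base image, cache path and project files are assumptions.
FROM python:3.10-slim

# Keep pip from caching downloads, and point Poetry's cache at a known
# directory so it can be deleted in the same layer that creates it.
ENV PIP_NO_CACHE_DIR=1 \
    POETRY_NO_INTERACTION=1 \
    POETRY_VIRTUALENVS_CREATE=false \
    POETRY_CACHE_DIR=/tmp/poetry_cache

WORKDIR /app
COPY pyproject.toml poetry.lock ./

# Install and clean up in ONE RUN step. Files deleted in a later RUN would
# still occupy space in the earlier layer.
RUN pip install poetry \
 && poetry install --no-root \
 && rm -rf "$POETRY_CACHE_DIR"
```

With this layout a second stage saves little unless the build needs extra packages (compilers, headers) that the final image can leave behind.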

The current approach taken by docker-pytorch is a result of many considerations (including the use of mamba/conda to make binary dependencies easier to manage). I have experimented with poetry a few times in the past and ran into issues as the list of dependencies got more complex. Thank you nonetheless, and I might look into the Python/Pip environment variables that you set near the top as these look like good settings for a dockerised environment.

Note: Please do not use poetry. I especially don't recommend using it with torch.

This is about the old trick of using two Docker build stages to end up with a smaller final image. In my case this took the image from 8GB down to ~2.5GB. I would recommend copying the conda result/environment over into the final stage. This should not change how the image behaves, but e.g. the apt-get update is then not required in the final image.
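
As a rough sketch of the general pattern, assuming a hypothetical requirements.txt and a /opt/venv location (neither comes from this thread): the first stage installs build tooling and the packages, and the final stage copies only the finished environment. The size win comes from whatever the builder needed that the final stage never installs.

```dockerfile
# Hypothetical sketch: image tags, requirements.txt and /opt/venv are assumptions.
FROM python:3.10-slim AS builder

# Build-only tooling lives in this stage and is never copied forward.
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
 && rm -rf /var/lib/apt/lists/*

# Install the Python packages into a self-contained virtual environment.
COPY requirements.txt /tmp/requirements.txt
RUN python3 -m venv /opt/venv \
 && /opt/venv/bin/pip install --no-cache-dir -r /tmp/requirements.txt

FROM python:3.10-slim AS final

# Only the finished environment crosses the stage boundary.
COPY --from=builder /opt/venv /opt/venv
ENV PATH=/opt/venv/bin:$PATH

CMD ["python3"]
```

Both stages use the same base image so that the interpreter the virtual environment points to exists at the same path in the final image. The copied environment is not compressed in any way, so a stage copy cannot make the installed packages themselves smaller.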

That is very interesting. Do you have some kind of link/reference describing in more detail how this works?

I experimented a bit and unfortunately I don't think multi-stage builds will help in this case. The Conda prefix is 7.2G, and no amount of Docker trickery will really be able to change that. Most of this space is occupied by PyTorch and its CUDA dependencies.

Here is what I tried, with no real benefit:

FROM nvidia/cuda:11.8.0-base-ubuntu22.04 AS base

# Remove any third-party apt sources to avoid issues with expiring keys.
RUN rm -f /etc/apt/sources.list.d/*.list

# Install some basic utilities.
RUN apt-get update && apt-get install -y \
    curl \
    ca-certificates \
    sudo \
    git \
    bzip2 \
    libx11-6 \
 && rm -rf /var/lib/apt/lists/*

# Create a working directory.
RUN mkdir /app
WORKDIR /app

# Create a non-root user and switch to it.
RUN adduser --disabled-password --gecos '' --shell /bin/bash user \
 && chown -R user:user /app
RUN echo "user ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/90-user
USER user

# All users can use /home/user as their home directory.
ENV HOME=/home/user
RUN mkdir $HOME/.cache $HOME/.config \
 && chmod -R 777 $HOME

# Set up environment variables for Conda/Mamba
ENV MAMBA_EXE=/usr/local/bin/micromamba \
    MAMBA_ROOT_PREFIX=/home/user/micromamba \
    CONDA_PREFIX=/home/user/micromamba \
    PATH=/home/user/micromamba/bin:$PATH

#########

FROM base AS builder

# Download and install Micromamba.
RUN curl -sL https://micro.mamba.pm/api/micromamba/linux-64/1.1.0 \
  | sudo tar -xvj -C /usr/local bin/micromamba

# Set up the base Conda environment by installing PyTorch and friends.
COPY conda-linux-64.lock /app/conda-linux-64.lock
RUN micromamba create -qy -n base -f /app/conda-linux-64.lock \
 && rm /app/conda-linux-64.lock \
 && micromamba shell init --shell=bash --prefix="$MAMBA_ROOT_PREFIX" \
 && micromamba clean -qya

# Fix for https://github.com/pytorch/pytorch/issues/97041
RUN ln -s "$CONDA_PREFIX/lib/libnvrtc.so.11.8.89" "$CONDA_PREFIX/lib/libnvrtc.so"

#########

FROM base AS final

# Copy across the Conda prefix.
COPY --from=builder $CONDA_PREFIX $CONDA_PREFIX

# Set the default command to python3.
CMD ["python3"]