Docker image for development & inference
KarimJedda opened this issue · 4 comments
I'm running this right now in a Docker container on a hosted GPU provider. It should be possible to build a Docker image that encapsulates all the dependencies and is runnable anywhere.
The issue, however, is that this provider doesn't give me the kind of access to the host VM that would let me push the image to a container registry for convenience. I tried building it locally, but the build also requires a GPU.
The idea here would be to split the image into two stages, a builder and a runner, like so:
Dockerfile
FROM nvidia/cuda:12.2.0-devel-ubuntu20.04 AS builder
# Set working directory
WORKDIR /workdir
# Install dependencies
# we could pin them to specific versions to be extra sure
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
git \
python3-dev \
libtbb-dev \
libeigen3-dev \
unzip \
g++ \
libssl-dev \
build-essential \
checkinstall \
wget \
&& rm -rf /var/lib/apt/lists/*
# Install cmake 3.22
RUN wget https://github.com/Kitware/CMake/releases/download/v3.22.0/cmake-3.22.0.tar.gz \
&& tar -zvxf cmake-3.22.0.tar.gz \
&& cd cmake-3.22.0 \
&& ./bootstrap \
&& make -j8 \
&& checkinstall --pkgname=cmake --pkgversion="3.22-custom" --default
# Copy contents from 2 levels up
COPY . ./
# Download and extract libtorch
RUN wget https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.0.1%2Bcu118.zip \
&& unzip libtorch-cxx11-abi-shared-with-deps-2.0.1+cu118.zip -d external/
# Build (configuring without a GPU present adds compute_35 as a build target, which we do not want)
ENV PATH=/usr/local/cuda-12.2/bin:$PATH
ENV LD_LIBRARY_PATH=/usr/local/cuda-12.2/lib64:$LD_LIBRARY_PATH
RUN cmake -B build -D CMAKE_BUILD_TYPE=Release -D CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.2/ -D CUDA_VERSION=12.2 \
&& cmake --build build -- -j8
# --- Runner Stage ---
FROM nvidia/cuda:12.2.0-devel-ubuntu20.04 AS runner
WORKDIR /app
# Copy built artifact from builder stage
COPY --from=builder /workdir/build /app/build
# Subject to change
CMD ["./build/gaussian-splatting-cuda"]
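One caveat (untested): the binary links dynamically against libtorch, so the runner stage will likely also need the libtorch shared libraries on its library path. A sketch of a slimmer runner stage, assuming the unzip step above leaves the libraries in external/libtorch/lib:
FROM nvidia/cuda:12.2.0-runtime-ubuntu20.04 AS runner
WORKDIR /app
# Copy the built artifact plus the libtorch shared libraries it links against
COPY --from=builder /workdir/build /app/build
COPY --from=builder /workdir/external/libtorch/lib /app/libtorch/lib
ENV LD_LIBRARY_PATH=/app/libtorch/lib:$LD_LIBRARY_PATH
CMD ["./build/gaussian-splatting-cuda"]
The runtime base image is much smaller than devel; keep the devel base only if the container needs nvcc at runtime.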
I believe this would simplify both development and inference.
For building:
DOCKER_BUILDKIT=1 docker build -t gaussplat -f Dockerfile .
and for running, something along these lines (subject to tweaking; GPU access inside the container also requires the NVIDIA Container Toolkit on the host):
docker run --gpus all -v /tmp/dataset:/dataset -v /tmp/output:/output gaussplat /dataset/tandt/truck
For now I'm putting this here until my GPUs and computer parts get delivered and I can try it in a more controlled environment. Until then, this could be a good first issue.
One small note if anyone attempts this: the following seems to be required regardless. I do not know how the compute_35 dependency gets into the CMake files:
sed -i 's/-gencode arch=compute_35,code=sm_35//g' /gaussian-splatting-cuda/build/external/CMakeFiles/simple-knn.dir/flags.make
sed -i 's/-gencode arch=compute_35,code=sm_35//g' /gaussian-splatting-cuda/build/CMakeFiles/testing.dir/flags.make
sed -i 's/-gencode arch=compute_35,code=sm_35//g' /gaussian-splatting-cuda/build/CMakeFiles/gaussian_splatting_cuda.dir/flags.make
Doing so lets you build properly on a "factory reset" machine.
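An untested alternative, assuming the flags are injected by libtorch's CMake scripts: setting TORCH_CUDA_ARCH_LIST before the configure step may keep compute_35 out of the generated flags in the first place, making the sed edits unnecessary:
export TORCH_CUDA_ARCH_LIST="7.0;7.5;8.0;8.6"
cmake -B build -D CMAKE_BUILD_TYPE=Release -D CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.2/ -D CUDA_VERSION=12.2
cmake --build build -- -j8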
That sounds like a great plan. Having the software run in the cloud would be very useful. Adding a Dockerfile isn't a massive addition, but it delivers immediate value, so if you're inclined to take on this task, I wholeheartedly support you.
On the architecture front, it appears the source might be libtorch. However, I'm curious how it ends up in your CMake build; I haven't seen this in my own builds.
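For reference, a generic sketch of how libtorch typically enters a CMake build (not necessarily this repo's exact setup): the unpacked archive is added to CMAKE_PREFIX_PATH so that find_package(Torch) resolves, and libtorch's CMake scripts then append the -gencode flags, falling back to a default architecture list when TORCH_CUDA_ARCH_LIST is unset and no GPU is visible at configure time:
# point CMake at the unpacked libtorch so find_package(Torch) resolves
cmake -B build -D CMAKE_BUILD_TYPE=Release \
      -D CMAKE_PREFIX_PATH="$(pwd)/external/libtorch"
cmake --build build -- -j8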
In the mid term, the goal is to remove libtorch as a dependency entirely, which would resolve issues like this one. I'd probably keep it only for writing tests, so that outputs can be verified more easily; that helps tremendously when you tweak tensors and apply optimization routines.
I think this can be closed after one year :)
Sounds good to me, what a journey :)