snuspl/nimble

How to build the docker file? fatal: Not a git repository (or any of the parent directories): .git

Closed this issue · 10 comments

I am running the following command in the docker/pytorch folder to build the Docker image for Nimble:
docker build -t nimble:latest .
I am getting the following error:

Step 8/13 : COPY . .
 ---> Using cache
 ---> 9b4c39717094
Step 9/13 : RUN git submodule sync && git submodule update --init --recursive
 ---> Running in 08f892ce7bb6
fatal: Not a git repository (or any of the parent directories): .git
The command '/bin/sh -c git submodule sync && git submodule update --init --recursive' returned a non-zero code: 128
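One likely explanation, offered as an assumption rather than a confirmed diagnosis: `COPY . .` copies only the build context, which is the directory passed to `docker build`. When the command is run from a subfolder such as docker/pytorch, the `.git` directory at the repository root is not part of that context, so `git submodule` inside the image has no repository to work with. A `.dockerignore` entry that excludes `.git` would have the same effect. The sketch below (helper names are hypothetical) checks this before starting a build:

```python
from pathlib import Path
from typing import Optional


def find_git_root(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains `.git`."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        # `.git` is a directory in a normal clone and a file in a submodule or worktree.
        if (candidate / ".git").exists():
            return candidate
    return None


def context_includes_git(context_dir: Path) -> bool:
    """True only if `.git` sits directly inside the build context directory."""
    return (Path(context_dir) / ".git").exists()
```

If `find_git_root(Path("."))` returns a parent directory while `context_includes_git(Path("."))` is false, the build is being started from a subfolder, and running it from the returned root is worth trying.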

I get this error when I try to build the Docker image:
DOCKER_BUILDKIT=1 docker build .
error:

#23 949.8 caffe2/CMakeFiles/torch_cpu.dir/build.make:21050: recipe for target 'caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/autograd/engine.cpp.o' failed
#23 970.0 CMakeFiles/Makefile2:8996: recipe for target 'caffe2/CMakeFiles/torch_cpu.dir/all' failed
#23 970.0 Makefile:140: recipe for target 'all' failed
#23 970.0 Building wheel torch-1.7.0a0
#23 970.0 -- Building version 1.7.0a0
#23 970.0 cmake -DBUILD_PYTHON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/opt/pytorch/torch -DCMAKE_PREFIX_PATH=/opt/conda/bin/../ -DNUMPY_INCLUDE_DIR=/opt/conda/lib/python3.9/site-packages/numpy/core/include -DPYTHON_EXECUTABLE=/opt/conda/bin/python -DPYTHON_INCLUDE_DIR=/opt/conda/include/python3.9 -DPYTHON_LIBRARY=/opt/conda/lib/libpython3.9.a -DTORCH_BUILD_VERSION=1.7.0a0 -DUSE_NUMPY=True /opt/pytorch
#23 970.0 cmake --build . --target install --config Release -- -j 8
------
executor failed running [/bin/sh -c TORCH_CUDA_ARCH_LIST="3.5 5.2 6.0 6.1 7.0+PTX 8.0" TORCH_NVCC_FLAGS="-Xfatbin -compress-all"     CMAKE_PREFIX_PATH="$(dirname $(which conda))/../"     python setup.py install]: exit code: 1


Unfortunately, this error message does not give any useful information about the reason why the build failed.
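The lines quoted above are only make's summary of which targets failed; the compiler diagnostic that caused the failure normally appears earlier in the output. A minimal sketch for locating it in a saved log, assuming GCC/Clang-style `error:` diagnostics and the BuildKit `#<step> <seconds>` prefix seen in this log (the function name is hypothetical):

```python
import re
from typing import List, Tuple

# Assumption: BuildKit prefixes each line with "#<step> <seconds> ", as in "#23 949.8 ...".
BUILDKIT_PREFIX = re.compile(r"^#\d+\s+\d+(?:\.\d+)?\s+")
# Assumption: compiler diagnostics contain "error:" (GCC/Clang style).
ERROR_MARKER = re.compile(r"\berror:", re.IGNORECASE)


def find_error_lines(log_text: str) -> List[Tuple[int, str]]:
    """Return (1-based line number, text without BuildKit prefix) for each diagnostic line."""
    hits = []
    for number, raw in enumerate(log_text.splitlines(), start=1):
        line = BUILDKIT_PREFIX.sub("", raw)
        if ERROR_MARKER.search(line):
            hits.append((number, line))
    return hits
```

To capture a complete log to feed into this, `docker build --progress=plain . 2>&1 | tee build.log` prints the full output of every step instead of the condensed view.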

I haven't tried building a Docker image for Nimble yet. It "should" work if we follow the official instructions from PyTorch (link).

I just started running the docker build command (make -f docker.Makefile, which is specified in the link above) in my environment. I will let you know once the build completes. In the meantime, you can also retry the build yourself by running the same command (make -f docker.Makefile).

I just pushed a commit (bac6d10) that makes a small change to the Dockerfile,
and I confirmed that the docker build command (make -f docker.Makefile) works well.
Please let me know if you still have a problem.

I get the following error after cloning the nimble repo and running this command:
make -f docker.Makefile

docker.Makefile:7: WARNING: No docker user found using results from whoami
fatal: No names found, cannot describe anything.
fatal: No names found, cannot describe anything.
fatal: No names found, cannot describe anything.
fatal: No names found, cannot describe anything.
DOCKER_BUILDKIT=1 docker build --progress=auto --target dev -t docker.io/umair/pytorch:-devel --build-arg BASE_IMAGE=nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04 --build-arg PYTHON_VERSION=3.7 --build-arg INSTALL_CHANNEL=pytorch .
invalid argument "docker.io/umair/pytorch:-devel" for "-t, --tag" flag: invalid reference format
See 'docker build --help'.
docker.Makefile:32: recipe for target 'devel-image' failed
make: *** [devel-image] Error 125
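"No names found, cannot describe anything" is what `git describe` prints when no tag is reachable in the clone. The empty version in `pytorch:-devel` suggests (an assumption, not verified against docker.Makefile) that the image tag is derived from that output. Docker rejects the result because a tag must start with a letter, digit, or underscore. A small sketch of that rule, with a hypothetical helper that substitutes a fallback version:

```python
import re

# Docker's tag grammar: one word character, then up to 127 of [A-Za-z0-9_.-].
TAG_PATTERN = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def is_valid_tag(tag: str) -> bool:
    """Check a tag (the part after ':') against Docker's tag grammar."""
    return TAG_PATTERN.fullmatch(tag) is not None


def devel_tag(version: str, fallback: str = "local") -> str:
    """Hypothetical helper: build '<version>-devel', using `fallback` if version is empty."""
    version = version.strip() or fallback
    return f"{version}-devel"
```

Here `is_valid_tag("-devel")` is false while `is_valid_tag(devel_tag(""))` is true. Fetching the repository's tags, or choosing the image tag by hand, are two ways to avoid the empty version.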

Looks like this error is not related to Nimble.
Can you try the same thing with the original PyTorch repo (with v1.7.1) to check if there is any problem with your environment?

I built the Docker image using the following command. I had assumed the image would come with Nimble pre-installed, but it does not.
DOCKER_BUILDKIT=1 docker build --progress=auto --target dev -t docker.io/umair/pytorch:devel --build-arg BASE_IMAGE=nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04 --build-arg PYTHON_VERSION=3.7 --build-arg INSTALL_CHANNEL=pytorch .

Can you run the following Python script in a Docker container started from the image that you've built?

import torch
print(torch.cuda.nimble)
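A slightly more defensive variant of the same check, as a sketch: it reports where a module was loaded from, or returns `None` instead of raising if the module cannot be imported. The helper name is hypothetical; only `importlib.import_module` from the standard library is used.

```python
import importlib
from typing import Optional


def describe_module(name: str) -> Optional[str]:
    """Return the file a module was imported from, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        # Covers both a missing package and a missing submodule.
        return None
    # Built-in modules have no __file__ attribute.
    return getattr(module, "__file__", None) or "<built-in>"
```

Inside the container, `print(describe_module("torch.cuda.nimble"))` should print a path under site-packages if Nimble was built into the image, and `None` otherwise.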

I'm sorry, it is working. I got the following output:
<module 'torch.cuda.nimble' from '/opt/conda/lib/python3.9/site-packages/torch/cuda/nimble.py'>