Lab41/attalos

Torch docker file fails to build

dooleys opened this issue · 9 comments

$ docker build -t l41-torch -f Dockerfile.torch .
Sending build context to Docker daemon 11.37 MB
Step 1 : FROM l41-nvidia-base
 ---> 102ed6e3da5e
Step 2 : RUN apt-get update && apt-get install -y   git   software-properties-common   ipython3   libssl-dev   libzmq3-dev   python-zmq   python-pip   libhdf5-serial-dev   hdf5-tools
 ---> Using cache
 ---> df2ebaed3e7f
Step 3 : RUN pip install notebook ipywidgets
 ---> Using cache
 ---> 54a7b6d785a0
Step 4 : RUN git clone https://github.com/torch/distro.git /root/torch --recursive && cd /root/torch &&   bash install-deps
 ---> Using cache
 ---> 2c31f034c9ea
Step 5 : RUN cd /root/torch &&   ./install.sh
 ---> Using cache
 ---> d9cab405fe56
Step 6 : ENV LUA_PATH '/root/.luarocks/share/lua/5.1/?.lua;/root/.luarocks/share/lua/5.1/?/init.lua;/root/torch/install/share/lua/5.1/?.lua;/root/torch/install/share/lua/5.1/?/init.lua;./?.lua;/root/torch/install/share/luajit-2.1.0-beta1/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua'
 ---> Using cache
 ---> 29962930c41b
Step 7 : ENV LUA_CPATH '/root/.luarocks/lib/lua/5.1/?.so;/root/torch/install/lib/lua/5.1/?.so;./?.so;/usr/local/lib/lua/5.1/?.so;/usr/local/lib/lua/5.1/loadall.so'
 ---> Using cache
 ---> 8c5019c48b20
Step 8 : ENV PATH /root/torch/install/bin:$PATH
 ---> Using cache
 ---> 5a56345a7f64
Step 9 : ENV LD_LIBRARY_PATH /root/torch/install/lib:$LD_LIBRARY_PATH
 ---> Using cache
 ---> 40d3a95ded7c
Step 10 : ENV DYLD_LIBRARY_PATH /root/torch/install/lib:$DYLD_LIBRARY_PATH
 ---> Using cache
 ---> ede651e44086
Step 11 : ENV LUA_CPATH '/root/torch/install/lib/?.so;'$LUA_CPATH
 ---> Using cache
 ---> 417b63eecd7a
Step 12 : RUN luarocks install https://raw.githubusercontent.com/soumith/cudnn.torch/R4/cudnn-scm-1.rockspec
 ---> Running in 9d69614736f6

Missing dependencies for cudnn:
cutorch 

Cloning into 'cutorch'...
-- The C compiler identification is GNU 4.8.4
-- The CXX compiler identification is GNU 4.8.4
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found Torch7 in /root/torch/install
CMake Error at /usr/share/cmake-2.8/Modules/FindCUDA.cmake:548 (message):
  Specify CUDA_TOOLKIT_ROOT_DIR
Call Stack (most recent call first):
  CMakeLists.txt:7 (FIND_PACKAGE)


-- Configuring incomplete, errors occurred!
See also "/tmp/luarocks_cutorch-scm-1-9441/cutorch/build/CMakeFiles/CMakeOutput.log".

Error: Failed installing dependency: https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec - Build error: Failed building.
Using https://raw.githubusercontent.com/soumith/cudnn.torch/R4/cudnn-scm-1.rockspec... switching to 'build' mode
Using https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec... switching to 'build' mode
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/root/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/root/torch/install/lib/luarocks/rocks/cutorch/scm-1" && make -j$(getconf _NPROCESSORS_ONLN) install

The command '/bin/sh -c luarocks install https://raw.githubusercontent.com/soumith/cudnn.torch/R4/cudnn-scm-1.rockspec' returned a non-zero code: 1
agude commented

I can't reproduce this, even on the same hardware, after applying #73.

Step 24 : RUN luarocks install cutorch
 ---> Running in afb7f68c968a
Cloning into 'cutorch'...
-- The C compiler identification is GNU 4.8.4
-- The CXX compiler identification is GNU 4.8.4
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found Torch7 in /root/torch/install
CMake Error at /root/torch/install/share/cmake/torch/FindCUDA.cmake:617 (message):
  Specify CUDA_TOOLKIT_ROOT_DIR
Call Stack (most recent call first):
  CMakeLists.txt:7 (FIND_PACKAGE)


-- Configuring incomplete, errors occurred!
See also "/tmp/luarocks_cutorch-scm-1-2022/cutorch/build/CMakeFiles/CMakeOutput.log".

Error: Build error: Failed building.
Installing https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec...
Using https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec... switching to 'build' mode
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/root/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/root/torch/install/lib/luarocks/rocks/cutorch/scm-1" && make -j$(getconf _NPROCESSORS_ONLN) install

The command '/bin/sh -c luarocks install cutorch' returned a non-zero code: 1
make: *** [torch] Error 1

Do you want to automatically prepend the Torch install location
to PATH and LD_LIBRARY_PATH in your /root/.bashrc? (yes/no)
[yes] >>>
---> d175383f263c
Removing intermediate container 8bf0749a2bb4
Step 6 : ENV LUA_PATH '/root/.luarocks/share/lua/5.1/?.lua;/root/.luarocks/share/lua/5.1/?/init.lua;/root/torch/install/share/lua/5.1/?.lua;/root/torch/install/share/lua/5.1/?/init.lua;./?.lua;/root/torch/install/share/luajit-2.1.0-beta1/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/local/share/lua/5.1/?/init.lua'
---> Running in 2d10eb602630
---> cc86a12e4566
Removing intermediate container 2d10eb602630
Step 7 : ENV LUA_CPATH '/root/.luarocks/lib/lua/5.1/?.so;/root/torch/install/lib/lua/5.1/?.so;./?.so;/usr/local/lib/lua/5.1/?.so;/usr/local/lib/lua/5.1/loadall.so'
---> Running in 5665d532e1ef
---> fd9c6ef1ada4
Removing intermediate container 5665d532e1ef
Step 8 : ENV PATH /root/torch/install/bin:$PATH
---> Running in 13a84026bae1
---> 7170c8cc90ee
Removing intermediate container 13a84026bae1
Step 9 : ENV LD_LIBRARY_PATH /root/torch/install/lib:$LD_LIBRARY_PATH
---> Running in 410366ab8afd
---> 2d2fb0789232
Removing intermediate container 410366ab8afd
Step 10 : ENV DYLD_LIBRARY_PATH /root/torch/install/lib:$DYLD_LIBRARY_PATH
---> Running in 033ee2b23409
---> e022e6d160db
Removing intermediate container 033ee2b23409
Step 11 : ENV LUA_CPATH '/root/torch/install/lib/?.so;'$LUA_CPATH
---> Running in 271d3f4784e1
---> 79166eb8cc29
Removing intermediate container 271d3f4784e1
Step 12 : RUN luarocks install https://raw.githubusercontent.com/soumith/cudnn.torch/R4/cudnn-scm-1.rockspec
---> Running in 0b645c2b7ce1

Missing dependencies for cudnn:
cutorch

Cloning into 'cutorch'...
-- The C compiler identification is GNU 4.8.4
-- The CXX compiler identification is GNU 4.8.4
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found Torch7 in /root/torch/install
CMake Error at /usr/share/cmake-2.8/Modules/FindCUDA.cmake:548 (message):
Specify CUDA_TOOLKIT_ROOT_DIR
Call Stack (most recent call first):
CMakeLists.txt:7 (FIND_PACKAGE)

-- Configuring incomplete, errors occurred!
See also "/tmp/luarocks_cutorch-scm-1-3327/cutorch/build/CMakeFiles/CMakeOutput.log".

Error: Failed installing dependency: https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec - Build error: Failed building.
Using https://raw.githubusercontent.com/soumith/cudnn.torch/R4/cudnn-scm-1.rockspec... switching to 'build' mode
Using https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec... switching to 'build' mode
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/root/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/root/torch/install/lib/luarocks/rocks/cutorch/scm-1" && make -j$(getconf _NPROCESSORS_ONLN) install

The command '/bin/sh -c luarocks install https://raw.githubusercontent.com/soumith/cudnn.torch/R4/cudnn-scm-1.rockspec' returned a non-zero code: 1
make: *** [build] Error 1

OK, I agree with Alex on this: luarocks sucks. But my bet is that it's not necessarily luarocks' fault. Please see the attached file; luarocks seems to complain at various stages during the build.

attalos_build_0444Z.txt

Is it this? I got it from the file I attached in an earlier comment.

Step 20 : ENV CUDNN_VERSION 4
---> Using cache
---> 7f130d7c6898
Step 21 : LABEL com.nvidia.cudnn.version "4"
---> Using cache
---> 5a91192d13e2
Step 22 : ENV CUDNN_PKG_VERSION 4.0.7
---> Using cache
---> f95a62bae981
Step 23 : RUN echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1404/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list
---> Using cache
---> 12a0b24f083c
Step 24 : RUN apt-get update && apt-get install -y --no-install-recommends --force-yes libcudnn4=$CUDNN_PKG_VERSION libcudnn4-dev=$CUDNN_PKG_VERSION && rm -rf /var/lib/apt/lists/*
---> Using cache
---> 0e1a028cb68a

At least for the error I got, what was missing was:

ENV CUDA_BIN_PATH=/usr/local/cuda-7.5

before the "luarocks install cutorch" line
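
For anyone landing here with the same "Specify CUDA_TOOLKIT_ROOT_DIR" error, here is a minimal sketch of how that change might look in the Dockerfile. It assumes the CUDA 7.5 toolkit is installed at /usr/local/cuda-7.5 in the base image, and that CMake's FindCUDA module picks up CUDA_BIN_PATH from the environment when searching for nvcc. Adjust the path if your image uses a different CUDA version; the actual change is in the fix referenced later in this thread.

```dockerfile
# Assumption: the CUDA 7.5 toolkit lives at /usr/local/cuda-7.5 in the base image.
# FindCUDA is expected to consult CUDA_BIN_PATH when locating the toolkit.
ENV CUDA_BIN_PATH=/usr/local/cuda-7.5

# With the variable set, cutorch's CMake configure step should be able to
# resolve CUDA_TOOLKIT_ROOT_DIR instead of stopping with an error.
RUN luarocks install cutorch
```

The ENV line has to come before any RUN step that builds cutorch, including a step that pulls it in as a dependency, such as the cudnn rockspec install in the original log.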

agude commented

I can reproduce @ymt123's bug. I'm trying the CUDA_BIN_PATH change now.

agude commented

@ymt123's suggestion fixed it! See #74.