torch/cutorch

CUDA 8.0 + cuDNN 6.0 + Volta GPU architecture + torch7 build error

ytzhao opened this issue · 3 comments

Hi all, I tried to install torch7 on a Tesla V100 but ran into some problems. For certain reasons I have to stay on CUDA 8.0 (even though CUDA 9.0 is the better fit for the Volta architecture).

Building on 32 cores
-- Found Torch7 in /home/torch/install
-- Found CUDA: /usr/local/cuda-8.0 (found suitable version "8.0", minimum required is "6.5") 
-- Removing -DNDEBUG from compile flags
-- TH_LIBRARIES: TH
-- MAGMA not found. Compiling without MAGMA support
-- Autodetected CUDA architecture(s): 7.0 7.0 7.0 7.0
-- got cuda version 8.0
-- Found CUDA with FP16 support, compiling with torch.CudaHalfTensor
-- CUDA_NVCC_FLAGS: -gencode;arch=compute_70,code=sm_70;-DCUDA_HAS_FP16=1
-- THC_SO_VERSION: 0
-- Performing Test HAS_LUAL_SETFUNCS
-- Performing Test HAS_LUAL_SETFUNCS - Failed
-- Configuring done
-- Generating done
-- Build files have been written to: /home/torch/extra/cutorch
make: *** No rule to make target 'install'.  Stop.

I think the problem is that the auto-detected GPU architecture does not match the CUDA version: the build passes compute_70/sm_70, but CUDA 8.0's nvcc does not support that architecture (Volta is only officially supported from CUDA 9.0 onwards, although a V100 can still run code built for older architectures). How can I disable the auto-detection and manually set the architecture to something CUDA 8.0 can compile, such as compute_60? Thank you.
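
For reference, this is the kind of override I am hoping for: a sketch of a possible workaround, assuming the cutorch tree passes the TORCH_CUDA_ARCH_LIST environment variable to its bundled select_compute_arch.cmake to bypass auto-detection (please correct me if this checkout does not do that), and assuming the rockspec name below matches your tree:

# Sketch, not verified on this setup. CUDA 8.0's nvcc tops out at Pascal,
# so target 6.0 instead of the autodetected 7.0; the "+PTX" suffix (if the
# bundled select_compute_arch.cmake supports it) also embeds PTX that the
# driver can JIT-compile for the V100 at runtime.
cd /home/torch/extra/cutorch
rm -rf build                                    # drop the CMake cache generated with compute_70
TORCH_CUDA_ARCH_LIST="6.0+PTX" luarocks make rocks/cutorch-scm-1.rockspec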

I am having this exact issue. Have you found a fix for it yet?

@stancil1 Hi, I changed the CUDA version to 9.0 and it works now.
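
For anyone else stuck here, switching toolkits roughly comes down to something like this (a sketch; the CUDA 9.0 install path and the torch/distro clean.sh script are assumptions, adjust to your setup):

# Point the toolchain at CUDA 9.0, whose nvcc understands sm_70, then
# rebuild so the architecture flags are re-detected against the new toolkit.
export CUDA_HOME=/usr/local/cuda-9.0            # assumption: CUDA 9.0 installed here
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
cd /home/torch
./clean.sh && ./install.sh                      # full clean rebuild of the distro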

I'm finding that the cudnn.torch package is not initializing the Volta GPUs properly (it takes about 10 minutes to configure them, when it should be nearly instantaneous). Were you able to configure the GPUs properly using cutorch?

This is the code that cudnn is having trouble with:

https://github.com/soumith/cudnn.torch/blob/R7/init.lua
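
To narrow down whether the delay is on the cutorch side or the cudnn side, a quick timing check might help; this one-liner is only a sketch and assumes a working th interpreter with cutorch installed:

# Time bare cutorch initialization, without cudnn. A multi-minute delay here
# usually means the driver is JIT-compiling PTX because the binaries were not
# built for sm_70; if it is fast, the slowdown is on the cudnn side.
th -e "local t = torch.Timer(); require 'cutorch'; print(cutorch.getDeviceCount() .. ' GPU(s), cutorch init took ' .. t:time().real .. 's')"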