google/FluidNet

CUDA compute capability or CUDA version requirement?

Closed this issue · 3 comments

When running qlua fluid_net_train.lua -gpu 1 -dataset output_current_model_sphere -modelFilename myModel I get:

Try 'sleep --help' for more information.
sleep: invalid time interval ‘0,001’
Try 'sleep --help' for more information.
sleep: invalid time interval ‘0,001’========================>.]  319/320 
Try 'sleep --help' for more information.
sleep: invalid time interval ‘0,001’
Try 'sleep --help' for more information.
 [===========================================================>]  320/320 
sleep: invalid time interval ‘0,001’
Try 'sleep --help' for more information.
sleep: invalid time interval ‘0,001’
Try 'sleep --help' for more information.
==> Loaded 20480 samples
==> Creating model...
Number of input channels: 3
Model type: default
Bank 1:
Adding convolution: cudnn.SpatialConvolution(3 -> 16, 3x3, 1,1, 1,1)
Adding non-linearity: nn.ReLU (inplace true)
Bank 1:
Adding convolution: cudnn.SpatialConvolution(16 -> 16, 3x3, 1,1, 1,1)
Adding non-linearity: nn.ReLU (inplace true)
Bank 1:
Adding convolution: cudnn.SpatialConvolution(16 -> 16, 3x3, 1,1, 1,1)
Adding non-linearity: nn.ReLU (inplace true)
Bank 1:
Adding convolution: cudnn.SpatialConvolution(16 -> 16, 3x3, 1,1, 1,1)
Adding non-linearity: nn.ReLU (inplace true)
Adding convolution: cudnn.SpatialConvolution(16 -> 1, 1x1)
==> defining loss function
    using criterion nn.FluidCriterion: pLambda=0,00, uLambda=0,00, divLambda=1,00, borderWeight=1,0, borderWidth=3
==> Extracting model parameters
==> Defining Optimizer
    Using ADAM...
==> Profiling FPROP for 10 seconds with grid res 128
THCudaCheck FAIL file=/home/torstein/progs/FluidNet/torch/tfluids/generic/tfluids.cu line=119 error=8 : invalid device function
qlua: /home/torstein/torch/install/share/lua/5.1/tfluids/init.lua:516: cuda runtime error (8) : invalid device function at /home/torstein/progs/FluidNet/torch/tfluids/generic/tfluids.cu:119
stack traceback:
	[C]: at 0x7fdd9f648f50
	[C]: in function 'emptyDomain'
	/home/torstein/torch/install/share/lua/5.1/tfluids/init.lua:516: in function 'emptyDomain'
	fluid_net_train.lua:145: in main chunk

Using an Nvidia GTX 770 with the 367.57 driver and CUDA 7.5.17. Here's an overview of CUDA functions and their required compute capability. The GPU in question has compute capability 3.0.

Here's the output from running './test.sh' in torch:
torch test.txt

Sorry you're running into this.

Unfortunately, I haven't seen this before, but it's very likely a compute capability problem. Try changing line 21 of FluidNet/torch/tfluids/CMakeLists.txt from:

LIST(APPEND CUDA_NVCC_FLAGS "-arch=sm_35;--use_fast_math; -D_FORCE_INLINES")

to

LIST(APPEND CUDA_NVCC_FLAGS "-arch=sm_30;--use_fast_math; -D_FORCE_INLINES")

I don't recall using any SM 3.5 specific features, so tfluids should compile and run with SM 3.0 as well. Let me know how that goes.
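
For anyone wondering why the sm_35 build fails outright on a compute capability 3.0 card, here is a rough sketch in Python of the compatibility rule as I understand it. With the shorthand -arch=sm_XY, nvcc embeds machine code for sm_XY plus PTX for compute_XY. The machine code only runs on devices of the same major version with an equal or higher minor version, and the PTX can only be JIT-compiled on devices of capability X.Y or higher. The helper names parse_arch and can_run are made up for illustration and are not part of FluidNet, Torch, or CUDA, and the parser assumes a two-digit arch such as sm_30 or sm_35.

```python
import re


def parse_arch(nvcc_flags):
    """Extract (major, minor) from an '-arch=sm_XY' flag in a flags string.

    Hypothetical helper; assumes a two-digit arch such as sm_30 or sm_35.
    """
    match = re.search(r"-arch=sm_(\d)(\d)", nvcc_flags)
    if match is None:
        raise ValueError("no -arch=sm_XY flag found")
    return int(match.group(1)), int(match.group(2))


def can_run(build_cc, device_cc):
    """Return True if code built with -arch=sm_XY can run on the device.

    build_cc and device_cc are (major, minor) tuples.
    """
    # Machine code for sm_XY: same major version, device minor >= build minor.
    sass_ok = device_cc[0] == build_cc[0] and device_cc[1] >= build_cc[1]
    # Embedded PTX for compute_XY: JIT-compilable on any device >= X.Y.
    ptx_ok = device_cc >= build_cc
    return sass_ok or ptx_ok


GTX_770 = (3, 0)  # compute capability reported in this issue
old_flags = "-arch=sm_35;--use_fast_math; -D_FORCE_INLINES"
new_flags = "-arch=sm_30;--use_fast_math; -D_FORCE_INLINES"

print(can_run(parse_arch(old_flags), GTX_770))  # False
print(can_run(parse_arch(new_flags), GTX_770))  # True
```

With the sm_35 flag there is no usable kernel image for a 3.0 device, which is consistent with the "invalid device function" error on the first tfluids kernel launch. Building for sm_30 still runs on newer cards, since those can fall back to the embedded PTX.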

After I upgraded to CUDA V8.0.61 with driver 375.26, recompiled torch (running clean.sh in the torch dir first), deleted everything in FluidNet/torch/tfluids/build, and recompiled tfluids with sm_30, it worked! I doubt anything but the last two steps was necessary, but in any case the issue is solved.

Oh that's great! Thanks for the update.