una-dinosauria/Rayuela.jl

CUDA out of memory issue

dryman opened this issue · 8 comments

Sorry to bother you again.
What is the minimal memory requirement for the GPU?

Creating 500000 random states... done in 4.35 seconds
ERROR: LoadError: CUDA error: out of memory (code #2, ERROR_OUT_OF_MEMORY)
Stacktrace:
 [1] macro expansion at /usr/local/google/home/fchern/.julia/packages/CUDAdrv/LC5XS/src/base.jl:147 [inlined]
 [2] #alloc#3(::CUDAdrv.Mem.CUmem_attach, ::Function, ::Int64, ::Bool) at /usr/local/google/home/fchern/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:161
 [3] alloc at /usr/local/google/home/fchern/.julia/packages/CUDAdrv/LC5XS/src/memory.jl:157 [inlined] (repeats 2 times)
 [4] (::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}})() at /usr/local/google/home/fchern/.julia/packages/CuArrays/f4Eke/src/memory.jl:251
 [5] lock(::getfield(CuArrays, Symbol("##17#18")){Base.RefValue{CUDAdrv.Mem.Buffer}}, ::ReentrantLock) at ./lock.jl:101
 [6] macro expansion at ./util.jl:213 [inlined]
 [7] alloc(::Int64) at /usr/local/google/home/fchern/.julia/packages/CuArrays/f4Eke/src/memory.jl:221
 [8] CuArrays.CuArray{Float32,2}(::Tuple{Int64,Int64}) at /usr/local/google/home/fchern/.julia/packages/CuArrays/f4Eke/src/array.jl:45
 [9] similar at /usr/local/google/home/fchern/.julia/packages/CuArrays/f4Eke/src/array.jl:61 [inlined]
 [10] gemm at /usr/local/google/home/fchern/.julia/packages/CuArrays/f4Eke/src/blas/wrap.jl:903 [inlined]
 [11] encode_icm_cuda_single(::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Array{Int64,1}, ::Int64, ::Int64, ::Bool, ::Bool) at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/src/LSQ_GPU.jl:71
 [12] encode_icm_cuda(::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Array{Int64,1}, ::Int64, ::Int64, ::Bool, ::Int64, ::Bool) at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/src/LSQ_GPU.jl:249
 [13] experiment_lsq_cuda(::Array{Float32,2}, ::Array{Int16,2}, ::Array{Array{Float32,2},1}, ::Array{Float32,2}, ::Array{Float32,2}, ::Array{Float32,2}, ::Array{UInt32,1}, ::Int64, ::Int64, ::Int64, ::Int64, ::Int64, ::Bool, ::Int64, ::Int64, ::Int64, ::Int64, ::Bool) at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/src/LSQ_GPU.jl:352
 [14] run_demos(::String, ::Int64, ::Int64, ::Int64, ::Int64) at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/demos/demos_train_query_base.jl:72
 [15] top-level scope at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/demos/demos_train_query_base.jl:171 [inlined]
 [16] top-level scope at ./none:0
 [17] include at ./boot.jl:317 [inlined]
 [18] include_relative(::Module, ::String) at ./loading.jl:1038
 [19] include(::Module, ::String) at ./sysimg.jl:29
 [20] include(::String) at ./client.jl:398
 [21] top-level scope at none:0
in expression starting at /usr/local/google/home/fchern/.julia/environments/v0.7/dev/Rayuela/demos/demos_train_query_base.jl:170

From our README:

Requirements
This package is written in Julia 1.0, with some extension in C++ and CUDA. You also need a CUDA-ready GPU. We have tested this code on an Nvidia Titan Xp GPU.

Our CUDA GPU has 8 GB of memory, and we thought that would be enough.

You could try increasing the number of splits (i.e., how many chunks the data is split into before being passed to the GPU) to reduce the GPU memory requirement.

(sorry, a bit hardcoded for now).

nsplits_train = m <= 8 ? 1 : 1
nsplits_base = m <= 8 ? 2 : 4
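To illustrate why more splits lower the peak memory requirement, here is a minimal sketch in plain Julia. It is not Rayuela's actual code: `split_ranges` and `process_in_chunks` are hypothetical helpers, and the function `f` only stands in for the step that would run on the GPU. The idea is that only one column block of the data has to be resident at a time.

```julia
# Hypothetical helper (not part of Rayuela): split column indices 1..n into
# at most `nsplits` contiguous ranges of nearly equal size.
function split_ranges(n::Integer, nsplits::Integer)
    nsplits >= 1 || throw(ArgumentError("nsplits must be at least 1"))
    chunk = max(cld(n, nsplits), 1)   # ceiling division, at least one column per chunk
    return [i:min(i + chunk - 1, n) for i in 1:chunk:n]
end

# Hypothetical helper: apply `f` to one column block of X at a time and
# concatenate the results. In a GPU setting, `f` would upload the block,
# do the work on the device, and copy the result back, so only one block
# needs to fit in GPU memory at once.
function process_in_chunks(f, X::AbstractMatrix, nsplits::Integer)
    parts = [f(X[:, r]) for r in split_ranges(size(X, 2), nsplits)]
    return reduce(hcat, parts)
end

# CPU-only stand-in for the per-chunk work.
X = rand(Float32, 4, 10)
Y = process_in_chunks(x -> 2 .* x, X, 4)
```

With 10 columns and 4 splits, `split_ranges(10, 4)` yields the ranges `1:3`, `4:6`, `7:9`, and `10:10`, so the largest block holds 3 columns instead of all 10.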

Cool. Setting them as follows seems to work with 8 GB:

nsplits_train =  2
nsplits_base  =  4

I'm glad it's working. Was this the reason behind issue #38?

I restarted Julia and wasn't able to reproduce #38.

It turns out that changing the partition size doesn't solve the issue.
The CuArrays are not freed.
I watched the GPU memory usage keep increasing until it ran out of memory again.
https://discourse.julialang.org/t/freeing-memory-in-the-gpu-with-cudadrv-cudanative-cuarrays/10946/8

Calling GC.gc() doesn't free the underlying CUDA memory. Any clues?

Yes, this is definitely an open issue. The Julia GC is a bit of a black box to me, so I never really figured out how to fix this (other than using a larger GPU, which happens to have enough memory for the GC to kick in just in time...)

I know this is less than ideal. It might be worth trying to call the unsafe_free! function from CuArrays to alleviate the issue.

https://github.com/JuliaGPU/CuArrays.jl/blob/9892999533fa4c234516d777c0978576b3b3ff39/src/array.jl#L26-L32

But I'm sorry I can't provide a better fix.