Large offset support
TCLB currently does not support cases in which the overall offset overflows a 32-bit integer. This affects:
- load_ functions (dynamic and static access of fields), anywhere where nx*ny*nz*fields is larger than 2^31
- pop_ functions (loading fields through densities), anywhere where nx*ny*nz is larger than 2^31
This can be fixed by appropriate casting in the offset calculations in LatticeAccess, but care must be taken not to degrade performance with unnecessary int64_t operations.
Originally posted by @shkodm in #496 (comment)
[...] Some things still don't work as expected (the same happens on the master branch). I run on 2 V100s on Bunya, each with 80GB of memory; my case is large, so I split it between the 2 GPUs.
I get:
Cumulative allocation of 63.GB)
and then:
an illegal memory access was encountered in Lattice.hpp at line 279
The error is the same even if I split the case between 3 GPUs (40GB each, so there is plenty of space even if some memory is unaccounted for).
@shkodm @TravisMitchell After some investigation: we should think carefully about whether we want to calculate 64-bit indexes, as 64-bit multiplication on CUDA costs around 20x as much as 32-bit multiplication.
@shkodm can you check the sizes in your case: nx, ny, nz, and also the number of fields?