Large offset support

Question

Opened this issue 8 months ago · 1 comment

TCLB currently does not support cases in which the overall offset overflows a 32-bit integer for:

load_ functions (dynamic and static access of fields), wherever nx*ny*nz*fields is larger than 2^31
pop_ functions (loading fields through densities), wherever nx*ny*nz is larger than 2^31

This can be fixed by appropriate casting in the offset calculations in LatticeAccess, but care has to be taken not to degrade performance with unnecessary int64_t operations.
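A rough sketch of what such a cast could look like follows. The field-major layout (`offset = field * nodes + node`) and the function name `field_offset` are assumptions made for illustration; the actual layout in LatticeAccess may differ. The point is that only the one multiplication that can exceed 2^31 is widened, while the node index stays in 32-bit arithmetic.

```cpp
#include <cstdint>

// Assumed layout, for illustration only (TCLB's LatticeAccess may differ):
//   node   = x + nx * (y + ny * z)
//   offset = field * (nx * ny * nz) + node
constexpr std::int64_t field_offset(int x, int y, int z, int field,
                                    int nx, int ny, int nz) {
    // Node index and node count stay in 32-bit arithmetic.
    // This is valid only while nx*ny*nz < 2^31.
    const std::int32_t node = x + nx * (y + ny * z);
    const std::int32_t nodes = nx * ny * nz;
    // Single widening cast: only this multiplication runs in 64 bits.
    return static_cast<std::int64_t>(field) * nodes + node;
}

// Last node of field 18 on a 512^3 lattice: 19 * 2^27 - 1, which exceeds INT32_MAX.
static_assert(field_offset(511, 511, 511, 18, 512, 512, 512) == 2550136831LL);
```

This sketch covers the load_ case only. If nx*ny*nz itself exceeds 2^31, the node index would need widening as well.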

Originally posted by @shkodm in #496 (comment)
[...] Some things still don't work as expected (the same happens on the master branch). I run on 2 V100 GPUs on Bunya, each with 80GB; my case is large, so I split it between the 2.
I get:
Cumulative allocation of 63.GB)
and then
an illegal memory access was encountered in Lattice.hpp at line 279

The error is the same even if I try to split between 3 GPUs (40GB each, so plenty of space even if there is some unaccounted memory)

Answer 1 · 2024-01-17T08:20:13.000Z

@shkodm @TravisMitchell After some investigation, we should think carefully about whether we want to calculate 64-bit indexes, as 64-bit multiplication on CUDA costs around 20x as much as 32-bit multiplication.

@shkodm can you check the sizes in your case? nx, ny, nz, but also the number of fields?
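One possible way to address the cost concern above is to make the index type a template parameter, so that small cases keep pure 32-bit arithmetic and only large cases pay for 64-bit operations. The sketch below is an assumption about how this could be structured, not existing TCLB code; `offset_as` and `needs_int64_offsets` are hypothetical names, and the field-major layout is assumed for illustration.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical sketch: Index is std::int32_t for small lattices and
// std::int64_t for large ones. Assumed layout: offset = field * nodes + node.
template <typename Index>
constexpr Index offset_as(int node, int field, int nodes) {
    return static_cast<Index>(field) * static_cast<Index>(nodes)
         + static_cast<Index>(node);
}

// Decision made once per run (for example on the host) from the case sizes:
// 64-bit offsets are needed only if the largest offset exceeds INT32_MAX.
constexpr bool needs_int64_offsets(std::int64_t nodes, std::int64_t fields) {
    return nodes * fields - 1 > std::numeric_limits<std::int32_t>::max();
}

// 2^27 nodes with 19 (made-up) fields needs the 64-bit variant.
static_assert(needs_int64_offsets(134217728, 19));
static_assert(offset_as<std::int64_t>(134217727, 18, 134217728) == 2550136831LL);
```

The trade-off is that two variants of each kernel would have to be compiled, with the choice between them made at launch time based on `needs_int64_offsets`.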