A CUDA + batoid + GalSim wrapper for Random123: https://www.deshawresearch.com/resources_random123.html
This coordinates the RNGs over all threads to generate random numbers that are consistent between host and device.
Testing
Before any push:
make cleanall
make
./testAll.sh  # on a Cori GPU node
TODO
- Clean up code (use Batoid style)
- Add cmake
- Test with LLVM
- Compare new performance + test case with generation on the GPU followed by a DeviceToHost memcpy.
- Debug n_streams < buf_size
- Debug
- Pass GalSim random tests: GalSim/tests/test_random.py
- Make a shared lib of Conus (pybind)
- Call Conus from the Python tests.
Next steps:
- Work with Josh to integrate Conus into batoid
- Figure out how to reproduce the virtual function mechanism (cf. https://gist.github.com/jmeyers314/986ac7670b356eed32f2fecf2b55aa18)
- Make the result independent of the number of threads, i.e. work out a "unique" ID for each photon
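Sketch for the "unique ID per photon" item (host-only C++, not Conus code): if each random number is a pure function of (seed, photon ID), the output cannot depend on how photons are split across threads. The mixer below is a SplitMix64-style stand-in and is NOT Random123; the names photon_bits, photon_uniform and fill_uniform are hypothetical and only serve the illustration. A real version would presumably feed the photon ID into a Random123 counter instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in counter-based generator (SplitMix64-style mixer), NOT Random123.
// The output depends only on (seed, photon_id), never on the calling thread.
inline std::uint64_t photon_bits(std::uint64_t seed, std::uint64_t photon_id) {
    std::uint64_t z = seed + (photon_id + 1) * 0x9E3779B97F4A7C15ULL;
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

// Map the top 53 bits to a double in [0, 1). The divisor is 2^53.
inline double photon_uniform(std::uint64_t seed, std::uint64_t photon_id) {
    return static_cast<double>(photon_bits(seed, photon_id) >> 11) *
           (1.0 / 9007199254740992.0);
}

// Emulate n_threads workers on the host: worker t handles photons
// t, t + n_threads, t + 2*n_threads, ... (a grid-stride style loop).
std::vector<double> fill_uniform(std::uint64_t seed, std::size_t n_photons,
                                 std::size_t n_threads) {
    assert(n_threads > 0);
    std::vector<double> out(n_photons);
    for (std::size_t t = 0; t < n_threads; ++t) {
        for (std::size_t i = t; i < n_photons; i += n_threads) {
            out[i] = photon_uniform(seed, i);  // keyed by photon index only
        }
    }
    return out;
}
```

With this layout, fill_uniform(seed, n, 1) and fill_uniform(seed, n, 256) return identical vectors, which is the property a host vs. device comparison needs. The trade-off is that every photon needs a stable global index before any random number is drawn.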
Profiling using NSIGHT:
- Run:
module load cuda
srun nsys profile ./test_gpu_nvtx.ex BUFSIZE
- Open the resulting report with nsight-sys (for example via NoMachine)