
Implementation of a forward Runge-Kutta 4 solver and a backward adjoint gradient method in CUDA


PASIf

1. Compilation

To compile the code using the Makefile, you'll need to fulfill the following requirements:

  • build-essential package with a gcc version <= 11.x

  • python-dev package. Make sure the dev package version matches your Python version (3.10 was used during development).

  • PASIf uses pybind11 (https://github.com/pybind/pybind11) to expose the C++/CUDA code as a Python module. After cloning the project, make sure that the submodule has been downloaded as well:

      git submodule init
      git submodule update
    

    Make sure that the PYBIND11 macro in the Makefile matches your installed version.

  • To compile the CUDA code you'll need the NVIDIA compiler nvcc, which ships with the NVIDIA HPC SDK: https://developer.nvidia.com/hpc-sdk
    The SDK is installed by default in /opt/nvidia/. Make sure to add the compiler directory to your PATH:

      export PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/*version*/compilers/bin:$PATH
    

    Then make sure to match the NVCCINCLUDE macro in the Makefile with your installed version of the HPC SDK.

  • To compile, just run:

      mkdir -p build
      make
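
Before running make, the prerequisites listed above can be checked from Python. This is a minimal sketch and not part of PASIf: the helper names missing_tools and gcc_major are hypothetical, and it only assumes that gcc, make and nvcc are expected on PATH and that gcc -dumpversion prints a version such as 11.4.0.

```python
import shutil
import subprocess


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def gcc_major(version_text):
    """Extract the major version from `gcc -dumpversion` output, e.g. '11.4.0' -> 11."""
    return int(version_text.strip().split(".")[0])


if __name__ == "__main__":
    # Tool list taken from the requirements above (hypothetical check, adapt as needed).
    missing = missing_tools(["gcc", "make", "nvcc"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    if "gcc" not in missing:
        out = subprocess.run(
            ["gcc", "-dumpversion"], capture_output=True, text=True
        ).stdout
        if out.strip() and gcc_major(out) > 11:
            print("gcc is newer than 11.x; the requirements ask for gcc <= 11.x")
```

If nvcc is reported as missing, the export PATH line above is the first thing to re-check.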
    

Single vs Double precision

To switch between single and double precision, change float to double in /src/helpers.cuh, then recompile.
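
The trade-off behind this switch can be illustrated without touching the CUDA sources. The sketch below is independent of PASIf (the helper as_float32 is hypothetical): it uses only the Python standard library to show the storage size of each type and the rounding error that single precision introduces on a value such as 0.1.

```python
import struct


def as_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


if __name__ == "__main__":
    # Bytes needed per stored value: float vs double.
    print("float bytes:", struct.calcsize("<f"), "double bytes:", struct.calcsize("<d"))
    # Rounding error introduced by storing 0.1 in single precision.
    print("float32 error on 0.1:", abs(as_float32(0.1) - 0.1))
```

Single precision halves the memory per value, while double precision keeps rounding errors smaller at each stored step.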

Other comments on the Makefile

Depending on the hardware you target, make sure that the -arch flag in the Makefile matches the architecture of your GPU. This is needed to get the best performance.
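
One way to find a suitable value is to derive it from the compute capability of the installed GPU. The following is a sketch under assumptions: the helper names are hypothetical, and the nvidia-smi query field compute_cap is only available with reasonably recent NVIDIA drivers, so the detection step returns an empty list when it cannot be used.

```python
import shutil
import subprocess


def arch_flag(compute_cap):
    """Turn a compute capability such as '8.6' into the nvcc flag '-arch=sm_86'."""
    major, minor = compute_cap.strip().split(".")
    return f"-arch=sm_{int(major)}{int(minor)}"


def detected_compute_caps():
    """Ask nvidia-smi for the compute capability of each GPU (empty if unavailable)."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for cap in detected_compute_caps():
        try:
            print(cap, "->", arch_flag(cap))
        except ValueError:
            # Unexpected nvidia-smi output; fall back to NVIDIA's documentation.
            print("Could not parse compute capability:", cap)
```

The printed value can then be compared with the -arch setting in the Makefile.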

The compilation process generates a .so Python module in the ./build folder.
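
To use the generated module from a script, the build folder has to be on the Python import path. The sketch below assumes nothing about the module's actual name, which is set by the Makefile: it lists the .so files it finds and derives their import names. The helper names built_module_names and load_built_module are hypothetical.

```python
import importlib
import sys
from pathlib import Path


def built_module_names(build_dir="build"):
    """List import names of the .so files in build_dir (text before the first dot)."""
    build_path = Path(build_dir)
    if not build_path.is_dir():
        return []
    return sorted(p.name.split(".")[0] for p in build_path.glob("*.so"))


def load_built_module(name, build_dir="build"):
    """Import a compiled module by name after putting build_dir on sys.path."""
    build_path = str(Path(build_dir).resolve())
    if build_path not in sys.path:
        sys.path.insert(0, build_path)
    return importlib.import_module(name)


if __name__ == "__main__":
    print("Modules found in ./build:", built_module_names())
```

The import only succeeds with the same Python version the module was compiled against, which is why the python-dev version has to match.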

2. Benchmarking

The tools to benchmark NVIDIA GPU kernels are Nsight Compute and Nsight Systems. They ship with the NVIDIA HPC SDK, but you can also download a more recent version directly here: https://developer.nvidia.com/gameworksdownload#?dn=nsight-systems-2023-2

Command line tools

    # Run the code with NSYS event capture
    > nsys profile python3 FullInterfaceTesting.py

    # Open the .nsys-rep output in the graphical interface (nsys-ui), or export it to .sqlite format to read the results directly in the terminal
    > nsys export --output=rep_name.sqlite -t sqlite report.nsys-rep

    # Print the benchmark statistics in the terminal
    > nsys stats *.sqlite
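
The exported .sqlite file can also be queried from Python with the standard sqlite3 module. This is a sketch under assumptions: the table CUPTI_ACTIVITY_KIND_KERNEL, its start, end and shortName columns, and the StringIds lookup table reflect the export schema as commonly produced by Nsight Systems, but the schema can differ between versions, so list_tables should be used first to check what the file actually contains. The function names are hypothetical.

```python
import sqlite3
import sys

# Assumed name of the kernel activity table; verify it with list_tables().
KERNEL_TABLE = "CUPTI_ACTIVITY_KIND_KERNEL"


def list_tables(conn):
    """Return the names of all tables in the exported database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


def kernel_totals(conn):
    """Return [(kernel_name, launches, total_ns)], longest total time first."""
    if KERNEL_TABLE not in list_tables(conn):
        return []
    query = f"""
        SELECT s.value, COUNT(*), SUM(k."end" - k."start") AS total_ns
        FROM {KERNEL_TABLE} AS k
        JOIN StringIds AS s ON s.id = k.shortName
        GROUP BY s.value
        ORDER BY total_ns DESC
    """
    return [tuple(row) for row in conn.execute(query)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 this_script.py rep_name.sqlite
    connection = sqlite3.connect(sys.argv[1])
    print("Tables:", list_tables(connection))
    for name, launches, total_ns in kernel_totals(connection):
        print(f"{name}: {launches} launches, {total_ns} ns in total")
    connection.close()
```

This gives the same kind of per-kernel summary as nsys stats, in a form that can be post-processed or plotted.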