Test cases to validate the correctness of HPCToolkit for GPU-accelerated applications.
Clone
git clone --recursive git@github.com:Jokeren/hpctoolkit-gpu-samples.git
Setup
export OMP_NUM_THREADS=<#threads>
export HPCTOOLKIT_GPU_TEST_REP=<#repeat times>
Run
cd <sample path>
make ARCH=<GPU arch>
./<application name> <device id (default 0)>
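As an illustration, the setup and run steps above can be wrapped in a small shell helper. This is a minimal sketch, not part of the repository: the function name `run_sample`, the `DRY_RUN` switch, the binary name `vec_add`, the `sm_70` value for `ARCH`, and the thread and repeat counts are all assumptions, so substitute whatever the sample's Makefile and your GPU actually require.

```shell
# Hypothetical helper: every concrete value below is an assumption.
export OMP_NUM_THREADS="${OMP_NUM_THREADS:-4}"                  # assumed thread count
export HPCTOOLKIT_GPU_TEST_REP="${HPCTOOLKIT_GPU_TEST_REP:-10}" # assumed repeat count

# run_sample <sample path> <application name> <GPU arch> [device id]
# Prints the commands by default; set DRY_RUN=0 to execute them.
# The body runs in a subshell, so the cd does not affect the caller.
run_sample() (
  sample_dir="$1"
  app="$2"
  arch="$3"
  device="${4:-0}"   # device id defaults to 0, as in the usage line above

  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'cd %s\n' "$sample_dir"
    printf 'make ARCH=%s\n' "$arch"
    printf './%s %s\n' "$app" "$device"
  else
    cd "$sample_dir" && make ARCH="$arch" && "./$app" "$device"
  fi
)

# Example: show the commands for the cuda_vec_add case.
# "vec_add" and "sm_70" are placeholders, not names taken from the repository.
run_sample cuda_vec_add vec_add sm_70 0
```

Defaulting to a dry run lets you check the resolved sample path, architecture, and device id before anything is built or launched.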
Case | Purpose |
---|---|
cuda_vec_add | cudaLaunchKernel |
cuda_cooperative_group | cudaLaunchCooperativeKernel |
cu_vec_add | cuLaunchKernel |
cu_multi_entries | cuLaunchKernel for different kernels with the same calling context |
cu_cooperative_group | cuLaunchCooperativeKernel (ERROR) |
target_vec_add | omp target |
Case | Purpose |
---|---|
cu_call_path | acyclic call graph |
cu_call_path_recursive | recursive device function calls |
cu_call_path_recursive_mutual | mutually recursive device function calls |
cuda_call_path_dynamic | dynamic parallelism |
cuda_call_path_dynamic_recursive | recursive call with dynamic parallelism |
Case | Purpose |
---|---|
nvdisasm | nvdisasm correctness check samples |
cuobjdump | cuobjdump correctness check samples |
cupti_test | cupti_test correctness check samples |
Case | Purpose | URL |
---|---|---|
target_lulesh | omp target performance | https://computation.llnl.gov/projects/co-design/lulesh |
RAJAPerf | CUDA and RAJA performance | https://github.com/LLNL/RAJAPerf |
sw4 | real-world application with complex call trees | https://github.com/geodynamics/sw4 |