
Common CUDA-based matrix operators and corresponding Nsight Systems profiling programs

Primary language: CUDA. License: GNU General Public License v3.0 (GPL-3.0).

CUDAmop (CUDA Matrix Operators and Profiling)

This repo provides implementations of common CUDA matrix operators, each paired with a profiling program, including:

Vector Addition

| Version | Operator | Profiling Program |
| --- | --- | --- |
| Utilize manually managed memory | vectorAdd (src/vector_addition.cu) | profiling/vector_addition/basic.cu |
| Utilize unified memory-based interfaces | vectorAdd (src/vector_addition.cu) | profiling/vector_addition/unified.cu |
| Utilize unified memory-based interfaces with prefetching and memory hints | vectorAdd (src/vector_addition.cu) | profiling/vector_addition/unified_prefetch.cu |
| cuBLAS | cublasSaxpy_v2 | profiling/cublas/vector_add.cu |

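
To make the difference between the vector addition variants listed above more concrete, here is a minimal sketch that contrasts manually managed memory with unified memory plus prefetching and a memory hint. It is not the repo's code: the kernel name `vectorAddSketch`, the host helpers `addManual` and `addUnifiedPrefetch`, and all signatures are assumptions made for illustration. The `cudaMemAdvise` and `cudaMemPrefetchAsync` calls use the CUDA 11.x signatures, matching the toolkit version shown in this README; newer toolkits may differ. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical element-wise addition kernel (signature is an assumption).
__global__ void vectorAddSketch(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];  // guard the tail of the last block
}

// Variant 1: manually managed memory (explicit cudaMalloc / cudaMemcpy).
// Assumes non-empty inputs of equal length.
std::vector<float> addManual(const std::vector<float> &a, const std::vector<float> &b) {
    assert(!a.empty() && a.size() == b.size());
    int n = static_cast<int>(a.size());
    size_t bytes = n * sizeof(float);
    std::vector<float> c(n);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);
    int threads = 256, blocks = (n + threads - 1) / threads;
    vectorAddSketch<<<blocks, threads>>>(dA, dB, dC, n);
    // A blocking cudaMemcpy waits for the kernel on the default stream.
    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return c;
}

// Variant 2: unified memory with a read-mostly hint and explicit prefetching.
std::vector<float> addUnifiedPrefetch(const std::vector<float> &a, const std::vector<float> &b) {
    assert(!a.empty() && a.size() == b.size());
    int n = static_cast<int>(a.size());
    size_t bytes = n * sizeof(float);
    int device = 0;
    cudaGetDevice(&device);
    float *uA, *uB, *uC;
    cudaMallocManaged(&uA, bytes);
    cudaMallocManaged(&uB, bytes);
    cudaMallocManaged(&uC, bytes);
    std::copy(a.begin(), a.end(), uA);  // initialize directly through the managed pointers
    std::copy(b.begin(), b.end(), uB);
    // Hint: the kernel only reads the inputs.
    cudaMemAdvise(uA, bytes, cudaMemAdviseSetReadMostly, device);
    cudaMemAdvise(uB, bytes, cudaMemAdviseSetReadMostly, device);
    // Move pages to the GPU before the kernel starts instead of faulting them in on demand.
    cudaMemPrefetchAsync(uA, bytes, device, 0);
    cudaMemPrefetchAsync(uB, bytes, device, 0);
    cudaMemPrefetchAsync(uC, bytes, device, 0);
    int threads = 256, blocks = (n + threads - 1) / threads;
    vectorAddSketch<<<blocks, threads>>>(uA, uB, uC, n);
    // Bring the result back to the host, then wait for all queued work.
    cudaMemPrefetchAsync(uC, bytes, cudaCpuDeviceId, 0);
    cudaDeviceSynchronize();
    std::vector<float> c(uC, uC + n);
    cudaFree(uA); cudaFree(uB); cudaFree(uC);
    return c;
}
```

Both helpers return the same result; the difference shows up in the Nsight Systems timeline, where the prefetched variant should replace on-demand page-fault migrations with bulk transfers.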
Square Matrix Multiplication

| Version | Operator | Profiling Program |
| --- | --- | --- |
| Naive version of square matrix multiplication | squareMatrixMul (src/matrix_mul.cu) | profiling/matrix_multiplication/basic.cu |
| Align memory access pattern through matrix transposing | alignedSquareMatrixMul (src/matrix_mul.cu) | profiling/matrix_multiplication/aligned.cu |
| Utilize scratchpad memory for tiled matrix multiplication | tiledSquareMatrixMul (src/matrix_mul.cu) | profiling/matrix_multiplication/tiled.cu |
| cuBLAS | cublasSgemm_v2 | profiling/cublas/matrix_multiplication.cu |

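
As an illustration of the tiled variant listed above, the following sketch stages square tiles of both inputs in shared (scratchpad) memory so that each global element is loaded once per tile rather than once per multiply. The kernel name `tiledSquareMatMulSketch`, the helper `multiplySquare`, the tile size of 16, and the row-major layout are all assumptions for illustration and may differ from the repo's `tiledSquareMatrixMul`. Error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

constexpr int TILE = 16;  // assumed tile width; one thread per output element

// Hypothetical tiled kernel: C = A * B for row-major n x n matrices.
__global__ void tiledSquareMatMulSketch(const float *a, const float *b, float *c, int n) {
    __shared__ float tileA[TILE][TILE];
    __shared__ float tileB[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Out-of-range elements are padded with zero so n need not be a multiple of TILE.
        tileA[threadIdx.y][threadIdx.x] = (row < n && aCol < n) ? a[row * n + aCol] : 0.0f;
        tileB[threadIdx.y][threadIdx.x] = (bRow < n && col < n) ? b[bRow * n + col] : 0.0f;
        __syncthreads();  // the whole tile must be loaded before anyone reads it
        for (int k = 0; k < TILE; ++k)
            acc += tileA[threadIdx.y][k] * tileB[k][threadIdx.x];
        __syncthreads();  // finish reading before the next iteration overwrites the tile
    }
    if (row < n && col < n) c[row * n + col] = acc;
}

// Host helper (hypothetical): multiplies two row-major n x n matrices on the GPU.
std::vector<float> multiplySquare(const std::vector<float> &a, const std::vector<float> &b, int n) {
    size_t count = static_cast<size_t>(n) * n;
    assert(n > 0 && a.size() == count && b.size() == count);
    size_t bytes = count * sizeof(float);
    std::vector<float> c(count);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);
    dim3 threads(TILE, TILE);
    dim3 blocks((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    tiledSquareMatMulSketch<<<blocks, threads>>>(dA, dB, dC, n);
    cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return c;
}
```

The two `__syncthreads()` calls are the part that is easiest to get wrong: the first separates loading from use, the second separates use from the next load.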
Sum Reduction

| Version | Operator | Profiling Program |
| --- | --- | --- |
| Naive implementation | sumReduction (src/sum_reduction.cu) | profiling/sum_reduction/basic.cu |
| Implementation without warp divergence | nonDivergenceSumReduction (src/sum_reduction.cu) | profiling/sum_reduction/non_divergence.cu |
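
The two sum reduction variants listed above can be contrasted with a short sketch. The naive kernel selects active threads with a modulo test, so active and idle threads are interleaved within each warp; the second kernel remaps thread IDs so that active threads are packed together and most steps run with whole warps either fully active or fully idle. This is one common way to avoid divergence and is an assumption about the technique; the repo's `nonDivergenceSumReduction` may be implemented differently. All names below (`sumReductionNaiveSketch`, `sumReductionPackedSketch`, `sumOnGpu`, `BLOCK`) are hypothetical, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <numeric>
#include <vector>

constexpr int BLOCK = 256;  // threads per block; must be a power of two

// Naive: thread tid is active when tid % (2*s) == 0, so active threads are scattered.
__global__ void sumReductionNaiveSketch(const int *in, int *out, int n) {
    __shared__ int partial[BLOCK];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    partial[tid] = (i < n) ? in[i] : 0;
    __syncthreads();
    for (int s = 1; s < BLOCK; s *= 2) {
        if (tid % (2 * s) == 0) partial[tid] += partial[tid + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = partial[0];
}

// Packed: threads 0..k-1 do the work at each step, keeping active threads contiguous.
__global__ void sumReductionPackedSketch(const int *in, int *out, int n) {
    __shared__ int partial[BLOCK];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    partial[tid] = (i < n) ? in[i] : 0;
    __syncthreads();
    for (int s = 1; s < BLOCK; s *= 2) {
        int index = 2 * s * tid;  // same elements as the naive version, different owner thread
        if (index < BLOCK) partial[index] += partial[index + s];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = partial[0];
}

// Host helper (hypothetical): one partial sum per block, combined on the host.
int sumOnGpu(const std::vector<int> &data, bool packed) {
    assert(!data.empty());
    int n = static_cast<int>(data.size());
    int blocks = (n + BLOCK - 1) / BLOCK;
    int *dIn, *dOut;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, blocks * sizeof(int));
    cudaMemcpy(dIn, data.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    if (packed) sumReductionPackedSketch<<<blocks, BLOCK>>>(dIn, dOut, n);
    else        sumReductionNaiveSketch<<<blocks, BLOCK>>>(dIn, dOut, n);
    std::vector<int> partial(blocks);
    cudaMemcpy(partial.data(), dOut, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn); cudaFree(dOut);
    return std::accumulate(partial.begin(), partial.end(), 0);
}
```

Both kernels compute the same result; only the mapping from threads to elements changes, which is what a profiler comparison of the two would be measuring.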

I also wrote companion blog posts (in Chinese) covering the under-the-hood details behind these profiling tests (available here). Comments and suggestions are welcome.

Build Project

Preparation

  1. A host equipped with an NVIDIA CUDA-capable GPU; see CUDA GPUs - NVIDIA Developer.
  2. An OS with the NVIDIA driver and CUDA Toolkit installed. To check:
# check driver status
$ nvidia-smi
Wed Jul 27 13:54:53 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.129.06   Driver Version: 470.129.06   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
# Your GPU Info ...

# check cuda status
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0
  3. An OS with essential build tools installed:
# Ubuntu
sudo apt-get install build-essential

# CentOS
sudo yum install \
        autoconf automake binutils \
        bison flex gcc gcc-c++ gettext \
        libtool make patch pkgconfig \
        redhat-rpm-config rpm-build rpm-sign \
        ctags elfutils indent patchutils 

Build Project

Use CMake to generate the Makefile for the operators and profiling programs:

# create subdirectory named "build"
mkdir build

# run cmake under [path to root]/build
cd build
cmake ..

Directories named bin and lib are created automatically under the root path. Then run make to build the final executables:

# run make under [path to root]/build
make

The profiling executables are placed under [path to root]/bin, and the operator library under [path to root]/lib.

Profiling

Preparation

NVIDIA Nsight Systems must be installed on both the server and the client side; see NVIDIA Nsight Systems.

Usage

cd [path to root]
nsys profile --force-overwrite true -o result/[target name] [path to root]/bin/[target name]

The profiling files under result were tested on an NVIDIA A4000.