This is a simple example of CUDA code in C/C++ that demonstrates how to use CUDA to add two vectors. This code allocates memory on both the CPU (host) and GPU, initializes the input vectors, copies the input vectors to the GPU, performs the vector addition on the GPU using a CUDA kernel, copies the result back to the CPU, and verifies that the output is correct.
- Compile the code with NVCC:
nvcc vector_add.cu -o vector_add
- This command compiles the CUDA code into an executable named
vector_add
.
Run the program:
./vector_add
This command executes the compiled program vector_add
. The program should print the message "Vector addition completed successfully." if the vector addition was performed correctly.
The code consists of a single C++ file that defines a CUDA kernel and the main program. Here is an overview of what the code does:
-
Defines a CUDA kernel called
vector_add
that adds two vectors and stores the result in a third vector -
Defines the main function, which allocates memory on both the CPU (host) and GPU, initializes the input vectors, copies the input vectors to the GPU, calls the
vector_add
kernel, copies the output vector back to the CPU, and verifies that the output is correct -
Allocates memory on the CPU using new
-
Allocates memory on the GPU using cudaMalloc
-
Copies the input vectors from the CPU to the GPU using cudaMemcpy
-
Defines the number of threads and blocks to use in the
vector_add
kernel -
Calls the
vector_add
kernel using<<<...>>>
notation -
Copies the output vector from the GPU to the CPU using cudaMemcpy
-
Verifies that the output vector is correct by comparing it to the expected output
-
Frees the memory allocated on both the CPU and GPU
To compile and run the code, you will need:
-
C/C++ compiler, such as GCC or Clang.
-
CUDA toolkit, which includes the CUDA library and the NVIDIA CUDA compiler (NVCC).
-
NVIDIA GPU with CUDA support.