This project presents an implementation of the LeNet-5 convolutional neural network using CUDA, the parallel computing platform and API created by NVIDIA. The implementation showcases the power of GPU-accelerated computing for neural network workloads. Developed primarily for educational purposes, this work forms part of the practical coursework in "Hardware for Signal Processing", a subject that delves into the hardware aspects of processing signals and data.
LeNet-5, one of the earliest convolutional neural networks, remains a foundational architecture for modern deep learning in image recognition and computer vision. While the network is typically trained in Python due to its simplicity and widespread use in the deep learning community, this project uniquely employs CUDA for the inference phase, leveraging the computational power of NVIDIA GPUs for enhanced performance.
The project serves as a practical example for those looking to transition from Python to CUDA in deep learning. It provides a hands-on opportunity to explore the intricacies of GPU programming and understand how conventional neural networks can be adapted to harness the power of parallel computing.
- CUDA-optimized inference implementation of the LeNet-5 neural network.
- A demonstration of the transition from Python-based training to GPU-accelerated inference.
- Exploration of key deep learning concepts in a hardware-accelerated context.
- An educational tool for understanding the practical deployment of trained neural networks on GPUs.
- Python: Version 3.8 or above
- CUDA Toolkit: Version 11.2
- NVIDIA GPU: With CUDA Compute Capability 6.1 or higher
- Operating System: Tested on Ubuntu 22.04
- Download the MNIST training dataset from Kaggle. Look for the file named `train-images.idx3-ubyte`.
- Create a folder in your project directory named `train-images`.
- Place the downloaded `train-images.idx3-ubyte` file inside the `train-images` folder.
- Open the `main.cu` file in a text editor.
- Locate the section of the code where the test image index is set (look for a variable or a comment indicating this).
- Modify the image index to select a specific test image from the MNIST dataset. For example, set the image index to `5` to select the sixth image (assuming the indexing starts from `0`).
- Compile the CUDA code by opening a terminal or command prompt in your project directory.
- Run the following command to compile the project:

```bash
nvcc -o main main.cu kernels.cu utils.cu
```

- After compilation, run the program by executing:

```bash
./main
```
This section provides an overview of the key scripts in the project, detailing their purpose and functionality.
- Purpose: This is the main script that orchestrates the execution of the LeNet-5 inference process.
- Functionality:
- Initializes the CUDA environment.
- Loads the MNIST dataset image specified by the user.
- Calls the necessary CUDA kernels for the inference process, implementing the LeNet-5 layer stack.
- Outputs the inference results.
- Purpose: Contains the CUDA kernels used in the inference process.
- Functionality:
- Includes kernels for various operations like convolution, pooling, dense layers and activation functions.
- Optimized for performance on NVIDIA GPUs.
- Purpose: Provides utility functions that support the main inference process.
- Functionality:
- Includes functions for data loading, preprocessing, initialization, visualization and other auxiliary tasks.
- `tests.cu`: Contains toy testing examples for verifying that the CUDA kernels are correctly implemented.
- `flat_cpu.cu`: Initial scripts for understanding the implementation of Conv2D on the CPU.
- `mat_mulp.cu`: Initial scripts for understanding and testing the implementation of simple operations (matrix addition/multiplication) in CUDA and on the CPU.
- `weights\LeNet5.ipynb`: Training of LeNet-5 and storing of the weights of the model.
- Start by reading `main.cu`, followed by `kernels.cu`, to understand the core logic.
- All kernel and auxiliary operations have been implemented.
- The LeNet-5 architecture stack is fully implemented.
- Image loading and visualization for inference are implemented.
- The current performance is not as expected; further debugging is required. We believe the main issue is related to loading the weights: despite thorough debugging of this step, closer attention is needed to how the weights are stored and loaded. Additionally, the conv3d kernel may need to be tested with a more generalized test case.
The network is able to perform the inference operations with the weights loaded from the trained model; however, the outputs are most of the time not coherent with what is expected.
Preview of some of the outputs with the current network: