Warning: This is my playground project for implementing tensor operations, neural networks, gradient computation, etc. The code structure and the assumptions made may be restrictive. The CUDA kernel implementations are not optimized for performance and may not work in all cases.
- NNCpp library
  - Tensor (CPU/CUDA) on Unified Memory
  - Layers: activations; convolution (CUDA only)
  - Model: a list of layers
  - Criterion: CrossEntropy, MSE
- Example applications
- Tests
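The Criterion implementations themselves are not shown in this README. As a standalone illustration of what the MSE criterion computes (this is a sketch over flat buffers, not the NNCpp `Criterion` API), mean squared error is the average of squared element-wise differences:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mean squared error over flat prediction/target buffers:
//   loss = (1/N) * sum_i (pred[i] - target[i])^2
// Standalone illustration only; not the NNCpp Criterion interface.
float mse_loss(const std::vector<float>& pred, const std::vector<float>& target)
{
    assert(pred.size() == target.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < pred.size(); ++i)
    {
        float d = pred[i] - target[i];
        sum += d * d;
    }
    return sum / static_cast<float>(pred.size());
}
```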
- Requirements
  - g++
  - CUDA 10.1, nvcc
  - gtest:
    ```bash
    apt-get install -y libgtest-dev
    cd /usr/src/gtest && cmake . && make && mv libg* /usr/lib/
    ```
- Build
  ```bash
  mkdir build && cd $_
  cmake .. && make
  ```
- Tensor: "tensor.hpp"
  - All tensors are 4D.
  - Data is allocated in Unified Memory, so the same buffer is accessible from both CPU and CUDA.
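Since all tensors are 4D, an accessor like `at(n, c, h, w)` presumably maps the four indices to a flat offset into the unified buffer. Below is a minimal sketch of the standard contiguous NCHW indexing formula; the layout is an assumption for illustration, not NNCpp's actual internal code:

```cpp
#include <cstddef>

// Flat offset of element (n, c, h, w) in a contiguous NCHW buffer
// of shape (N, C, H, W). Assumed layout; NNCpp's internals may differ.
std::size_t nchw_offset(std::size_t n, std::size_t c, std::size_t h, std::size_t w,
                        std::size_t C, std::size_t H, std::size_t W)
{
    return ((n * C + c) * H + h) * W + w;
}
```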
```cpp
#include <iostream>
#include "tensor.hpp"

// Example shape (values chosen for illustration)
size_t n = 1, c = 3, h = 4, w = 4;

auto t_cpu = nncpp::Tensor::zeros(n, c, h, w, Device::CPU);
auto t_cuda = nncpp::Tensor::ones(n, c, h, w, Device::CUDA);
std::cout << t_cpu << std::endl;
std::cout << t_cuda << std::endl;

// Unified Memory: elements are writable from host code for either device
t_cpu.at(0, 0, 0, 0) = 1.0f;
t_cuda.at(0, 0, 0, 0) = 2.0f;
```
- NN operations: "activations.hpp"
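The layer headers are not reproduced here. As a standalone illustration of the simplest activation such a header provides, here is an element-wise ReLU over a flat buffer (a hypothetical helper, not the NNCpp API; on the CUDA side this would be a kernel over the same unified-memory buffer):

```cpp
#include <algorithm>
#include <vector>

// Element-wise ReLU: max(x, 0), applied as a plain CPU loop.
// Illustration only; not NNCpp's activations.hpp interface.
std::vector<float> relu(std::vector<float> x)
{
    for (float& v : x)
        v = std::max(v, 0.0f);
    return x;
}
```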
- Run tests:
  ```bash
  cd build
  cmake .. && make && ./test-nncpp
  ```
- Profile with nvprof:
  ```bash
  cd build
  nvprof ./test-nncpp
  ```