This is an implementation of a neural net, completely from scratch (including basic tensor operations like matrix multiplication), in CUDA/C++. You can find the accompanying blog post for the code here.

The code is by no means efficient, so it is not a practical option and is meant for educational purposes only. Nonetheless, here is an overview of the various classes & functions:
Everything is implemented in both pure C++ (under CPU/) and CUDA/C++ (under GPU/). The syntax remains virtually identical, and there are only two points to bear in mind when switching between C++ and CUDA/C++:
- C++ and CUDA/C++ modules end with the suffixes `CPU` and `GPU` respectively.
- Don't forget to allocate and destroy CUDA arrays via `cudaMallocManaged` and `cudaFree`.
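For example, a buffer fed to one of the `GPU` modules would be managed along these lines (a minimal sketch; the buffer name and sizes are made up for illustration):

```cpp
#include <cuda_runtime.h>

int main() {
    int bs = 64, n_in = 10;   // hypothetical batch size & feature count
    float* inp;

    // Unified memory: accessible from both the host and the device.
    cudaMallocManaged(&inp, bs * n_in * sizeof(float));

    // ... fill inp on the host, hand it to a *_GPU module ...

    cudaFree(inp);
    return 0;
}
```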
- `linear.h`/`Linear_SUFFIX`:
  - Initialization:
    - Required arguments: `_bs` (`int`, batch size), `_n_in` (`int`, number of input features), `_n_out` (`int`, number of output features)
    - Optional argument: `_lr` (`float`, learning rate)
  - `forward`: Runs a linear forward pass (weights set with Kaiming initialization, biases set to zero)
    - Required arguments: `_inp` (`float*`, the input data), `_out` (`float*`, holds the output)
  - `update`: Stores a copy of the weights for later use, then updates them as well as the biases
  - `backward`: Stores the gradients of the loss with respect to the input in `_inp`, assuming `_out` contains the gradients of the loss with respect to the next layer's input (i.e. the next layer has called `backward`). The weights used are the ones saved during `update`, and the copies are deleted thereafter.
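As a quick sketch, a single training step through one linear layer might look like this. The constructor and `forward` signatures follow the description above; the zero-argument `update` and `backward` calls are an assumption, since their exact signatures aren't spelled out here:

```cpp
Linear_GPU* lin = new Linear_GPU(bs, n_in, n_out);

float *inp, *out;
cudaMallocManaged(&inp, bs * n_in * sizeof(float));
cudaMallocManaged(&out, bs * n_out * sizeof(float));

lin->forward(inp, out);   // out now holds inp @ W + b
lin->update();            // snapshot the weights, then take the SGD step
// ... out is assumed to now hold dL/d(out), e.g. written by the next layer ...
lin->backward();          // writes dL/d(inp) into inp using the saved weights

cudaFree(inp);
cudaFree(out);
```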
- `relu.h`/`ReLU_SUFFIX`:
  - Initialization:
    - Required argument: `_sz_out` (`int`, the number of elements it's given)
  - `forward`, `backward`: Like `Linear_SUFFIX` but for ReLU
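This is not the repository's kernel, just an illustration of the elementwise operation a ReLU forward pass performs on the GPU:

```cpp
__global__ void relu_forward(float* inp, float* out, int sz_out) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;

    // One thread per element: pass positives through, zero out the rest.
    if (idx < sz_out)
        out[idx] = fmaxf(inp[idx], 0.0f);
}
```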
- `mse.h`/`MSE_SUFFIX`:
  - Initialization: Like `ReLU_SUFFIX`
  - `forward`: Stores the predicted & target values for later use
    - Required arguments: `_inp` (`float*`, the predicted values), `_out` (`float*`, the target values)
  - `_forward`: Calculates the MSE but does not store the predicted & target values, meaning `forward` must still be called even if `_forward` is used
    - Required arguments: Like `forward`, but `_out` must have an extra element because the MSE will be saved in `_out[sz_out]`
  - `backward`: Stores the gradients of the loss with respect to the predicted values in `_inp`
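A sketch of the two entry points, assuming the signatures above (the zero-argument `backward` is again an assumption). Note the extra slot at the end of the target buffer, which `_forward` uses to store the scalar loss:

```cpp
MSE_GPU* mse = new MSE_GPU(sz_out);   // e.g. sz_out = bs * n_out of the last layer

float *pred, *targ;
cudaMallocManaged(&pred, sz_out * sizeof(float));
cudaMallocManaged(&targ, (sz_out + 1) * sizeof(float));   // targets + 1 slot for the loss

mse->forward(pred, targ);    // caches pred & targ so backward can use them
mse->_forward(pred, targ);   // writes the MSE into targ[sz_out]
mse->backward();             // writes dL/d(pred) into pred
```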
- `sequential.h`/`Sequential_SUFFIX`:
  - Initialization:
    - Required argument: `layers` (`std::vector<Module*>`, the layers to be sequenced)
  - `forward`: Feeds the input to the first layer, the output of that to the second layer, ...
    - Required arguments: `inp` (`float*`, the input data), `out` (`float*`, there for consistency and doesn't get used; the output is accessible via the last layer's `out` attribute)
  - `update`: Goes through `layers` in reverse and calls their `update` & `backward` methods (first the last layer's `update`, then its `backward`, then the second-to-last layer's `update`, then its `backward`, ...)
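A hypothetical two-layer regression network wired together with `Sequential_GPU`; the hidden size is made up, and public `layers` and `out` members are assumed, as the description above implies:

```cpp
std::vector<Module*> layers;
layers.push_back(new Linear_GPU(bs, n_in, 50));   // hidden layer
layers.push_back(new ReLU_GPU(bs * 50));
layers.push_back(new Linear_GPU(bs, 50, 1));      // one output per sample

Sequential_GPU seq(layers);
seq.forward(inp, out);                    // inp & out allocated as in the snippets above; out itself is never read
float* pred = seq.layers.back()->out;     // the actual network output
```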
- `train_SUFFIX`: Trains a network with gradient descent
  - Required arguments: `seq` (`Sequential_SUFFIX`, the network), `inp` (`float*`, the input data), `targ` (`float*`, the target data), `bs` (`int`, batch size), `n_in` (`int`, number of input features), `n_epochs` (`int`, number of epochs)
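Putting it all together, a training call might look like the sketch below, assuming `seq`, `inp`, and `targ` were set up as in the previous snippets; the epoch count is an arbitrary choice:

```cpp
int n_epochs = 100;   // arbitrary
train_GPU(seq, inp, targ, bs, n_in, n_epochs);

// Predictions end up in the last layer's out attribute:
float* pred = seq.layers.back()->out;
```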
For end-to-end training with speed benchmarks, please run `main.cpp` or `main.cu` for the CPU and GPU respectively.