The aim of this project is to implement a basic but flexible API in modern C++
to build and train sequential neural networks[^1], i.e. neural networks whose hidden layers have exactly one input and one output layer. It is my submission for the Capstone project[^2] in the Udacity C++ Nanodegree Program. To showcase the API, we train a neural network on the MNIST dataset (handwritten digits between 0 and 9, e.g. see this link). Unfortunately, the performance is very poor at this point[^3]. However, the main point of this project is not to create a competitor to well-established neural network APIs (e.g. tensorflow or keras). Instead, the purpose is to apply techniques from (modern) C++ such as smart pointers, move semantics, templates, libraries and abstract classes. The latter allow the user to easily add new types of `Layer`s etc.
At the beginning, I tried to do everything from scratch but quickly realized that it would be too much. For example, it would have required the implementation of a fully functioning matrix class (with matrix multiplication etc.). Hence the only external dependency is the excellent Eigen library which deals with the necessary matrix computations. It is contained in this repository as a [git submodule](https://git-scm.com/book/en/v2/Git-Tools-Submodules).
We provide two types of layers, a `LinearLayer` and an `ActivationLayer`. They simply encapsulate an affine-linear transformation and an activation function, respectively. To instantiate a sequential neural network with just a `LinearLayer` followed by an `ActivationLayer` with the ReLU activation function (so we create a perceptron), we use the following code. The includes should be the same as in main.cpp and are omitted for clarity:
int inputSize = 20; // number of columns of the matrix in the LinearLayer
int outputSize = 10; // number of rows of the matrix in the LinearLayer
auto snn = SequentialNN({LinearLayer(outputSize, inputSize), ActivationLayer(outputSize, "relu")});
// evaluate on an Eigen matrix with random entries
int batchSize = 4;
Eigen::MatrixXd X = Eigen::MatrixXd::Random(inputSize, batchSize);
std::cout << "Output: \n" << snn(X) << std::endl;
Clearly, we can add as many layers as we want. The only condition they have to satisfy is that the input size of the i-th layer coincides with the output size of the (i-1)-th layer.
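For example, a deeper network with an intermediate layer size could look as follows (a sketch reusing the classes from above; the value of `hiddenSize` is just an illustrative choice):

```cpp
int hiddenSize = 15; // output size of the first LinearLayer must equal the input size of the second one
auto snn2 = SequentialNN({LinearLayer(hiddenSize, inputSize), ActivationLayer(hiddenSize, "relu"),
                          LinearLayer(outputSize, hiddenSize), ActivationLayer(outputSize, "relu")});
```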
NOTE:
- We work with numbers of rows and columns, so that in a `LinearLayer` the output size comes first. Since an `ActivationLayer` does not change the input size, we only need to give the input size in its constructor.
- The weights in the `LinearLayer` are initialized according to the so-called He and Xavier initialization (see e.g. this link).
With the `DataParser` (template) class we can load data from `.csv` files, say `train_samples.csv` and `train_labels.csv`, which already contain the training data split into data samples and their labels[^4]. Then we train our perceptron `snn` with the mean squared error (MSE) as loss function, `batchSize` and `learningRate` via (continuation from above):
DataParser dp;
Eigen::MatrixXd trainSamples=dp.LoadCSV<Eigen::MatrixXd>("train_samples.csv");
Eigen::MatrixXd trainLabels=dp.LoadCSV<Eigen::MatrixXd>("train_labels.csv");
batchSize=10;
double learningRate=0.001;
auto sdg=SDG("mse", batchSize, learningRate); // mini-batch stochastic gradient descent with MSE as loss function
// actual training
int epochs=1000; // number of epochs in the training
sdg.Train(snn, trainSamples, trainLabels, epochs);
That's it! Now we can apply our `snn` to some test data and evaluate it. This is not yet fully implemented since one might have to encode the original data labels (e.g. via one-hot encoding) first. However, we provide an evaluation in main.cpp for MNIST.
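If you want to experiment with such an encoding yourself, one-hot encoding of integer labels can be done directly with Eigen. The following is a minimal sketch (the function `OneHotEncode` is not part of the API, just an illustration):

```cpp
#include <Eigen/Dense>

// Turn a 1 x batchSize matrix of integer class labels (values in 0..numClasses-1)
// into a one-hot encoded matrix of shape numClasses x batchSize.
Eigen::MatrixXd OneHotEncode(const Eigen::MatrixXd& labels, int numClasses) {
    Eigen::MatrixXd encoded = Eigen::MatrixXd::Zero(numClasses, labels.cols());
    for (int col = 0; col < labels.cols(); ++col) {
        encoded(static_cast<int>(labels(0, col)), col) = 1.0;
    }
    return encoded;
}
```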
As noted above, the only dependency, `eigen`, is contained as a git submodule. Hence clone this repository together with the submodule via
git clone --recurse-submodules https://github.com/flrnbc/Sequential-Neural-Networks
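If you have already cloned the repository without the submodule, you can fetch Eigen afterwards from the project root via
git submodule update --init --recursive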
Next we change to the project root directory and create a `build` directory via
cd Sequential-Neural-Networks
mkdir build/
Now we initialize `cmake` and build `main.cpp` via
cd build
cmake ..
cmake --build . --target Main
It is executed via
cd ..
build/Main
(TODO: changing back to the project root is still necessary at the moment but will be fixed.)
The `.cpp` files in `tests` contain all the (mainly smoke) tests for each class. Please take a look at these files to activate the test functions you are interested in. Then a test is built and run via (after `cmake` has already been run as above)
cd build/
cmake --build . --target TestName  # where Name in TestName is either DataParser, Function, LayerCache, Layer, LossFunction, Optimizer, SequentialNN or Transformation
cd ..
build/TestName  # again replace Name in TestName correspondingly
We recommend trying `TestOptimizer` as `TestName` first (with the function `test_OptimizeLinearRegression1D` not commented out) since it showcases the training of the simplest neural network possible.
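To get an idea of what this test does, setting up and training a 1D linear regression with the API could look roughly as follows (a sketch, not the actual test code; the data and hyperparameters are just illustrative):

```cpp
// simplest possible network: a single 1x1 LinearLayer, i.e. y = w*x + b
auto linreg = SequentialNN({LinearLayer(1, 1)});

// tiny synthetic dataset whose labels roughly follow y = 2*x + 1
Eigen::MatrixXd samples(1, 5);
samples << 0, 1, 2, 3, 4;
Eigen::MatrixXd labels(1, 5);
labels << 1, 3, 5, 7, 9;

auto sdg = SDG("mse", 5, 0.01);           // batch size 5, learning rate 0.01
sdg.Train(linreg, samples, labels, 500);  // train for 500 epochs
```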
The source code is contained in the `src` directory. Typically, each `.h`/`.cpp` file pair declares/defines one class, e.g. the `Layer` class is declared/defined in `layer.h` and `layer.cpp`. All tests are contained in the `tests` directory, e.g. `tests/test_layer.cpp` contains tests for the `Layer` class. Finally, the Eigen library is in the `eigen` directory.
For more information on the classes, please see the corresponding header files.
- `Transformation`: abstract class with `LinearTransformation` and `ActivationTransformation` as concrete derived classes. The latter uses the `Function` class and both use the `Eigen::Matrix` class.
- `Layer`: abstract class with `LinearLayer` and `ActivationLayer` as concrete derived classes. Built from `LinearTransformation` and `ActivationTransformation` respectively, as well as the `LayerCache` class.
- `SequentialNN`: built from a vector of `Layer` classes.
- `Optimizer`: smart pointer to a `LossFunction` object as member variable.
- `DataParser`: uses the `Eigen::Matrix` class.
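To make the overall pattern more concrete, here is a heavily simplified, hypothetical sketch of how an abstract base class, concrete derived classes and composition via smart pointers fit together (the names `LayerBase`, `DenseLayer` and `Network` are made up for illustration and do not appear in `src`):

```cpp
#include <memory>
#include <utility>
#include <vector>
#include <Eigen/Dense>

// abstract base class (cf. Transformation/Layer)
class LayerBase {
public:
    virtual ~LayerBase() = default;
    virtual Eigen::MatrixXd Apply(const Eigen::MatrixXd& input) = 0; // pure virtual
};

// concrete derived class (cf. LinearLayer): an affine-linear transformation
class DenseLayer : public LayerBase {
public:
    DenseLayer(int rows, int cols)
        : weights_(Eigen::MatrixXd::Random(rows, cols)), bias_(Eigen::VectorXd::Zero(rows)) {}
    Eigen::MatrixXd Apply(const Eigen::MatrixXd& input) override {
        return (weights_ * input).colwise() + bias_;
    }
private:
    Eigen::MatrixXd weights_;
    Eigen::VectorXd bias_;
};

// composition via smart pointers (cf. SequentialNN holding its Layers)
class Network {
public:
    void Add(std::unique_ptr<LayerBase> layer) { layers_.push_back(std::move(layer)); }
    Eigen::MatrixXd Forward(Eigen::MatrixXd x) const {
        for (const auto& layer : layers_) { x = layer->Apply(x); }
        return x;
    }
private:
    std::vector<std::unique_ptr<LayerBase>> layers_;
};
```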
The project needs to fulfill several requirements to successfully pass the Capstone review process. Here we give the positions in our code where the respective requirement is satisfied.
- Demonstrate understanding of C++ functions and control structures: For example, see src/layer.cpp in the function `Layer::Forward()` (line 34 ff.) where conditionals are used to handle an exception.
- Reading data from a file and processing it: This is fulfilled in src/data_parser.h in the function `LoadCSV` (see line 30 ff.) which reads data from a `.csv` file and saves it in an `Eigen::Matrix` object.
- Using OOP techniques: The design of the sequential neural network API relies heavily on OOP methods. As an example, see the class `SequentialNN` in src/sequential_nn.h, line 55 ff., which has several member variables and functions.
- Class access specifiers for class members: Used several times, for example in src/transformation.h, lines 49 - 94. There we use `protected` to facilitate the access to member variables for derived classes.
- Class constructors utilize member initialization lists: Applied to several constructors (where appropriate), e.g. in src/layer.h, lines 125 - 134.
- Classes follow an appropriate inheritance hierarchy: Multiple classes inherit from virtual base classes, for example the `LinearTransformation` class from `Transformation` in src/transformation.h, line 101 ff. Moreover, composition is used (albeit via smart pointers), e.g. in the `Layer` class, see src/layer.h, line 47 ff.
- Derived class functions override virtual base class functions: This is done, for example, in the function `ZeroDeltaWeights` of the `LinearLayer` class, see src/layer.h, line 120.
- Templates generalize functions: See the function `LoadCSV` in src/data_parser.h, line 31.
- Rule of 5: See src/layer_cache.cpp, line 56 ff., even though we use the default move constructors/assignment operator (which seems to be ok because `LayerCache` is composed of member variables which admit move semantics).
- Using move semantics: Applied in `LayerCache::SetForwardInput` in src/layer_cache.cpp, line 13.
- Use of smart pointers: For example in the `Layer` class, see src/layer.h, lines 50 and 53. (A generic sketch of these last three points follows this list.)
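To illustrate the last three points generically, here is a small sketch of the pattern (this is not the code from src/layer_cache.cpp; the class and member names are made up):

```cpp
#include <memory>
#include <utility>
#include <Eigen/Dense>

class Cache {
public:
    Cache() = default;

    // Rule of 5 with defaulted special member functions: acceptable here because
    // the only member (a shared_ptr) already supports copy and move semantics.
    Cache(const Cache&) = default;
    Cache& operator=(const Cache&) = default;
    Cache(Cache&&) = default;
    Cache& operator=(Cache&&) = default;
    ~Cache() = default;

    // move semantics: take the argument by value and move it into the member,
    // avoiding an extra reference-count increment of the shared_ptr
    void SetForwardInput(std::shared_ptr<Eigen::MatrixXd> input) {
        forward_input_ = std::move(input);
    }

private:
    // smart pointer member (cf. the shared_ptr members of the Layer class)
    std::shared_ptr<Eigen::MatrixXd> forward_input_;
};
```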
Footnotes

[^1]: This terminology might be unconventional but comes from `Sequential` models of Keras.
[^2]: The nice thing about Udacity Nanodegrees is that the final (Capstone) projects are entirely up to the student.
[^3]: There might be an issue with vanishing gradients of the softmax activation function. Or even some tricky mistake in the implementation of the backpropagation algorithm.
[^4]: See main.cpp for a simple function which does this splitting in the special case of MNIST.