
Neural network library inspired by Stanford's 2016 CS231n course, written in C++

Primary language: C++, License: MIT

syncognite - A neural network library inspired by Stanford's CS231n course


A neural network library for convolutional, fully connected nets and RNNs in C++.

This library implements some of the assignments from Stanford's CS231n 2016 course (Andrej Karpathy, Fei-Fei Li, Justin Johnson) and CS224d (Richard Socher) as a C++ framework.

The current v2 version of the project has the following objectives:

  • implement full support for graphs (not only sequential models)
  • cleanup & documentation
  • This will remain a work in progress for a considerable time. The previous version is archived in branch v1.
  • CUDA support and other external graphics card libs have been removed, since good performance would require relying on black-box libraries.

Current state: beta

Features

  • Fully connected networks
  • Convolutional layers
  • Recurrent nets (RNNs)
  • Long short-term memory nets (LSTMs)
  • ReLU, Sigmoid, TanH, SELU [1], resilu [2] nonlinearities
  • BatchNorm, SpatialBatchNorm, Dropout layers
  • Softmax, SVM loss
  • TemporalAffine and TemporalSoftmax layers for RNNs

[1]: "scaled exponential linear units" (SELUs), https://arxiv.org/abs/1706.02515

[2]: "resilu" (residual & ReLU-like nonlinearity plus linearity): a linear skip connection combined with a non-linearity, see the Appendix below.
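For reference, here is a minimal scalar sketch of the SELU nonlinearity from footnote [1]. It is plain C++, not syncognite's layer code; the constants λ and α are the values published in the SELU paper, and the names `selu` and `seluGrad` are hypothetical.

```cpp
#include <cmath>

// SELU constants from "Self-Normalizing Neural Networks" (footnote [1]).
constexpr double kSeluLambda = 1.0507009873554804934193349852946;
constexpr double kSeluAlpha  = 1.6732632423543772848170429916717;

// selu(x) = lambda * x                  for x > 0
//         = lambda * alpha * (e^x - 1)  for x <= 0
double selu(double x) {
    return x > 0.0 ? kSeluLambda * x
                   : kSeluLambda * kSeluAlpha * std::expm1(x);  // expm1 = e^x - 1, accurate near 0
}

// Derivative used in the backward pass.
double seluGrad(double x) {
    return x > 0.0 ? kSeluLambda : kSeluLambda * kSeluAlpha * std::exp(x);
}
```

For large negative inputs, SELU saturates at -λα ≈ -1.758, while positive inputs are scaled linearly by λ.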

Sample

Model

Example: C++ definition of a deep convolutional net with batch-norm, dropout and fully connected layers:

LayerBlock lb(R"({"name":"DomsNet","bench":false,"init":"orthonormal"})"_json);

lb.addLayer("Convolution", "cv1", R"({"inputShape":[1,28,28],"kernel":[48,5,5],"stride":1,"pad":2})",{"input"});
lb.addLayer("BatchNorm","sb1","{}",{"cv1"});
lb.addLayer("Relu","rl1","{}",{"sb1"});
lb.addLayer("Dropout","doc1",R"({"drop":0.8})",{"rl1"});
lb.addLayer("Convolution", "cv2", R"({"kernel":[48,3,3],"stride":1,"pad":1})",{"doc1"});
lb.addLayer("Relu","rl2","{}",{"cv2"});
lb.addLayer("Convolution", "cv3", R"({"kernel":[64,3,3],"stride":2,"pad":1})",{"rl2"});
lb.addLayer("BatchNorm","sb2","{}",{"cv3"});
lb.addLayer("Relu","rl3","{}",{"sb2"});
lb.addLayer("Dropout","doc2",R"({"drop":0.8})",{"rl3"});
lb.addLayer("Convolution", "cv4", R"({"kernel":[64,3,3],"stride":1,"pad":1})",{"doc2"});
lb.addLayer("Relu","rl4","{}",{"cv4"});
lb.addLayer("Convolution", "cv5", R"({"kernel":[128,3,3],"stride":2,"pad":1})",{"rl4"});
lb.addLayer("BatchNorm","sb3","{}",{"cv5"});
lb.addLayer("Relu","rl5","{}",{"sb3"});
lb.addLayer("Dropout","doc3",R"({"drop":0.8})",{"rl5"});
lb.addLayer("Convolution", "cv6", R"({"kernel":[128,3,3],"stride":1,"pad":1})",{"doc3"});
lb.addLayer("Relu","rl6","{}",{"cv6"});

lb.addLayer("Affine","af1",R"({"hidden":1024})",{"rl6"});
lb.addLayer("BatchNorm","bn1","{}",{"af1"});
lb.addLayer("Relu","rla1","{}",{"bn1"});
lb.addLayer("Dropout","do1",R"({"drop":0.7})",{"rla1"});
lb.addLayer("Affine","af2",R"({"hidden":512})",{"do1"});
lb.addLayer("BatchNorm","bn2","{}",{"af2"});
lb.addLayer("Relu","rla2","{}",{"bn2"});
lb.addLayer("Dropout","do2",R"({"drop":0.7})",{"rla2"});
lb.addLayer("Affine","af3",R"({"hidden":10})",{"do2"});
lb.addLayer("Softmax","sm1","{}",{"af3"});
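The spatial sizes in the model above can be traced by hand to see which input size the first Affine layer receives. This sketch assumes the standard CS231n output-size formula out = (in - kernel + 2*pad) / stride + 1 with integer division, and reads `"kernel":[F,HH,WW]` as F filters of size HH×WW. BatchNorm, Relu and Dropout are assumed to keep the shape. All helper names are hypothetical.

```cpp
#include <vector>

// Output size of one spatial dimension of a convolution
// (CS231n convention, integer division; assumed to match syncognite's rounding).
int convOutSize(int in, int kernel, int stride, int pad) {
    return (in - kernel + 2 * pad) / stride + 1;
}

struct Shape3 { int channels, height, width; };
struct ConvSpec { int filters, kernel, stride, pad; };  // square kernels only

// Apply a sequence of convolutions to an input shape.
Shape3 traceConvs(Shape3 s, const std::vector<ConvSpec>& convs) {
    for (const auto& c : convs) {
        s = {c.filters, convOutSize(s.height, c.kernel, c.stride, c.pad),
             convOutSize(s.width, c.kernel, c.stride, c.pad)};
    }
    return s;
}

// cv1..cv6 of DomsNet with input shape [1,28,28].
Shape3 domsNetConvOutput() {
    return traceConvs({1, 28, 28}, {
        {48, 5, 1, 2},   // cv1 -> [48,28,28]
        {48, 3, 1, 1},   // cv2 -> [48,28,28]
        {64, 3, 2, 1},   // cv3 -> [64,14,14]
        {64, 3, 1, 1},   // cv4 -> [64,14,14]
        {128, 3, 2, 1},  // cv5 -> [128,7,7]
        {128, 3, 1, 1},  // cv6 -> [128,7,7]
    });
}
```

Under these assumptions, af1 sees a flattened input of 128·7·7 = 6272 values.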

Training

json jo(R"({"verbose":true,"shuffle":true,"lr_decay":0.95,"epsilon":1e-8})"_json);
jo["epochs"]=(floatN)40.0;
jo["batch_size"]=50;
jo["learning_rate"]=(floatN)5e-4;
jo["regularization"]=(floatN)1e-8;

lb.train(X, y, Xv, yv, "Adam", jo);

floatN train_err, val_err, test_err;
train_err=lb.test(X, y, jo.value("batch_size", 50));
val_err=lb.test(Xv, yv, jo.value("batch_size", 50));
test_err=lb.test(Xt, yt, jo.value("batch_size", 50));

cerr << "Final results on MNIST after " << jo.value("epochs",(floatN)0.0) << " epochs:" << endl;
cerr << "      Train-error: " << train_err << " train-acc: " << 1.0-train_err << endl;
cerr << " Validation-error: " << val_err <<   "   val-acc: " << 1.0-val_err << endl;
cerr << "       Test-error: " << test_err <<  "  test-acc: " << 1.0-test_err << endl;

See mnisttest or cifar10test for complete examples.
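The "Adam" optimizer and the `learning_rate`, `epsilon` and `lr_decay` options in the training call can be illustrated with a sketch of the standard Adam update (Kingma & Ba). This is not syncognite's implementation: β1 = 0.9 and β2 = 0.999 are the common defaults from the Adam paper, and reading `lr_decay` as a per-epoch multiplicative decay is an assumption. The moment estimates live in a state object that persists across calls; the 2022-03-24 history entry describes a bug where optimizer state was lost on each call.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of the standard Adam update, not syncognite's code.
struct AdamState {
    std::vector<double> m, v;  // first and second moment estimates
    long t = 0;                // number of steps taken, for bias correction
};

// One Adam step over a flat parameter vector; `s` must persist across calls.
// beta1/beta2 defaults are the Adam paper's values (assumption for syncognite).
void adamStep(std::vector<double>& w, const std::vector<double>& grad,
              AdamState& s, double lr, double eps = 1e-8,
              double beta1 = 0.9, double beta2 = 0.999) {
    if (s.m.empty()) {  // initialize moments to zero on the first call only
        s.m.assign(w.size(), 0.0);
        s.v.assign(w.size(), 0.0);
    }
    ++s.t;
    const double bc1 = 1.0 - std::pow(beta1, static_cast<double>(s.t));
    const double bc2 = 1.0 - std::pow(beta2, static_cast<double>(s.t));
    for (std::size_t i = 0; i < w.size(); ++i) {
        s.m[i] = beta1 * s.m[i] + (1.0 - beta1) * grad[i];
        s.v[i] = beta2 * s.v[i] + (1.0 - beta2) * grad[i] * grad[i];
        const double mHat = s.m[i] / bc1;  // bias-corrected first moment
        const double vHat = s.v[i] / bc2;  // bias-corrected second moment
        w[i] -= lr * mHat / (std::sqrt(vHat) + eps);
    }
}

// Assumed reading of "lr_decay": learning rate multiplied once per epoch.
double decayedLearningRate(double lr0, double decay, int epoch) {
    return lr0 * std::pow(decay, epoch);
}
```

Re-creating `AdamState` on every call would reset the moment estimates and the bias correction each time, which is the kind of state loss that slows convergence.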

A model that generates text via LSTMs can be defined with:

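// lb is a LayerBlock as above; T (sequence length), VS (vocabulary size), BS (batch size),
// H (hidden size), clip and rnntype (recurrent layer class, e.g. "LSTM") are defined elsewhere, see rnnreader.
json j0;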
string oName{"OH0"};
j0["inputShape"]=vector<int>{T};
j0["V"]=VS;
lb.addLayer("OneHot",oName,j0,{"input"});

int layer_depth=4;
string nName;
json j1;
j1["inputShape"]=vector<int>{VS,T};
j1["N"]=BS;
j1["H"]=H;
j1["forgetgateinitones"]=true;
j1["forgetbias"]=1.0;
j1["clip"]=clip;
for (auto l=0; l<layer_depth; l++) {
	nName="lstm"+std::to_string(l);
	lb.addLayer(rnntype,nName,j1,{oName});
	oName=nName;
}

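// TemporalSoftmax reads from "af1", so a TemporalAffine layer must project the
// hidden state (size H) to the vocabulary size VS at each time step.
// The output-size parameter name "M" is an assumption; check rnnreader.
json j10;
j10["inputShape"]=vector<int>{H,T};
j10["M"]=VS;
lb.addLayer("TemporalAffine","af1",j10,{oName});

json j11;
j11["inputShape"]=vector<int>{VS,T};
lb.addLayer("TemporalSoftmax","sm1",j11,{"af1"});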

See rnnreader for a complete example.
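To illustrate the effect of the `forgetbias` option, here is a scalar sketch of one step of the standard LSTM equations (one input, one hidden unit). It is not syncognite's vectorized implementation, the struct and function names are hypothetical, and it assumes that `forgetbias` initializes the forget-gate bias. With all weights zero and a forget-gate bias of 1.0, the forget gate is σ(1) ≈ 0.73, so most of the cell state is carried over at the start of training; a bias of 0 would give 0.5.

```cpp
#include <cmath>

// Scalar LSTM weights: w* multiply the input, u* the previous hidden state, b* are biases.
struct LstmWeights {
    double wi, ui, bi;  // input gate
    double wf, uf, bf;  // forget gate
    double wo, uo, bo;  // output gate
    double wg, ug, bg;  // candidate cell value
};

struct LstmState {
    double h = 0.0;  // hidden state
    double c = 0.0;  // cell state
};

inline double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// One time step of the standard LSTM equations for a single unit.
LstmState lstmStep(const LstmWeights& p, double x, const LstmState& prev) {
    const double i = sigmoid(p.wi * x + p.ui * prev.h + p.bi);    // input gate
    const double f = sigmoid(p.wf * x + p.uf * prev.h + p.bf);    // forget gate
    const double o = sigmoid(p.wo * x + p.uo * prev.h + p.bo);    // output gate
    const double g = std::tanh(p.wg * x + p.ug * prev.h + p.bg);  // candidate
    LstmState next;
    next.c = f * prev.c + i * g;  // keep a fraction f of the old cell state
    next.h = o * std::tanh(next.c);
    return next;
}
```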

Dependencies:

  • A C++11 compiler. Tested on Linux (clang, gcc, Intel icpc), macOS (clang on x86-64 and on Apple silicon with clang 12 and 13) and Raspberry Pi ARM (gcc).
  • CMake build system.
  • HDF5 C++ API for model saving and sample data (package hdf5 or libhdf5-dev, depending on the platform).

Apple silicon notes

  • Use ccmake to set USE_SYSTEM_BLAS to ON, which makes Eigen use the system BLAS and, through it, the M1's hardware accelerators. rnnreader sees a dramatic 3x-6x speedup, and single-thread benchmarks in bench improve by 200%-400%! [Tested on macOS 12 beta 3, 2021-07-19]
  • Memory: macOS does not give processes all available memory. Expect swapping (and a significant slowdown) when allocating more than 4-5 GB, even on 16 GB M1 machines.
  • The hdf5 libraries are available for ARM64 (brew install hdf5).

External libraries that are included in the source tree:

  • Eigen v3.4 (eigen3), included in the source tree as a git submodule (default configuration).
  • nlohmann_json, already included in source tree (cpneural/nlohmann_json).

Build

syncognite uses the CMake build system.

Clone the repository:

git clone https://github.com/domschl/syncognite
cd syncognite
git submodule init
git submodule update    # This gets the in-tree Eigen3

Create a build directory within the syncognite directory and configure the build:

# create and enter syncognite/build:
mkdir build && cd build
# the default generator is Make, but Ninja can also be used:
cmake [-G Ninja] ..
# optionally use ccmake to configure options and paths:
ccmake ..

To configure your editor or IDE with the correct include paths, run (in build):

cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=YES ..

or simply run the helper script create_compile_commands.sh.

macOS users might want to configure for building with Xcode:

cmake -G Xcode ..

Build the project:

make
# or
ninja
# or (macOS) start Xcode and load the generated project file, or:
xcodebuild -configuration Release

History

  • 2022-04-01: nlohmann_json updated to latest
  • 2022-03-24: Serious bug fixed in stateful optimizers (incl. Adam): state was lost on each call, causing slow convergence.
  • 2022-03-22: Started v2 branch; removed CUDA and other external graphics libs.
  • 2021-10-10: Moved CI from travis (defunct) to github workflows. Valgrind currently disabled.
  • 2021-08-21: eigen update to 3.4 release
  • 2021-07-19: eigen update to 3.4rc1
  • 2021-07-19: Dramatic speed improvements on Apple M1 when configuring Eigen (via ccmake) to use the system BLAS, which seems to use the M1's hardware accelerators.
  • 2020-11-12: Switched eigen3 submodule to gitlab, tracks 3.3 branch
  • 2020-07-31: Apple ARM tested ok.
  • 2020-07-05: Tests with resilu (non-)linearity
  • 2018-03-02: Removed faulty RAN layer, switched to the official eigen3 GitHub mirror, fixes for eigen-dev's stricter type-checking.

Subprojects:

Things that should work:

  • testneural (cptest subproject, consistency tests for all layers using testdata and numerical differentials)
  • bench (benchmark subproject, benchmarks for all layers)
  • mnisttest (cpmnist subproject, MNIST handwritten digit recognition with a convolutional network, requires dataset download.)
  • cifar10test (cpcifar10 subproject, cifar10 image recognition with a convolutional network, requires dataset download.)
  • rnnreader (rnnreader subproject, text generation via RNN/LSTMs, similar to char-rnn.)

Appendix

Resilu (non-) linearity

See the Jupyter notebook for visualizations and further discussion of the resilu function.

(1) $\quad rsi(x)=\frac{x}{1-e^{-x}}$

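Multiplying numerator and denominator by $e^x$ gives $\frac{xe^x}{e^x-1}=\frac{x(e^x-1)+x}{e^x-1}$, so $rsi(x)$ can be rewritten as: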

(2) $\quad rsi(x)=\frac{x}{e^{x}-1}+x$

and can thus be interpreted as a residual combination of a linearity ($x$) and a non-linearity ($\frac{x}{e^{x}-1}$) via addition.

Since $rsi(x)$ is numerically unstable near $x=0$ (numerator and denominator of (1) both vanish, although the limit $rsi(0)=1$ exists), a Taylor approximation of order $O(4)$ is used for $rsi(x)$ and $\nabla rsi(x)$ in the interval $-h\lt x\lt h$.
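A minimal scalar sketch of $rsi(x)$ and its gradient with a Taylor fallback around $x=0$. It assumes the expansion $rsi(x)\approx 1+\frac{x}{2}+\frac{x^2}{12}-\frac{x^4}{720}$ (the Bernoulli-number series of $\frac{x}{e^x-1}$ plus $x$) and a hypothetical threshold $h=10^{-3}$; syncognite's actual threshold and code may differ. Outside the Taylor interval, each branch evaluates exponentials of non-positive arguments only, so nothing overflows.

```cpp
#include <cmath>

// Hypothetical Taylor threshold h (assumption, not taken from syncognite).
constexpr double kRsiH = 1e-3;

// rsi(x) = x / (1 - e^{-x}), equation (1).
double rsi(double x) {
    if (std::fabs(x) < kRsiH) {
        // Taylor series around 0: 1 + x/2 + x^2/12 - x^4/720 (no x^3 term)
        const double x2 = x * x;
        return 1.0 + x / 2.0 + x2 / 12.0 - x2 * x2 / 720.0;
    }
    if (x > 0.0) return x / -std::expm1(-x);  // 1 - e^{-x} == -expm1(-x)
    return x * std::exp(x) / std::expm1(x);   // same value, numerator and denominator times e^x
}

// Gradient of rsi(x).
double rsiGrad(double x) {
    if (std::fabs(x) < kRsiH) {
        return 0.5 + x / 6.0 - x * x * x / 180.0;  // derivative of the series above
    }
    if (x > 0.0) {
        // (1 - e^{-x} - x e^{-x}) / (1 - e^{-x})^2
        const double e = std::exp(-x);
        const double d = -std::expm1(-x);
        return (d - x * e) / (d * d);
    }
    // Form (2): d/dx [x + x/(e^x - 1)] = 1 + (e^x - 1 - x e^x) / (e^x - 1)^2
    const double e = std::exp(x);
    const double d = std::expm1(x);
    return 1.0 + (d - x * e) / (d * d);
}
```

Using `std::expm1` keeps $1-e^{-x}$ and $e^x-1$ accurate for small $|x|$ just outside the Taylor interval.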

Both $e^x$ quotients in (1) and (2) have ReLU-type limits if $e^x$ is replaced by $e^{\frac{x}{a}}$ and $a\to 0^+$: the quotient in (1) tends to $ReLU(x)$, the quotient in (2) tends to $ReLU(-x)$. This is consistent with (2), since $ReLU(-x)+x=ReLU(x)$.
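These limits can be checked numerically with a small sketch: with $a=10^{-3}$, the quotient $\frac{x}{1-e^{-x/a}}$ from (1) approaches $ReLU(x)$ and the quotient $\frac{x}{e^{x/a}-1}$ from (2) approaches $ReLU(-x)$. The function names are hypothetical; both quotients are undefined at $x=0$.

```cpp
#include <cmath>

// Quotient of (1) with e^{-x} replaced by e^{-x/a}: x / (1 - e^{-x/a}), a > 0, x != 0.
// If e^{-x/a} overflows to infinity, the result is 0, which is the limiting value.
double quotient1(double x, double a) { return x / -std::expm1(-x / a); }

// Quotient of (2) with e^{x} replaced by e^{x/a}: x / (e^{x/a} - 1), a > 0, x != 0.
double quotient2(double x, double a) { return x / std::expm1(x / a); }

double relu(double x) { return x > 0.0 ? x : 0.0; }
```

Adding $x$ to `quotient2(x, a)` gives $ReLU(x)$ in the limit, matching form (2).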