A beautifully simple, fully parallelized neural network framework written in C++ and Python!
This framework covers the core mechanics of traditional neural networks, approached from a high-performance computing perspective.
- Layers:
  - Dense
- Activations:
  - SoftMax
  - ReLU
- Losses:
  - Categorical Cross Entropy
  - Sparse Categorical Cross Entropy
- Optimizers:
  - SGD
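As a rough sketch of what the categorical cross-entropy loss computes, the standalone functions below illustrate the math (softmax followed by cross entropy against a one-hot label). These are hypothetical helpers for illustration only, not the framework's actual implementation:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Softmax: exponentiate and normalize so the outputs form a probability distribution.
std::vector<double> softmax(const std::vector<double>& logits) {
    double max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<double> probs(logits.size());
    double sum = 0.0;
    for (size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - max_logit);  // subtract max for numerical stability
        sum += probs[i];
    }
    for (double& p : probs) p /= sum;
    return probs;
}

// Categorical cross entropy for one sample: -sum_i y_i * log(p_i),
// where y is a one-hot label vector and p the predicted probabilities.
double cross_entropy(const std::vector<double>& y, const std::vector<double>& p) {
    double loss = 0.0;
    for (size_t i = 0; i < y.size(); ++i)
        loss -= y[i] * std::log(p[i] + 1e-12);  // small epsilon avoids log(0)
    return loss;
}
```

Sparse categorical cross entropy computes the same quantity but takes the true class as an integer index rather than a one-hot vector.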
// Set up some training data and labels
Matrix<double> x_train(train_size, 2);
Matrix<double> y_train(train_size, 2);
// Define the network
Dense<double> layer1(2, 16);
ReLU<double> layer2;
Dense<double> layer3(16, 2);
SoftmaxCrossEntropy<double> layer4;
optimizer::SGD<double> sgd(1.0, 0.001);
// Complete forward pass
Matrix<double> out1 = layer1.forward(x_train);
Matrix<double> out2 = layer2.forward(out1);
Matrix<double> out3 = layer3.forward(out2);
Matrix<double> out4 = layer4.forward(out3, y_train);
// Calculate loss and metric
double loss = layer4.get_loss();
double acc = metric::accuracy(y_train, out4);
Requires CUDA 10.0, C++11 or later, and a suitable compiler (e.g. g++)
- Clone the repo
- `cd` into the C++ directory
- Run `make` in the terminal
- Run `./driver 1000` for a demonstration of the serial implementation
- Run `./driverp 1000` for a demonstration of the parallel implementation
- Clone the repo
- Install dependencies (numpy)
- Run `driver.py` for a demonstration of the serial implementation
This codebase is paired with an HPC report comparing the serial and parallel implementations. If you wish to read it, see this link to download a copy.