This project implements a fully connected deep neural network in C++ to classify images from the Fashion MNIST dataset.
After training for a user-defined number of epochs, the program evaluates the model on both the training and testing datasets and writes the predictions to train_predictions.csv and test_predictions.csv.
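The evaluation step that produces these files could look roughly like the sketch below. It assumes, without confirmation from the source, that each file holds one predicted class index per line with no header; `argmax` and `write_predictions` are hypothetical names, not the project's functions.

```cpp
#include <algorithm>
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Index of the largest network output, i.e. the predicted class.
int argmax(const std::vector<double>& outputs) {
    assert(!outputs.empty());
    return static_cast<int>(std::max_element(outputs.begin(), outputs.end()) - outputs.begin());
}

// Hypothetical helper: writes one predicted label per line (no header) to `path`.
// The file layout is an assumption. Returns false if the file cannot be written.
bool write_predictions(const std::string& path,
                       const std::vector<std::vector<double>>& outputs) {
    std::ofstream out(path);
    if (!out) return false;
    for (const auto& o : outputs) out << argmax(o) << '\n';
    return static_cast<bool>(out);
}
```

Here `outputs` would be the softmax outputs for every sample, computed once for the training set and once for the testing set.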
With the current hyperparameters, we achieve 88.04% accuracy on the testing set after around 2 minutes of CPU time.
The dataset files are expected at the following locations:
data/fashion_mnist_test_labels.csv
data/fashion_mnist_test_vectors.csv
data/fashion_mnist_train_labels.csv
data/fashion_mnist_train_vectors.csv
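A minimal loader for these files is sketched below. It assumes each line is one sample of comma-separated numbers: 784 pixel intensities (a 28×28 image, 0 to 255) in the vector files and a single class index in the label files. Scaling pixels by 1/255 is an assumed preprocessing step, and `parse_csv_line` and `load_csv` are hypothetical names.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Parses one comma-separated line into doubles, multiplying each value by `scale`.
std::vector<double> parse_csv_line(const std::string& line, double scale) {
    std::vector<double> values;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) {
        if (!field.empty()) values.push_back(std::stod(field) * scale);
    }
    return values;
}

// Reads every non-empty line of `path`.
// Assumed usage: scale = 1.0 / 255.0 for vector files, scale = 1.0 for label files
// (labels then come back as one-element rows holding the class index).
std::vector<std::vector<double>> load_csv(const std::string& path, double scale) {
    assert(scale > 0.0);
    std::vector<std::vector<double>> rows;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF endings
        if (!line.empty()) rows.push_back(parse_csv_line(line, scale));
    }
    return rows;
}
```

For example, `load_csv("data/fashion_mnist_train_vectors.csv", 1.0 / 255.0)` would return one row of normalized pixels per training image.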
Simply run make run to compile and run the neural network.
Compile the source, e.g. using make, to generate the network executable.
Then, the usage is:
./network -e [NUM_EPOCHS] -l [LEARNING_RATE] -b [BATCH_SIZE] INPUT_NEURONS_AMOUNT HIDDEN_LAYER_1_NEURONS_AMOUNT [...] OUTPUT_NEURONS_AMOUNT
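One way to parse that command line is sketched below, assuming the three flags and the positional layer sizes may appear in any order and that omitted flags fall back to defaults. `Config`, `parse_args`, and the default values are hypothetical; the source does not state how the project parses its arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the options in the usage line.
struct Config {
    int epochs = 10;              // -e  (default is an assumption)
    double learning_rate = 0.01;  // -l  (default is an assumption)
    int batch_size = 32;          // -b  (default is an assumption)
    std::vector<int> layers;      // positional: input, hidden..., output neuron counts
};

Config parse_args(const std::vector<std::string>& args) {
    Config cfg;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& a = args[i];
        if (a == "-e" || a == "-l" || a == "-b") {
            if (i + 1 >= args.size()) throw std::invalid_argument("missing value for " + a);
            const std::string& value = args[++i];
            if (a == "-e") cfg.epochs = std::stoi(value);
            else if (a == "-l") cfg.learning_rate = std::stod(value);
            else cfg.batch_size = std::stoi(value);
        } else {
            cfg.layers.push_back(std::stoi(a));  // every non-flag token is a layer size
        }
    }
    if (cfg.layers.size() < 2)
        throw std::invalid_argument("need at least input and output layer sizes");
    return cfg;
}
```

In `main`, the arguments would be collected with `std::vector<std::string> args(argv + 1, argv + argc);`. An illustrative call (not the project's tuned hyperparameters) is `./network -e 5 -l 0.001 -b 64 784 128 10`, where 784 inputs and 10 outputs match Fashion MNIST's 28×28 images and 10 classes.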
A validation set, consisting of 20% of the training set, is used to compute the accuracy after each batch (mini-batch training is implemented).
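The 80/20 split and the accuracy metric can be sketched as follows. Shuffling before splitting and the fixed seed are assumptions; the source only states that 20% of the training set is held out. `Split`, `split_indices`, and `accuracy` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Sample indices for the training part and the held-out validation part.
struct Split {
    std::vector<std::size_t> train;
    std::vector<std::size_t> validation;
};

// Shuffles 0..n-1 and moves the first `validation_fraction` of them into the
// validation set (0.2 per the README; the seed is an assumption).
Split split_indices(std::size_t n, double validation_fraction, unsigned seed = 42) {
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(idx.begin(), idx.end(), rng);
    const auto n_val = static_cast<std::ptrdiff_t>(n * validation_fraction);
    Split s;
    s.validation.assign(idx.begin(), idx.begin() + n_val);
    s.train.assign(idx.begin() + n_val, idx.end());
    return s;
}

// Fraction of predicted labels that match the true labels.
double accuracy(const std::vector<int>& predicted, const std::vector<int>& labels) {
    assert(predicted.size() == labels.size() && !labels.empty());
    std::size_t correct = 0;
    for (std::size_t i = 0; i < labels.size(); ++i)
        if (predicted[i] == labels[i]) ++correct;
    return static_cast<double>(correct) / static_cast<double>(labels.size());
}
```

Splitting indices rather than copying samples keeps the 60,000 training images in one buffer; for that set, `split_indices(60000, 0.2)` would hold out 12,000 of them.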
Weights are initialized with He initialization. Hidden layers use the ReLU activation function and the output layer uses softmax, with categorical cross-entropy as the loss function. The network is trained with SGD with momentum and RMSProp. Dropout is implemented but disabled, as it did not appear necessary.
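The components named above can be sketched as standalone functions. The optimizer is the least certain part: the source does not say how momentum and RMSProp are combined, so `MomentumRMSProp` assumes one plausible reading (an Adam-style update without bias correction), and its hyperparameters are assumptions. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// He initialization: weights drawn from N(0, sqrt(2 / fan_in)), suited to ReLU layers.
std::vector<double> he_init(std::size_t fan_in, std::size_t fan_out, std::mt19937& rng) {
    std::normal_distribution<double> dist(0.0, std::sqrt(2.0 / static_cast<double>(fan_in)));
    std::vector<double> w(fan_in * fan_out);
    for (double& x : w) x = dist(rng);
    return w;
}

double relu(double x) { return x > 0.0 ? x : 0.0; }
double relu_derivative(double x) { return x > 0.0 ? 1.0 : 0.0; }

// Numerically stable softmax: subtracting the max logit avoids overflow in exp.
std::vector<double> softmax(const std::vector<double>& z) {
    const double m = *std::max_element(z.begin(), z.end());
    std::vector<double> p(z.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < z.size(); ++i) { p[i] = std::exp(z[i] - m); sum += p[i]; }
    for (double& x : p) x /= sum;
    return p;
}

// Categorical cross-entropy of one sample whose true class is `label`.
double cross_entropy(const std::vector<double>& p, std::size_t label) {
    return -std::log(std::max(p[label], 1e-12));  // clamp avoids log(0)
}

// Softmax followed by cross-entropy gives dL/dz_i = p_i - [i == label].
std::vector<double> output_gradient(const std::vector<double>& p, std::size_t label) {
    std::vector<double> g = p;
    g[label] -= 1.0;
    return g;
}

// Assumed optimizer: a momentum average of gradients (m), scaled RMSProp-style by
// a running average of squared gradients (v). No bias correction is applied.
struct MomentumRMSProp {
    double lr, beta1 = 0.9, beta2 = 0.999, eps = 1e-8;
    std::vector<double> m, v;
    MomentumRMSProp(std::size_t n, double learning_rate)
        : lr(learning_rate), m(n, 0.0), v(n, 0.0) {}
    void step(std::vector<double>& w, const std::vector<double>& grad) {
        assert(w.size() == grad.size() && w.size() == m.size());
        for (std::size_t i = 0; i < w.size(); ++i) {
            m[i] = beta1 * m[i] + (1.0 - beta1) * grad[i];
            v[i] = beta2 * v[i] + (1.0 - beta2) * grad[i] * grad[i];
            w[i] -= lr * m[i] / (std::sqrt(v[i]) + eps);
        }
    }
};
```

The simple `p - one_hot` output gradient is one reason softmax and cross-entropy are usually paired: backpropagation never needs the full softmax Jacobian.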
The code architecture was inspired by https://www.youtube.com/watch?v=sK9AbJ4P8ao