
EMNIST Neural network

This repo contains the functions to train a neural network with the EMNIST dataset on top of OpenFaaS.

Getting started

Prerequisites

These functions take advantage of the persistent state of lambdas available when OpenFaaS is deployed with faas-state. Check that repository for installation instructions.

faas-cli must be installed on the developer machine to build, push and deploy the functions. The OPENFAAS_URL environment variable must point to the respective OpenFaaS cluster. Docker must be running on the developer machine.

Access to the OpenFaaS cluster is also required in order to read the results.

Installing

From the project root folder, run the following command:

$ faas-cli build -f emnist.yml --parallel=2 && faas-cli push -f emnist.yml --parallel=2 && faas-cli deploy -f emnist.yml

After this step, you should be able to invoke the setup function by executing:

$ echo | faas-cli invoke emnist-setup

Running the tests

The following results were obtained on a 3-machine Kubernetes cluster with an Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz and 16 GB of RAM per server. The commands below are executed on the Kubernetes master.

To scale the deployment to the desired number of machines, execute:

$ kubectl scale deployment emnist-train --replicas=3 --namespace=openfaas-fn

To see the names of the pods executing the function, run:

$ kubectl get pods --namespace=openfaas-fn -o wide | grep "emnist-train"

To see the results of the training in the logs, run:

$ for server in $(kubectl get pods --namespace=openfaas-fn -o wide | grep "emnist-train" | cut -d' ' -f1); do kubectl logs --namespace=openfaas-fn $server emnist-train; done

The full suite of tests is available in the tests folder and can be run with the test.sh script. To include a test in the current execution of test.sh, the corresponding file from tests should be copied into the project folder. Different files have different values of batch_size and number_of_pods. After the required tests are in the project folder, they can be run using:

$ nohup ./test.sh &

The outputs are saved in an output.txt file.

DownpourSGD_global_client_nFetch10, emnist321.yml

Round   Time (s)   Iterations   42 41 40
1       1415       5            X
2       209        5            X
3       770        5            X
4       768        5            X
5       210        5            X
Avg     674        5
Stdev   499        0

DownpourSGD_global_client_nFetch10, emnist323.yml

Round   Time (s)   Iterations   42 41 40
1       393        3            X
        394        3            X
        352        16           X
2       620        11           1
        595        10           X
        597        9            X
3       1914       11           X
        1963       11           X
        1964       13           Y
4       415        3            X
        415        3            X
        363        16           X
5       355        17           X
        473        4            X
        474        4            X
Avg     752        9
Stdev   624        5

The full set of results is available in the tests folder.

Branches

Branch naming follows the convention algorithmName_databaseType_args_type, where algorithmName might be DownpourSGD, databaseType is either global or local (with synchronization in the background), and args can be the nFetch or nPull parameters. The types are the following:

  1. Single function training means that a single function is responsible for loading the data, initialising and training the network.

The remaining types need two different functions deployed on OpenFaaS: a setup function that initialises the network and the state and then calls a train function, which trains the network with the data (a sketch of this split is shown after the list below).

  2. One function per iteration means that each iteration of SGD runs in a separate function. If the training takes 50 iterations, at the end of each iteration the train function asks the gateway to schedule the next invocation of the function. If the data is divided into M parts, each part executes one function per iteration.

  3. Setup + loop in one function means that the entire training (all iterations) runs in a single function. The setup function initialises the network and then calls M train functions, depending on the number of machines in the cluster.

Type 3 has the added benefits of natural scalability and fault tolerance.
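
To make this split more concrete, the following is a minimal Python sketch of type 3 under stated assumptions: the handler names, the in-process STATE dictionary standing in for the faas-state store, and the placeholder gradient are illustrative only, not the actual code in this repository.

import json
import os
import urllib.request

import numpy as np

GATEWAY = os.getenv("OPENFAAS_URL", "http://127.0.0.1:8080")

# Stand-in for the persistent state provided by faas-state; in the real
# deployment the parameters live in the shared state, not in a local dict.
STATE = {}

def invoke_async(function, payload):
    # Ask the OpenFaaS gateway to schedule another (asynchronous) invocation.
    req = urllib.request.Request(
        GATEWAY + "/async-function/" + function,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

def setup(partitions=3, iterations=50):
    # Type 3: initialise the network once, store the parameters in the
    # shared state, then fan out one train function per data partition.
    STATE["weights"] = np.zeros((784, 10))
    for p in range(partitions):
        invoke_async("emnist-train", {"partition": p, "iterations": iterations})

def train(partition, iterations=50, n_fetch=10):
    # Downpour-SGD-style worker: refresh the global parameters every
    # n_fetch steps and push local updates back asynchronously.
    weights = STATE["weights"].copy()
    for step in range(iterations):
        if step % n_fetch == 0:
            weights = STATE["weights"].copy()
        gradient = 0.01 * np.random.randn(*weights.shape)  # placeholder mini-batch gradient
        weights -= 0.1 * gradient
        STATE["weights"] -= 0.1 * gradient  # asynchronous global update

In type 2 the loop body would instead end with another invoke_async call asking the gateway to schedule the next iteration.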

Tasks

  • Write a proper test suite to benchmark the system
  • Find a bigger dataset, in the order of 1GB, and adjust it to this particular problem
  • Explore data parallelism in the larger dataset, using the function parallelism native in OpenFaaS to parallelise separate batches of data (Average parameters)
  • Add a Redis / Memcached pod to the Kubernetes cluster, so that data is stored in and retrieved from this datastore rather than passed via HTTP requests
  • Build a Python decorator that makes the database calls without the programmer having to hard-code them (see the sketch after this list)
  • Add affinity between the OpenFaaS and the Redis / Memcached pods to have the data used by an OpenFaaS worker stored in that same worker
  • Explore data parallelism in the larger dataset, using the function parallelism native in OpenFaaS to parallelise separate batches of data (Downpour SGD)
  • Try the binary memcached protocol
  • Update only the changed parameters in each minibatch
  • Synchronize multiple memcached machines in the background
  • Repeat tests with more machines (Kubernetes scheduling problems)
  • Deal with specific bottlenecks of the system
  • Show some additional use cases
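
As an illustration of the decorator task above, here is a rough Python sketch; the pymemcache dependency, the key name, the memcached address and the handler body are assumptions for the example, not existing code in this repository.

import functools
import pickle

from pymemcache.client.base import Client

def with_state(key, server=("memcached", 11211)):
    # Load `key` from memcached before the handler runs and write the state
    # the handler returns back under the same key afterwards, so the
    # handler itself never has to call the datastore.
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event):
            client = Client(server)
            raw = client.get(key)
            state = pickle.loads(raw) if raw is not None else None
            state, response = handler(event, state)
            client.set(key, pickle.dumps(state))
            return response
        return wrapper
    return decorator

@with_state("emnist-weights")
def handle(event, weights):
    # The handler only sees plain Python objects; every call to the
    # datastore is hidden by the decorator.
    weights = [] if weights is None else weights
    weights.append(event)  # placeholder for a real training update
    return weights, "stored %d updates" % len(weights)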