
Probabilistic Fault Tolerance of Neural Networks in the Continuous Limit

Code for the paper "Probabilistic Fault Tolerance of Neural Networks in the Continuous Limit" by El-Mahdi El-Mhamdi, Rachid Guerraoui, Andrei Kucharavy, and Sergei Volodin.

Installation

To run the Jupyter notebooks:

  1. Install Miniconda (https://conda.io/miniconda.html) for Python 3.7.3
  2. Install the requirements from environment.yml (e.g., by creating a conda environment from it)
  3. Activate the Freeze nbextension to automatically skip the computation cells and just plot the data from the pickled results
  4. You can enable the cells that re-run the experiments, but do not enable the configuration cells, as they would overwrite the initial conditions

Most results are pickled, so the figures can be generated without re-running the computations.
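
For example, loading pickled results looks roughly like this (a minimal sketch; the actual file names and locations are defined in the notebooks):

```python
import pickle

# Hypothetical file name: see the notebooks for the actual paths.
with open("results.pkl", "rb") as f:
    results = pickle.load(f)
```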

Tested on a 12-CPU machine with two NVIDIA GTX 1080 GPUs, running Ubuntu 16.04.5 LTS.

Code description

The code is written in Python 3 with Keras/TensorFlow and is documented.

Classes

  1. model.py: the definition of a fully-connected network model with crashes, implemented in Keras (the fault model is illustrated in the sketch after this list)
  2. experiment.py: the main class Experiment, which computes the error either experimentally or theoretically
  3. bounds.py: implements the bounds b1, b2, b3, b4 (see the supplementary material); its methods are added to the Experiment class
  4. experiment_random.py: provides random initialization for an Experiment; experiment_train.py instead trains a network on data; experiment_datasets.py runs a TrainExperiment for specific datasets
  5. process_data.py: functions to plot the experimental results
  6. helpers.py: various small functions used throughout the project
  7. derivative_decay.py: routines for the derivative decay rate experiments
  8. continuity.py: implements the smooth() functions from Eq. (1) of the paper
  9. model_conv.py: various helpers for convolutional models (replacing activations with smooth ones, etc.)
  10. experiment_model.py: wraps a Sequential Keras model and adds faults to all of its layers
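
As a rough illustration of the fault model (each neuron "crashes", i.e. outputs zero, independently with some probability, at inference time), here is a minimal self-contained Keras sketch. It only conveys the idea behind model.py and experiment_model.py; the class name and crash probability below are made up, and the repository's actual implementation differs:

```python
import tensorflow as tf
from tensorflow import keras

class CrashLayer(keras.layers.Layer):
    """Illustrative fault model: each unit outputs 0 with probability p.

    Unlike dropout, the mask is applied at inference time and the
    surviving activations are not rescaled by 1/(1-p): crashes model
    faults, not a training-time regularizer.
    """
    def __init__(self, p, **kwargs):
        super().__init__(**kwargs)
        self.p = p

    def call(self, x):
        mask = tf.cast(tf.random.uniform(tf.shape(x)) >= self.p, x.dtype)
        return x * mask

# A small fully-connected network with crashes after the hidden layer.
model = keras.Sequential([
    keras.layers.Dense(50, activation="sigmoid", input_shape=(784,)),
    CrashLayer(p=1e-2),  # illustrative crash probability
    keras.layers.Dense(10, activation="softmax"),
])
```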

Notebooks for the main paper

  1. ComparisonIncreasingDropoutMNIST.ipynb compares, using the bounds, networks trained on MNIST with different dropout rates
  2. Regularization.ipynb regularizes networks with the b3 variance bound to achieve fault tolerance (a sketch of this idea follows the list)
  3. ConvNetTest-MNIST.ipynb trains a small convolutional network and verifies the bound on it
  4. ConvNetTest-VGG16.ipynb loads the pre-trained VGG16 model and verifies the bound on it
  5. ConvNetTest-ft.ipynb compares the fault tolerance of pre-trained CNN models
  6. FaultTolerance-Continuity-FC-MNIST.ipynb shows the decay of VarDelta as n increases when our regularizer (Eq. 1 of the main paper) is used
  7. TheAlgorithm.ipynb tests Algorithm 1 on a small convnet for MNIST
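
The idea of regularizing for fault tolerance can be sketched as follows. This is not the notebooks' exact b3 expression; here the variance of the output under crashes is simply estimated by Monte Carlo, and the function name and sample count are made up:

```python
import tensorflow as tf

def crash_variance_penalty(model, x, n_samples=8):
    """Monte-Carlo estimate of the output variance under random crashes.

    Assumes `model` injects random faults on every forward pass (e.g. via
    a layer like CrashLayer above). A term like this can be added to the
    training loss to penalize fault sensitivity.
    """
    outputs = tf.stack([model(x) for _ in range(n_samples)])
    return tf.reduce_mean(tf.math.reduce_variance(outputs, axis=0))
```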

Notebooks from the supplementary material

  1. FilterPlayground.ipynb allows tuning the smooth() coefficients interactively
  2. WeightDecay-FC-MNIST.ipynb shows that, without regularization, the weights do not decay as we expect
  3. DerivativeDecay-FC-MNIST.ipynb shows that, without regularization, the derivatives do not decay as we expect (see the sketch after this list)
  4. WeightDecay-Continuity-FC-MNIST.ipynb shows that with regularization, continuity holds (the derivatives decay and the weights stabilize)
  5. ConvNetTest-VGG16-ManyImages.ipynb investigates the filter size and how well b3 works in CNNs, and tries applying pooling to the input of VGG (uncomment a line to download the images first)
  6. ErrorAdditivityRandom.ipynb tests the additivity of the error on the Boston dataset
  7. ErrorComparisonBoston.ipynb compares networks trained on the Boston dataset
  8. ConvNetTest-MNIST.ipynb and ConvNetTest-VGG16.ipynb test the b3 bound on larger networks
  9. ErrorOnTraining.ipynb tests the prediction of AP9 in the supplementary material
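
The decay measurements above boil down to tracking per-layer derivative magnitudes. A hedged sketch of such a measurement (the actual routines live in derivative_decay.py and may differ; the helper name is made up):

```python
import tensorflow as tf

def layer_gradient_norms(model, x):
    """Norms of the derivatives of the summed output w.r.t. each weight tensor.

    Hypothetical helper: the decay experiments track how such derivative
    magnitudes behave as the network width grows.
    """
    with tf.GradientTape() as tape:
        y = tf.reduce_sum(model(x))
    grads = tape.gradient(y, model.trainable_weights)
    return [float(tf.norm(g)) for g in grads]
```
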
Additional

Riemann.ipynb generates the continuity figure

Unused

  1. bad_input_search.py: a genetic algorithm that searches for the inputs with the worst fault tolerance
  2. tests.py: internal tests for the functions
  3. tfshow.py: shows a TensorFlow graph in a Jupyter notebook
  4. onepixel and the AE_*.ipynb notebooks: investigate the adversarial robustness of networks regularized with b3