VideoOneNet: Bidirectional Convolutional Recurrent OneNet with Trainable Data Steps for Video Processing

Python source code for reproducing the experiments described in the paper

Paper (.pdf)

The code is mostly self-explanatory through its file, variable and function names; the more complex lines are commented.
It is designed to require minimal setup overhead and reuses Keras and sacred functionality wherever possible.

Installing dependencies

Installing Python 3.7.9 on Ubuntu 20.04.2 LTS:

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install python3.7

Installing CUDA 10.0:

wget https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux
sudo bash cuda_10.0.130_410.48_linux --override
echo 'export PATH=/usr/local/cuda-10.0/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
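
To confirm that the two `export` lines took effect, the environment can be inspected from Python. This is only a sketch and not part of the repository: the helper name `cuda_dirs_on_path` is hypothetical, and it only checks that the expected directories are listed in the variables, not that CUDA itself works.

```python
import os

CUDA_HOME = "/usr/local/cuda-10.0"  # install prefix used in the commands above


def cuda_dirs_on_path(env, cuda_home=CUDA_HOME):
    """Report whether the CUDA bin and lib64 directories are listed in env.

    Hypothetical helper: `env` is any mapping shaped like os.environ.
    """
    expected = {
        "PATH": cuda_home + "/bin",
        "LD_LIBRARY_PATH": cuda_home + "/lib64",
    }
    report = {}
    for variable, directory in expected.items():
        # Split on the platform separator and ignore empty entries
        # (a trailing ':' is left behind when the variable was unset before).
        entries = [e.rstrip("/") for e in env.get(variable, "").split(os.pathsep) if e]
        report[variable] = directory in entries
    return report


for variable, ok in cuda_dirs_on_path(os.environ).items():
    print(variable, "ok" if ok else "does not list " + CUDA_HOME)
```

Run it in a new shell (or after `source ~/.bashrc`) so that the updated variables are visible.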

Installing cuDNN 7.6.5:

wget http://people.cs.uchicago.edu/~kauffman/nvidia/cudnn/cudnn-10.0-linux-x64-v7.6.5.32.tgz
# if the link is broken, log in and download from NVIDIA:
# https://developer.nvidia.com/compute/machine-learning/cudnn/secure/7.6.5.32/Production/10.0_20191031/cudnn-10.0-linux-x64-v7.6.5.32.tgz
tar -xvzf cudnn-10.0-linux-x64-v7.6.5.32.tgz
sudo cp -r cuda/include/* /usr/local/cuda-10.0/include/
sudo cp -r cuda/lib64/* /usr/local/cuda-10.0/lib64/

Installing Python packages with pip:

python3.7 -m pip install h5py==2.10.0 ipython==7.16.1 keras==2.2.4 matplotlib==3.3.2 numpy==1.19.2 pillow==8.1.0 pywavelets==1.1.1 sacred==0.8.2 scikit-learn==0.23.2 scipy==1.5.2 tensorflow-gpu==1.14.0 tqdm==4.56.0
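
Since the versions are pinned exactly, it can help to compare them against what is actually installed. The following is a minimal sketch, not part of the repository: `parse_pins` and `find_mismatches` are hypothetical helpers, and the `pip freeze` excerpt is made up for illustration.

```python
PINNED = (
    "h5py==2.10.0 ipython==7.16.1 keras==2.2.4 matplotlib==3.3.2 "
    "numpy==1.19.2 pillow==8.1.0 pywavelets==1.1.1 sacred==0.8.2 "
    "scikit-learn==0.23.2 scipy==1.5.2 tensorflow-gpu==1.14.0 tqdm==4.56.0"
)


def parse_pins(text):
    """Turn 'name==version' items (space or newline separated) into a dict.

    Names are lower-cased; items without '==' (for example editable
    installs in `pip freeze` output) are skipped.
    """
    pins = {}
    for item in text.split():
        name, separator, version = item.partition("==")
        if separator:
            pins[name.lower()] = version
    return pins


def find_mismatches(pins, installed):
    """Return {name: (wanted, found)} for missing or differing packages.

    A package that is not installed is reported with found=None.
    """
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pins.items()
        if installed.get(name) != wanted
    }


# Made-up excerpt standing in for the output of `python3.7 -m pip freeze`.
sample_freeze = "Keras==2.2.4\nnumpy==1.19.5\nsacred==0.8.2\n"
mismatches = find_mismatches(parse_pins(PINNED), parse_pins(sample_freeze))
print(mismatches["numpy"])  # ('1.19.2', '1.19.5')
```

In practice, replace `sample_freeze` with the real output of `python3.7 -m pip freeze`; an empty result means every pinned package matches.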

Running the code

Reproduction should be as easy as executing this in the root folder (after installing all dependencies):

python3.7 -m IPython experiments/mnistrotated.py with videoonenet seed=123

In general:

python3.7 -m IPython experiments/dataset.py with algorithm optional_config seed=number

where dataset is either:

mnistrotated : the Rotated MNIST video set, artificially generated by rotating and picking the top left corner,
cifar10scanned : the Scanned CIFAR-10 video set, artificially generated by sliding a window,
ucf101 : the UCF-101 natural video set;

algorithm is either:

videoonenet : our videoonenet method with 2 contributions,
videoonenetadmm : the original onenet baseline method,
videowaveletsparsityadmm : the wavelet sparsity baseline method;

and optional_config is either nothing (convolutional minimal recurrent layer enabled by default), or:

rnn : convolutional minimal recurrent layer enabled,
nornn : convolutional minimal recurrent layer disabled.

seed : 123 in all of our experiments; this should yield numbers very similar to those in the table of our paper.
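
The general pattern above can also be assembled programmatically, for example to launch several runs from a script. The sketch below is an illustration only: `build_command` is a hypothetical helper, and the allowed values are simply the ones listed above.

```python
import shlex

DATASETS = ("mnistrotated", "cifar10scanned", "ucf101")
ALGORITHMS = ("videoonenet", "videoonenetadmm", "videowaveletsparsityadmm")
OPTIONAL_CONFIGS = (None, "rnn", "nornn")


def build_command(dataset, algorithm, optional_config=None, seed=123):
    """Assemble the argument list for one experiment run (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError("unknown dataset: {!r}".format(dataset))
    if algorithm not in ALGORITHMS:
        raise ValueError("unknown algorithm: {!r}".format(algorithm))
    if optional_config not in OPTIONAL_CONFIGS:
        raise ValueError("unknown optional_config: {!r}".format(optional_config))
    args = ["python3.7", "-m", "IPython",
            "experiments/{}.py".format(dataset), "with", algorithm]
    if optional_config is not None:
        args.append(optional_config)
    args.append("seed={}".format(int(seed)))
    return args


command = build_command("mnistrotated", "videoonenet")
print(" ".join(shlex.quote(arg) for arg in command))
# python3.7 -m IPython experiments/mnistrotated.py with videoonenet seed=123
```

The returned list can be passed to `subprocess.run(command, check=True)` from the root folder; unknown names fail early with a `ValueError` instead of starting a run.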

Directory and file structure:

algorithms/
    kerasvideoonenet.py : base class, our videoonenet method with 2 contributions
    kerasvideoonenet_admm.py : subclass, the original onenet baseline method
    numpyvideowaveletsparsity_admm.py : subclass, the wavelet sparsity baseline method (CPU-only)
datasets/
    mnistrotated.py : base class, loads the Rotated MNIST data set and generates linear inverse problems on the fly
    cifar10scanned.py : subclass, same but for Scanned CIFAR-10
    ucf101.py : subclass, same but for UCF-101
experiments/
    mnistrotated.py : config file for hyperparameters, loads the Rotated MNIST data set and an algorithm,
                      conducts the experiment; requires ~8 GB of GPU memory for training
    cifar10scanned.py : same, but for Scanned CIFAR-10; requires ~12 GB of GPU memory for training
    ucf101.py : same, but for UCF-101; requires two 24 GB GPUs for training
results/ : experimental results will be saved to this directory by the sacred package
utils/
    layers.py : custom Keras layer classes, including
        ConvMinimalRNN2D : the convolutional minimal recurrent layer
        InstanceNormalization : the instance normalization layer
        UnrolledOptimization : the layer responsible for end-to-end trainable ADMM iterations, the core of
                               our algorithms
    ops.py : custom Keras/TensorFlow operations, including
        elu_like : the activation function
        batch_pinv : batched Moore-Penrose pseudoinverse computation
    pil.py : backwards-compatibility functions for saving all kinds of figures
    plot.py : functions for saving video frame figures
    problems.py : functions for the linear inverse problem set, each generating the matrix A^{(n)} and an auxiliary
                  shape for saving figures
    utils.py : additional utilities, including
        LinearInverseVideoSequence : Keras Sequence subclass generating random videos and linear inverse
                                     problems from the given problem set
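
To illustrate what a linear inverse problem and a batched pseudoinverse look like, here is a small NumPy sketch. It is not the repository's `problems.py` or `batch_pinv` (which operate on Keras/TensorFlow tensors); the inpainting-style matrix, the helper name `random_inpainting_matrix` and all sizes are assumptions chosen for illustration.

```python
import numpy as np


def random_inpainting_matrix(n_pixels, n_kept, rng):
    """Build A of shape (n_kept, n_pixels) that keeps a random subset of pixels.

    Hypothetical example problem: each row of A is a row of the identity.
    """
    kept = np.sort(rng.choice(n_pixels, size=n_kept, replace=False))
    return np.eye(n_pixels)[kept]


rng = np.random.RandomState(123)
n_problems, n_pixels, n_kept = 4, 16, 8

# One matrix per problem in the batch, stacked along the first axis.
A = np.stack([random_inpainting_matrix(n_pixels, n_kept, rng)
              for _ in range(n_problems)])
x = rng.rand(n_problems, n_pixels, 1)  # flattened ground-truth images
y = A @ x                              # observations y = A x, one per problem

# np.linalg.pinv accepts stacks of matrices: one pseudoinverse per problem.
A_pinv = np.linalg.pinv(A)             # shape (n_problems, n_pixels, n_kept)
x_init = A_pinv @ y                    # minimum-norm solution of A x = y

print(A_pinv.shape)                    # (4, 16, 8)
print(np.allclose(A @ x_init, y))      # True
```

For this particular choice of A the pseudoinverse is just the transpose, so `x_init` copies the observed pixels and fills the rest with zeros; other problem types generally give a less trivial result.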

Citation:

@inproceedings{milacski2020videoonenet,
  title={VideoOneNet: Bidirectional Convolutional Recurrent OneNet with Trainable Data Steps for Video Processing},
  author={Milacski, Zolt{\'a}n and Poczos, Barnabas and Lorincz, Andras},
  booktitle={International Conference on Machine Learning},
  pages={6893--6904},
  year={2020},
  organization={PMLR}
}

Contact:

In case of any questions, feel free to create an issue here on GitHub, or mail me at srph25@gmail.com.
