Multi-Modal Multi Environment Performance Comparison on Google Cloud Platform | Project - NYU Cloud and Machine Learning

In recent years, the growth in data volume and the advances in Big Data and Machine Learning applications have made cloud computing ubiquitous. Most applications today are deployed to the cloud so that they can be scaled easily and accessed by end users. Besides deployment on bare virtual machines, the advent of containerized software has given developers more environments to choose from, which makes the choice harder. Applications and Machine Learning workloads differ, so it is important to choose the environment that gives the best resource utilization for the workload being deployed.

We evaluated 3 environments on GCP for 2 different workloads, CNN-based and RNN-based models:

VM
Docker
Singularity

The files above have been organized in the following manner:

cnn

Code and profiling for MNIST digit recognition.

cnn.codeScripts

Contains the main Python file for training.
Contains two scripts for nvprof and nsys profiling (nvprof.sh and nsys.sh).

cnn.outputs

Contains two folders, bare and docker, holding the output metrics for the VM and Docker profiling respectively.
Contains the Dockerfile for building the image used to spin up the Docker container.
The folder time contains the real, user and sys times for the different batch sizes.
The file kernels_vm_docker.txt contains the different kernels observed on the VM and in Docker for the same code at batch size 64.
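As an illustration of how the timing files could be post-processed, the sketch below converts real, user and sys lines into seconds. It assumes the files hold output in the format printed by the bash time keyword (a label followed by a value such as 0m12.345s); if the files use another format, the pattern needs adjusting. The function name parse_time_output is hypothetical and not part of this repository.

```python
import re

# Assumption: each timing line looks like "real    0m12.345s", the format
# printed by the bash `time` keyword. Other formats need a different pattern.
_TIME_LINE = re.compile(r"^(real|user|sys)\s+(?:(\d+)m)?([\d.]+)s\s*$")


def parse_time_output(text):
    """Return a dict mapping "real", "user" and "sys" to seconds."""
    timings = {}
    for line in text.splitlines():
        match = _TIME_LINE.match(line.strip())
        if match:
            label, minutes, seconds = match.groups()
            # The minutes part is optional, so treat a missing value as 0.
            timings[label] = int(minutes or 0) * 60 + float(seconds)
    return timings
```

Lines that do not match the pattern are skipped, so the function can be fed a whole file even if it contains other program output.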

rnn

Code and profiling for sentiment analysis.

rnn.codeScripts

Contains the main Python file for training.
Contains two scripts for nvprof and nsys profiling.

rnn.inputData

Contains the custom input subset used for the profiling.

rnn.outputs

Contains two folders, bare and docker, holding the output metrics for the VM and Docker profiling respectively.
The file time_command_outputs.txt contains the real, user and sys times for the different batch sizes.
The file kernels_vm_docker.txt contains the different kernels observed on the VM and in Docker for the same code at batch size 64.
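One way to compare the kernels seen in two environments is a simple set difference. The sketch below assumes the kernel names for each environment have already been read into two lists; the layout of kernels_vm_docker.txt is not described here, so no file parsing is attempted. The function name compare_kernels is hypothetical.

```python
def compare_kernels(vm_kernels, docker_kernels):
    """Group kernel names by the environment(s) in which they were observed."""
    vm, docker = set(vm_kernels), set(docker_kernels)
    return {
        "vm_only": sorted(vm - docker),          # seen only on the bare VM
        "docker_only": sorted(docker - vm),      # seen only inside Docker
        "common": sorted(vm & docker),           # seen in both environments
    }
```

Sorting the results keeps the output stable between runs, which makes it easier to diff the comparison across batch sizes.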

The steps below can be followed to profile the CNN metrics for comparison across the 3 environments, assuming Docker and Singularity have been installed on the VM.

VM

Copy the 3 files from the folder codeScripts to the VM.
The default input data size is set to 512; it can be changed on line 120.
To run nvprof profiling for batch size 64, run:

sh nvprof.sh 64

To run nsys profiling for batch size 64, run:

sh nsys.sh 64

Outputs are generated in a folder named after $BATCH_SIZE, here 64.
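The two profiling commands can be repeated over several batch sizes. The following is a minimal sketch, assuming nvprof.sh and nsys.sh are in the current directory and take the batch size as their only argument, as in the commands above. The function names are hypothetical, and any batch size other than 64 is an illustrative value rather than one taken from this project.

```python
import subprocess


def profiling_commands(batch_sizes, scripts=("nvprof.sh", "nsys.sh")):
    """Build one `sh <script> <batch_size>` command per script and batch size."""
    return [["sh", script, str(size)] for size in batch_sizes for script in scripts]


def run_sweep(batch_sizes, dry_run=True):
    """Print the commands, or execute them one after another if dry_run is False."""
    for command in profiling_commands(batch_sizes):
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops the sweep as soon as one profiling run fails.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Illustrative batch sizes; only 64 appears in the steps above.
    run_sweep([32, 64, 128], dry_run=True)
```

Running with dry_run=True first makes it possible to check the command list before starting the profiling runs.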

Docker

The Docker image can be built from the given Dockerfile in the folder codeScripts.
Spin up the Docker container using the command:

docker run -it --gpus all --privileged -v /usr/local/cuda:/usr/local/cuda mnist bash

The default input data size is set to 512; it can be changed on line 120.
To run nvprof profiling for batch size 64, run:

sh nvprof.sh 64

To run nsys profiling for batch size 64, run:

sh nsys.sh 64

Outputs are generated in a folder named after $BATCH_SIZE, here 64.

Singularity

The Singularity image can be generated using:

sudo singularity pull mnist.sif docker://pytorch/pytorch

Next, run the Singularity container with the command:

sudo singularity shell --bind /usr/local/cuda --nv mnist.sif bash

The default input data size is set to 512; it can be changed on line 120.
To run nvprof profiling for batch size 64, run:

sh nvprof.sh 64

To run nsys profiling for batch size 64, run:

sh nsys.sh 64

Outputs are generated in a folder named after $BATCH_SIZE, here 64.
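Once the same measurement (for example, real time at one batch size) has been collected in each environment, the container results can be expressed relative to the bare VM. The sketch below is one possible way to do this; the function name relative_overhead and the environment keys are hypothetical, and the caller supplies the measured values.

```python
def relative_overhead(timings, baseline="vm"):
    """Return the percentage difference of each environment versus the baseline.

    `timings` maps an environment name to a measured time in seconds.
    A positive result means the environment was slower than the baseline.
    """
    base = timings[baseline]
    if base <= 0:
        raise ValueError("baseline timing must be positive")
    return {
        env: (value - base) * 100.0 / base
        for env, value in timings.items()
        if env != baseline
    }
```

Reporting percentages rather than raw seconds makes results at different batch sizes easier to compare side by side.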

Running the Application

Setup & Installation

Make sure you have the latest version of Python installed.

git clone <repo-url>

pip install -r requirements.txt

Running The App

python main.py

Viewing The App

Go to http://127.0.0.1:8000

Aman-Chopra/Multi-Modal-Multi-Environment-Performance-Comparison-on-GCP