
My assignment solutions for Stanford’s CS231n (CNNs for Visual Recognition) and Michigan’s EECS 498-007/598-005 (Deep Learning for Computer Vision), version 2020.


Deep Learning for Computer Vision Courses

General info

I present my assignment solutions for both 2020 course offerings: Stanford University CS231n (CNNs for Visual Recognition) and University of Michigan EECS 498-007/598-005 (Deep Learning for Computer Vision).

Review

After reading an enormous number of positive reviews of CS231n, I decided to dive into the course lectures on my own. As expected, they were great: the topics are well presented and well explained (thanks to the instructors) and cover a plethora of Machine Learning / Deep Learning concepts (not only computer-vision related), both theoretically (via lectures, slides and extra reading content) and practically (via the well-designed assignments).

Depending on your ML background (especially statistics, algebra and Python programming with the NumPy package), this course can be challenging, since the topics are covered from the fundamentals (actually from scratch, going through all the math behind them). Even with a mathematical background, I found myself struggling with some advanced topics (like Variational Autoencoders), which pushed me to review some math/stats formulas. That being said, most of the course material isn't that difficult, and even so, all the effort and time spent are totally worth it: you will learn a lot.

In parallel with CS231n, I also took its updated Michigan equivalent, EECS 498-007 (abbreviated, as there are a lot of numbers in the course's title), because:

  • For CS231n, only the 2016 and 2017 lectures are available, which is a little old given the fast progress in ML in general. However, this concerns only some topics, and even so, the old lectures are still worth watching.

  • For EECS 498-007, the 2019 lectures are available. They cover more topics (like Attention, 3D, Video, etc.), the topics shared with CS231n are updated, and some are explained in more detail (like object detection and VAEs). The assignments are updated as well.

The similarity between the two courses comes from the fact that one of CS231n's main instructors, Justin Johnson, moved from Stanford to Michigan in 2019.

Assignments

Assignments are the most fun part of the courses: they let you practice most of the theoretical concepts you have learned. You will implement vectorized mathematical formulas, gradient descent (be prepared to spend some hours with pen and paper figuring out how to compute gradients), neural networks from scratch (CNNs and RNNs, among others), etc. In the advanced assignment parts, you will also use high-level frameworks: TensorFlow and PyTorch.
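To give a flavor of the "vectorized formula plus hand-derived gradient" kind of exercise, here is a minimal NumPy sketch of a softmax classifier loss, its analytic gradient, a numerical gradient check, and a few plain gradient-descent steps. It is not copied from my solutions: the function names (`softmax_loss`, `numerical_gradient`), the toy data and the learning rate are all assumptions made for illustration.

```python
import numpy as np

def softmax_loss(W, X, y):
    """Mean cross-entropy loss and its gradient with respect to W.

    W: (D, C) weights, X: (N, D) inputs, y: (N,) integer class labels.
    """
    N = X.shape[0]
    scores = X @ W                                       # (N, C), no Python loops
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilize the exponentials
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(N), y]).mean()

    # Gradient of the loss w.r.t. the scores is (probs - one_hot(y)) / N.
    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1.0
    dW = X.T @ dscores / N
    return loss, dW

def numerical_gradient(f, W, h=1e-5):
    """Centered finite differences, one entry of W at a time (slow, for checks only)."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + h
        f_plus = f(W)
        W[idx] = old - h
        f_minus = f(W)
        W[idx] = old                                     # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

# Toy data (hypothetical sizes): 5 samples, 4 features, 3 classes.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))
y = rng.integers(0, 3, size=5)
W = 0.01 * rng.standard_normal((4, 3))

loss, dW = softmax_loss(W, X, y)
dW_num = numerical_gradient(lambda w: softmax_loss(w, X, y)[0], W)
max_diff = np.abs(dW - dW_num).max()
print("max |analytic - numerical| gradient difference:", max_diff)

# A few steps of plain gradient descent with an assumed learning rate of 0.1.
losses = [loss]
for _ in range(50):
    _, grad = softmax_loss(W, X, y)
    W -= 0.1 * grad
    losses.append(softmax_loss(W, X, y)[0])
print("loss before/after:", losses[0], losses[-1])
```

The numerical check is the useful habit here: if the pen-and-paper derivation is wrong, the difference between the two gradients is large and obvious.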

Assignment questions come in the form of Jupyter notebooks that call external Python files in order to execute properly. You will mostly implement the missing parts in the Python files and execute the notebook's cells to check the correctness of your implementation. However, you will also write some code in the notebooks and answer inline questions (result analysis and theoretical questions).
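The workflow can be sketched roughly as follows, here collapsed into one script. The two distance functions stand in for code you would write in an external Python file, and the last part stands in for a notebook cell that checks your fast implementation against a slow reference. All names (`distances_two_loops`, `distances_no_loops`, `rel_error`) and the toy data are assumptions for illustration, not the courses' actual files.

```python
import numpy as np

# --- Part that would live in an external .py file (hypothetical names) ---

def distances_two_loops(X_test, X_train):
    """Reference implementation: Euclidean distances with explicit loops."""
    M, N = X_test.shape[0], X_train.shape[0]
    dists = np.zeros((M, N))
    for i in range(M):
        for j in range(N):
            dists[i, j] = np.sqrt(np.sum((X_test[i] - X_train[j]) ** 2))
    return dists

def distances_no_loops(X_test, X_train):
    """Vectorized version using ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2."""
    test_sq = np.sum(X_test ** 2, axis=1, keepdims=True)  # (M, 1)
    train_sq = np.sum(X_train ** 2, axis=1)                # (N,)
    cross = X_test @ X_train.T                             # (M, N)
    sq = test_sq - 2.0 * cross + train_sq                  # broadcasts to (M, N)
    return np.sqrt(np.maximum(sq, 0.0))                    # clip tiny negatives from rounding

# --- Part that would be a notebook cell: check the implementation ---

def rel_error(x, y):
    """Largest elementwise relative error between two arrays."""
    return np.max(np.abs(x - y) / np.maximum(1e-8, np.abs(x) + np.abs(y)))

rng = np.random.default_rng(0)
X_train = rng.standard_normal((7, 6))
X_test = rng.standard_normal((3, 6))

slow = distances_two_loops(X_test, X_train)
fast = distances_no_loops(X_test, X_train)
error = rel_error(slow, fast)
print("relative error between the two implementations:", error)
```

A relative error close to machine precision suggests that the vectorized code computes the same thing as the loop version.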

For my implementation, I solved all three CS231n assignments. The questions that use a framework ask you to pick only one, and I chose PyTorch (not TensorFlow). For EECS 498-007, since its assignments are similar to the CS231n ones, I solved only those that bring new concepts, namely A4 (partially: the first two questions, about Residual Networks and Attention LSTM), A5 (object detection: YOLO and Faster R-CNN) and A6 (partially: the first question, about VAEs). EECS 498-007 offers no choice of framework: only PyTorch is used (which fits perfectly with my choice of using it in CS231n too).

Note that even though my coding solutions are probably correct, the CS231n assignments contain inline questions whose answers I am not sure about; I just answered as well as I could. Also, except for the first CS231n assignment (which is less commented), I tried to comment my code as richly as I could to make it understandable.

Repository Structure

The repository's file structure is quite intuitive: there are two folders (one for each course), each with sub-folders that represent the assignments (three for each of CS231n and EECS 498-007). Note that in each assignment's folder, I put a README that shows the covered topics and question descriptions (copied from the assignment's website).

The rest of this README provides quick access to the assignment files, useful links, some of the results obtained, and credits.

Courses' Materials Links

The table below shows relevant links to both courses' materials.

| Relevant info | CS231n | EECS 498-007 |
| --- | --- | --- |
| Official website | [2020], [2017] | [2020], [2019] |
| Lectures playlist | [2017], [2016] | [2019] |
| Syllabus | [2020], [2017] | [2020], [2019] |

Assignment Files

CS231n: Convolutional Neural Networks for Visual Recognition

Assignment 1

Modified Python files: k_nearest_neighbor.py, linear_classifier.py, linear_svm.py, softmax.py, neural_net.py.

| Question | Title | IPython Notebook |
| --- | --- | --- |
| Q1 | k-Nearest Neighbor classifier | knn.ipynb |
| Q2 | Training a Support Vector Machine | svm.ipynb |
| Q3 | Implement a Softmax classifier | softmax.ipynb |
| Q4 | Two-Layer Neural Network | two_layer_net.ipynb |
| Q5 | Higher Level Representations: Image Features | features.ipynb |

Assignment 2

Modified Python files: layers.py, optim.py, fc_net.py, cnn.py.

| Question | Title | IPython Notebook |
| --- | --- | --- |
| Q1 | Fully-connected Neural Network | FullyConnectedNets.ipynb |
| Q2 | Batch Normalization | BatchNormalization.ipynb |
| Q3 | Dropout | Dropout.ipynb |
| Q4 | Convolutional Networks | ConvolutionalNetworks.ipynb |
| Q5 | PyTorch / TensorFlow on CIFAR-10 | PyTorch.ipynb |

Assignment 3

Modified Python files: rnn_layers.py, rnn.py, net_visualization_pytorch.py, style_transfer_pytorch.py, gan_pytorch.py.

| Question | Title | IPython Notebook |
| --- | --- | --- |
| Q1 | Image Captioning with Vanilla RNNs | RNN_Captioning.ipynb |
| Q2 | Image Captioning with LSTMs | LSTM_Captioning.ipynb |
| Q3 | Network Visualization | NetworkVisualization-PyTorch.ipynb |
| Q4 | Style Transfer | StyleTransfer-PyTorch.ipynb |
| Q5 | Generative Adversarial Networks | Generative_Adversarial_Networks_PyTorch.ipynb |

EECS 498-007 / 598-005: Deep Learning for Computer Vision

Assignment 4

Modified Python files: pytorch_autograd_and_nn.py, rnn_lstm_attention_captioning.py, network_visualization.py, style_transfer.py.

| Question | Title | IPython Notebook |
| --- | --- | --- |
| Q1 | PyTorch Autograd | pytorch_autograd_and_nn.ipynb |
| Q2 | Image Captioning with Recurrent Neural Networks | rnn_lstm_attention_captioning.ipynb |
| Q3 | Network Visualization | network_visualization.ipynb |
| Q4 | Style Transfer | style_transfer.ipynb |

Assignment 5

Modified Python files: single_stage_detector.py, two_stage_detector.py.

| Question | Title | IPython Notebook |
| --- | --- | --- |
| Q1 | Single-Stage Detector | single_stage_detector_yolo.ipynb |
| Q2 | Two-Stage Detector | two_stage_detector_faster_rcnn.ipynb |

Assignment 6

Modified Python files: vae.py, gan.py.

| Question | Title | IPython Notebook |
| --- | --- | --- |
| Q1 | Variational Autoencoder | variational_autoencoders.ipynb |
| Q2 | Generative Adversarial Networks | generative_adversarial_networks.ipynb |

Useful Links

The list below provides the most useful external resources that helped me clarify and deeply understand some ambiguous topics encountered in the lectures. Note that these are only the most important ones; fully understanding those topics may require checking other resources not mentioned here.

Result Examples

As mentioned previously, assignments are the most fun part of the courses. This section shows some of the interesting results I obtained. You will encounter other amazing results while solving the assignments; here I just picked a few.

Class Visualization

Class visualization consists of generating a synthetic image that maximizes the score of some class. The illustrations below were generated by applying this technique to a CNN pre-trained on the ImageNet dataset. You can see how the synthetic image changes during the optimization for different classes (categories). Even though those images are not easy to interpret (they are supposed to maximize scores, not to be pretty), you can identify some specific patterns/shapes for these particular classes.

Class visualizations (from left to right): Analog Clock, Dining Table, Kit Fox, Tarantula.
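The core idea, gradient ascent on the input rather than on the weights, can be sketched on a toy model. In the assignment the model is a pre-trained CNN and the gradient comes from autograd; in the sketch below a random linear classifier stands in for it so that the gradient is available in closed form. The model, the L2 penalty, the step size and every name (`class_scores`, `visualize_class`) are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, num_pixels = 4, 16
# Toy stand-in for a pre-trained model: a fixed random linear classifier.
W = rng.standard_normal((num_classes, num_pixels))

def class_scores(x):
    """Scores of all classes for a flattened 'image' x."""
    return W @ x

def visualize_class(target, steps=100, lr=0.1, l2_reg=0.5):
    """Gradient ascent on the image to maximize score[target] - l2_reg * ||x||^2."""
    x = np.zeros(num_pixels)                   # start from a blank image
    history = [class_scores(x)[target]]
    for _ in range(steps):
        # d/dx of (W[target] . x - l2_reg * ||x||^2); the model weights stay frozen.
        grad = W[target] - 2.0 * l2_reg * x
        x = x + lr * grad                      # ascent step: move along the gradient
        history.append(class_scores(x)[target])
    return x, history

image, history = visualize_class(target=2)
print("target-class score at start and end:", history[0], history[-1])
```

For this linear toy model the optimized image simply converges to a scaled copy of the target class's weight row, which is why real class visualizations reveal the patterns a network has associated with a class.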

GANs

The goal of Generative Adversarial Networks is to generate novel data (in our case, images) that mimic the original data from a dataset. The illustrations below were generated by training three types of GANs on the MNIST dataset (left: the most basic one; right: the most advanced one). You can see how the generated images change during training, from completely noisy to reasonable images (that do not exist in the dataset).

Generated samples (from left to right): Vanilla GAN, DCGAN.
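For intuition about what the two networks optimize, here is a minimal NumPy sketch of the vanilla GAN losses computed from discriminator logits. It assumes the commonly used non-saturating generator loss and is not taken from my PyTorch solutions; the function names and example logits are illustrative assumptions.

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Numerically stable mean binary cross-entropy on raw logits."""
    # Equivalent to -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))].
    return np.mean(
        np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    )

def discriminator_loss(logits_real, logits_fake):
    """The discriminator wants real images labeled 1 and generated images labeled 0."""
    real_term = bce_with_logits(logits_real, np.ones_like(logits_real))
    fake_term = bce_with_logits(logits_fake, np.zeros_like(logits_fake))
    return real_term + fake_term

def generator_loss(logits_fake):
    """Non-saturating form: the generator wants its fakes labeled 1."""
    return bce_with_logits(logits_fake, np.ones_like(logits_fake))

# An undecided discriminator (all logits 0, i.e. probability 0.5 everywhere).
undecided = np.zeros(8)
d_undecided = discriminator_loss(undecided, undecided)
g_undecided = generator_loss(undecided)

# A confident, correct discriminator: large positive logits on real, negative on fake.
d_confident = discriminator_loss(np.full(8, 10.0), np.full(8, -10.0))
g_confident = generator_loss(np.full(8, -10.0))

print("undecided D:", d_undecided, g_undecided)
print("confident D:", d_confident, g_confident)
```

When the discriminator wins, its loss goes to zero while the generator's loss grows, and training alternates updates so that neither side stays ahead for long.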

Style Transfer

Style transfer consists of applying the style of an artistic drawing to an input image. The illustration below shows the result of applying different styles to two images.

Style transfer applied to two images.
Drawing credits (from left to right): The Scream, Bicentennial Print, Horses on the Seashore, Head of a Clown and The Starry Night.
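In the usual neural style transfer formulation, "style" is summarized by Gram matrices of CNN feature maps. The NumPy sketch below shows that building block on random arrays standing in for feature maps. The normalization by C*H*W, the single-layer loss and all names (`gram_matrix`, `style_loss`) are assumptions for illustration, not my assignment code.

```python
import numpy as np

def gram_matrix(features, normalize=True):
    """Channel-by-channel correlations of a (C, H, W) feature map."""
    C, H, W = features.shape
    flat = features.reshape(C, H * W)      # one row per channel
    gram = flat @ flat.T                   # (C, C)
    if normalize:
        gram = gram / (C * H * W)          # assumed normalization
    return gram

def style_loss(features, style_features, weight=1.0):
    """Squared difference between Gram matrices for a single layer."""
    diff = gram_matrix(features) - gram_matrix(style_features)
    return weight * np.sum(diff ** 2)

rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 4, 5))     # stand-in for CNN features
other = rng.standard_normal((3, 4, 5))

gram = gram_matrix(feats)
same_loss = style_loss(feats, feats)
diff_loss = style_loss(feats, other)

# Shuffling spatial positions (identically for all channels) leaves the Gram matrix unchanged.
perm = rng.permutation(4 * 5)
shuffled = feats.reshape(3, -1)[:, perm].reshape(3, 4, 5)
gram_shuffled = gram_matrix(shuffled)

print("style loss (same, different):", same_loss, diff_loss)
```

The last check illustrates why Gram matrices capture texture rather than layout: they discard where in the image each feature occurs and keep only which features occur together.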

Credits

I would like to thank everyone involved, directly or indirectly, in making such great courses freely available to everyone. Special thanks to the instructors: Fei-Fei Li, Andrej Karpathy, Serena Yeung and Justin Johnson.