Computer Vision (CS 763) - Spring 2019

Course Information

Instructor: Arjun Jain
Office: 216, CSE New Building
Email: ajain@cse DOT iitb DOT ac DOT in
Teaching Assistants: Rishabh Dabral, Safeer Afaque
Class Room: SIC201
Instructor Office Hours (in room 216 CSE New Building): TBD

Please note that CS663 is a hard prerequisite for this course.

News and Announcements

[8/01/19] The Monday class will be moved to the 7 pm slot to accommodate third-year students.
[14/01/19] The classroom has been moved to SIC201 (slots 13A and 15A) due to overflow in CC105.
[17/01/19] Assignment 1 has been released and is due by 27th Jan.
[30/01/19] Assignment 2 has been released and is due by 8th Feb.
[10/02/19] Assignment 3 has been released and is due by 20th Feb.
[13/03/19] Assignment 4 has been released and is due by 23rd March.
[10/04/19] Assignment 5 has been released and is due by 21st April.
[10/04/19] End-term project evaluation will be held on 6th May.

Topics to be covered (tentative)

Deep Learning in computer vision: the data-driven paradigm, feed-forward networks, back-propagation and the chain rule; CNNs and their building blocks, generative adversarial networks (GANs), Variational Autoencoders (VAEs) and Conditional Variational Autoencoders (CVAEs)
Deep Learning applications, including face detection, CNN compression, and Siamese and triplet networks with applications to face recognition
Camera geometry, camera calibration, vanishing points, important transformations, homographies
Image registration: RANSAC for point-matching, SIFT overview
Algorithms for shape from shading and optical flow; the Kanade-Lucas-Tomasi algorithm; applications of optical flow
Photometric stereo - deriving shape from multiple images of an object taken under different lighting conditions; applications to illumination invariant face recognition, face relighting
Stereo (geometric binocular): epipolar geometry and fundamental matrix, the correspondence problem and shape from stereo; structure from motion

Learning materials and textbooks

Lecture slides, posted regularly
Computer Vision: Algorithms and Applications, by Richard Szeliski
Fundamentals of Computer Vision, by Mubarak Shah
Deep Learning, by Ian Goodfellow, Yoshua Bengio and Aaron Courville
All iTorch notebooks for topics covered in class can be found here

Grading Policy

Mid-sem exam: 20%
Final exam (cumulative): 20%
Assignments (five or six): 35% (all to be done in groups of 2-3 students)
Course project: 20% (to be done in the same group of 2-3 students)
Class participation: 5%
Course project work will be presented by each student group during a viva at the end of the course. During this viva, each student in the group will be questioned separately, not only on the project work but also on the assignments. Each student is expected to contribute to every assignment and to the course project.
Audit requirements: You must write both exams, submit all assignments and the project, and score at least 40% to get an AU.

Other Policies

Assignments will typically be released every two to three weeks. They must be submitted on or before the deadline; no late submissions will be accepted. The programming components of the assignments will typically involve MATLAB and Lua, so you must be willing to learn both quickly.
We have a zero-tolerance policy toward plagiarism and any other form of cheating. Just don't do it! In cases of plagiarism, givers and takers will be held equally responsible.
This course is (inherently) cumulative. The syllabus for the final exam will include everything taught during the semester.

Course Projects

As mentioned in the grading policy, this course has a project requirement worth 20% of your grade. The project must be done in a group of 2-3 students. We will send out a form in which you will need to fill in your project proposal. For a list of projects, please check this link.

Assignments

There will be 5-6 assignments in this course. They will be a mix of theoretical and programming questions.

Assignment 1 on Camera Geometry has been released and is due by 27th Jan.
Assignment 2 on Camera Calibration, Image Alignment and Robust Methods has been released and is due by 8th Feb.
Assignment 3 on Neural Networks and Backpropagation has been released and is due by 20th Feb. Please use this Kaggle link to evaluate your predictions and check your class standing.
Assignment 4 on Recurrent Neural Networks has been released and is due by 23rd March. Please use this Kaggle link to evaluate your predictions and check your class standing.
Assignment 5 on Lucas-Kanade Tracker and Video Stabilization has been released and is due by 21st April.

Lecture Schedule

Date	Topics	Slides	iTorch Notebooks	Extra Reading
7th Jan, 2019	Introduction to computer vision, applications and course overview	Slides	--	--
8th Jan, 2019	Camera Geometry: homogeneous coordinates and projective geometry; vanishing points, the ideal line, point-line duality in P2; introduction to the pinhole camera model	Slides	--	Homogeneous Representations of Points, Lines and Planes
14th Jan, 2019	Important 2D and 3D transformations using homogeneous coordinates; modeling the pinhole camera analytically: intrinsic and extrinsic parameters; world, camera, image-plane and sensor-plane coordinate systems and the transformations between them	Slides	--	--
15th Jan, 2019	Linear and non-linear (lens distortion) errors; homography: planar world and pure rotation of the camera; iterative solutions for dealing with non-linear (lens distortion) errors; normalized, ideal, Euclidean, affine and general camera models; orthographic and weak-perspective camera models	Slides	--	--
21st Jan, 2019	Cross-ratios and their applications; camera calibration using DLT (known 3D control points); introduction to Zhang's camera calibration method	Slides	--	Resource on SVD; additional slides and notes on solving homogeneous least-squares problems
22nd Jan, 2019	Zhang's camera calibration method; a brief mention of a few DL-based calibration methods; Image Alignment: problem statement, physically and digitally corresponding points; motion models and degrees of freedom; non-rigid/deformable/non-parametric image alignment; control-point-based image alignment using least squares, with a derivation of the pseudo-inverse; introduction to the SIFT algorithm; forward and reverse image warping with bilinear and nearest-neighbor interpolation; a brief mention of DL-based image patch descriptors	Slides	--	--
28th Jan, 2019	Image alignment using image similarity measures: mean squared error, normalized cross-correlation; field of view in image alignment using similarity measures; monomodal and multimodal image alignment; joint histograms and their behaviour in multimodal image alignment; entropy and joint entropy, and an algorithm for multimodal registration by minimizing joint entropy; aspects of image registration: 2D/3D, motion model, monomodal or multimodal; application scenarios for image alignment: template matching, video stabilization, panorama generation, face recognition, 3D-to-2D alignment	Slides	--	--
29th Jan, 2019	Robust Methods in Computer Vision: least-squares problems and their relation to a Gaussian noise model; examples of outliers in computer vision; why the Gaussian distribution is unsuited to handling outliers; introduction to the Laplacian distribution; the importance of heavy-tailed distributions in robust statistics; the RANSAC (random sample consensus) algorithm	Slides	--	--
4th Feb, 2019	Deep Learning for Computer Vision: history and introduction; the data-driven paradigm; k-NN on CIFAR-10; hyperparameters, choice of loss function, cross-validation; softmax classifier, cross-entropy loss function, regularization; optimization: vanilla gradient descent, stochastic gradient descent	Slides	KNN	Matrix calculus reminder
5th Feb, 2019	Vanilla momentum, Nesterov momentum, AdaGrad, RMSProp, Adam; second-order optimization methods and their issues in deep learning; choosing a good learning rate, learning rate decay; feed-forward pass, back-propagation; fully connected layer	Slides	Gradient Check, Linear Layer	Adam; Nesterov; DL optimization algorithms overview
11th Feb, 2019	Activation functions: sigmoid, tanh, ReLU, Leaky ReLU, ELU, etc.; convolutional layer, dilated convolutions	Slides	Convolution	Convolution arithmetic for deep learning
12th Feb, 2019	Convolutions: transposed, dilated, fully connected as convolution, sliding window as convolution; max-pooling, dropout; softmax, cross-entropy	Slides	Transposed convolution, MaxPool, Cross Entropy	--
18th Feb, 2019	Data augmentation, hyperparameter selection; weight initialization; babysitting the learning process	Slides	Weight Initialization	--
19th Feb, 2019	ConvNet applications; ConvNet case studies: AlexNet, ZF-Net, VGGNet, GoogLeNet, ResNet, SE-Net; transfer learning	Slides	--	--
4th March, 2019	Object detection: R-CNN, Fast R-CNN, Faster R-CNN, YOLO, SSD	Slides	--	--
5th March, 2019	Object detection evaluation metrics: IoU, mAP; object detection details: RoIAlign, Feature Pyramid Network, Mask R-CNN, focal loss	Slides	--	--
11th March, 2019	RNNs, LSTMs	Slides	--	--
12th March, 2019	Visualizing and understanding ConvNets: images that maximize ConvNet class scores, reconstructing images from ConvNet codes; DeepDream, neural art, adversarial examples; dimensionality reduction: Siamese and triplet networks	Slides	--	--
18th March, 2019	Neural style transfer; autoencoders; generative modeling: VAEs, GANs; case studies: pix2pix, CycleGAN, UNIT	Slides 1, Slides 2	--	--
26th March, 2019	Orthographic structure from motion: the factorization method and the rank theorem	Slides	--	--
1st April, 2019	Optical flow; dealing with the aperture problem through regularization; Horn-Schunck method: algorithm using a discrete formulation, Jacobi's iterative method for solving the resulting linear system, and the method's limitations	Slides	--	--
2nd April, 2019	Lucas-Kanade method for optical flow; multi-scale Lucas-Kanade method; comparison of the Horn-Schunck and Lucas-Kanade algorithms; applications of optical flow	Slides	--	--
8th April, 2019	Kanade-Lucas-Tomasi (KLT) feature-point tracker: tracking feature points from a template by estimating motion parameters; finding good features to track	Slides	--	Lucas-Kanade 20 Years On: A Unifying Framework
9th April, 2019	Geometric Stereo: orientation parameters for the camera pair and relative orientation; coplanarity constraint for corresponding points; derivation and key properties of the fundamental matrix; the 8-point algorithm	Slides	--	--
15th April, 2019	Introduction to epipolar geometry; the essential matrix	Slides	--	Epipolar Geometry
16th April, 2019	Generating the normalized stereo case from arbitrary views; triangulation; popular parameterizations of the relative orientation	Slides	--	--
