Computer Vision for Beginners - COMP4423 @ PolyU HK

The Lectures and Tutorials

L1 Introduction to Computer Vision

What is Computer Vision?
Applications (object detection, semantic segmentation, style transfer, etc.)
A brief history of Computer Vision
Play with FPV Recognition

Lecture Slides: L1-Introduction.pdf

Video Link: https://youtu.be/sWwWroRpqkM?si=V3FSwlet643YTDSU

Tutorial Environment Setup: T1-Get Environment Ready

L2 Image Processing I: Let's play with the images

How humans and computers see images
Display the images
Play with the images (colors, sizes, rotations)
Examples from IMHere
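
As a rough illustration of "playing with" colors and rotations from the list above, here is a minimal pure-Python sketch. It is not taken from the tutorial notebooks: the function names are hypothetical, the image is a plain nested list of RGB tuples, and the grayscale weights are the common ITU-R BT.601 luma coefficients, assumed here for illustration.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of gray levels (BT.601 luma weights)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb_image]


def rotate90_clockwise(image):
    """Rotate a 2D image (list of rows) by 90 degrees clockwise."""
    # Reversing the row order and then transposing gives a clockwise rotation.
    return [list(col) for col in zip(*image[::-1])]


# A tiny 2x2 test image: red, green / blue, white.
rgb = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (255, 255, 255)]]
gray = to_grayscale(rgb)
rotated = rotate90_clockwise(gray)
```

In practice a library such as OpenCV or Pillow would do this on real image files; the nested-list version only shows what happens to the pixel values.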

Lecture Slides: L2-Image.Processing.I.pdf

Video Link: https://youtu.be/scrAoh-L7KU?si=w2AmQ0Pl4AAgBoJd

Tutorial Tasks (Google CoLab): T2-Play.with.images-tasks.ipynb

Tutorial Answers (Google CoLab): T2-Play.with.images-answers.ipynb

Image Lenna: T2-lenna.png

L3 Image Processing II: Let's play with the content

Filters and convolutions
Edge Filters
Noise Reduction
Morphological Operations
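
To make the "filters and convolutions" and "edge filters" items above concrete, the following is a small sketch of valid-mode 2D convolution with a Sobel kernel. It assumes a grayscale image stored as a list of rows; the helper name `convolve2d` and the toy image are illustrative and not from the course materials.

```python
def convolve2d(image, kernel):
    """Valid-mode 2D convolution of a grayscale image (list of rows) with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    # True convolution flips the kernel along both axes before sliding it.
    flipped = [row[::-1] for row in kernel[::-1]]
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * flipped[di][dj]
            row.append(acc)
        out.append(row)
    return out


# Sobel kernel that responds to horizontal intensity changes (vertical edges).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# A 4x5 image with a vertical step edge: dark (0) on the left, bright (10) on the right.
image = [[0, 0, 10, 10, 10] for _ in range(4)]
edges = convolve2d(image, SOBEL_X)
```

The response is large in magnitude where the window straddles the step and zero over the flat bright region. Many libraries skip the kernel flip and compute cross-correlation instead, which only changes the sign for an antisymmetric kernel like this one.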

Lecture Slides: L3-Image.Processing.II.pdf

Video Link: https://youtu.be/UVGG4ZFQWrw?si=DkQj4y8ppGYacYxO

Tutorial Tasks (Google CoLab): T3-Play.with.content-tasks.ipynb

Tutorial Answers (Google CoLab): T3-Play.with.content-answers.ipynb

Challenge Tasks (Google CoLab): T3-Play.with.content-challenge.ipynb

Virus Image: T1-coronvirus-mask.png

Image Lenna: T2-lenna.png

L4 Feature Extraction

Feature vectors
Feature Space
Quantization
Metrics (Distance and Similarity)
Global and Local Features (Color Histograms, LBP, SIFT)
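
The quantization, feature-vector, and metrics items above fit together in a simple global descriptor: an intensity histogram. The sketch below is an assumption-laden illustration (4 bins, gray levels in 0..255, hypothetical function names), comparing two histograms with one distance (Euclidean) and one similarity (histogram intersection).

```python
import math


def gray_histogram(pixels, bins=4, max_value=256):
    """Quantize intensities into equal-width bins; return a normalized histogram.

    Assumes `pixels` is a non-empty flat list of values in [0, max_value).
    """
    hist = [0] * bins
    width = max_value / bins
    for p in pixels:
        hist[min(int(p // width), bins - 1)] += 1
    total = len(pixels)
    return [h / total for h in hist]


def euclidean_distance(a, b):
    """Distance metric: smaller means more alike."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def histogram_intersection(a, b):
    """Similarity metric: 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(x, y) for x, y in zip(a, b))


dark = gray_histogram([10, 20, 30, 40])
bright = gray_histogram([200, 220, 240, 250])
distance = euclidean_distance(dark, bright)
similarity = histogram_intersection(dark, bright)
```

Each histogram is a point in a 4-dimensional feature space; a color histogram would apply the same idea per channel.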

Lecture Slides: L4-Feature.Extraction.pdf

Video Link: https://youtu.be/7UUWyQiCtfU?si=mbCBjrJLwoi6kXhO

Demo: Keypoint extraction and tracking

Demo 2: Keypoint extraction and tracking

Tutorial Tasks (Google CoLab): T4-Feature_extraction_task

Tutorial Answers (Google CoLab): T4-Feature_extraction_answers

L5 Image Retrieval Fundamentals

Clustering
K-Means
Content-based image retrieval (CBIR)
Bag of Visual Words (BoVW)
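
A minimal sketch of K-Means (Lloyd's algorithm) may help connect the clustering items above to BoVW, where cluster centers over local descriptors commonly serve as the visual vocabulary. Everything here is illustrative: the function name is hypothetical, and the initialization simply takes the first `k` points so the result is deterministic, whereas real implementations usually initialize randomly or with k-means++.

```python
def kmeans(points, k, iterations=10):
    """Cluster points (tuples of numbers) into k groups.

    Returns (centers, clusters), where clusters holds the assignment
    made in the final iteration.
    """
    # Assumption for this sketch: use the first k points as initial centers.
    centers = [list(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                centers[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers, clusters


points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centers, clusters = kmeans(points, k=2)
```

With these toy points the two groups near (0, 0) and (10, 10) are recovered after the first iteration.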

Lecture Slides: L5-Image.Retrieval.pdf

Video Link: https://youtu.be/VtCf9HCqAEw?si=a-7A9YHesKOWu49g

Tutorial Tasks (Google CoLab): T5-Image.retrieval-tasks.ipynb

Sample Code for tone modifier challenge:

For vocabulary learning: T5-Challenge-train
Tone modification and display: T5-Challenge-display

L6 Image Classification Fundamentals

Classification
Supervised learning
k-nearest neighbors (k-NN)
Bayesian classifiers
Support vector machines (SVM)
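
Of the classifiers listed above, k-NN is the easiest to sketch in a few lines. The example below assumes 2D feature vectors and squared Euclidean distance; the function name, the toy training set, and the "rock"/"paper" labels are made up for illustration and are not the tutorial's data.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training samples.

    train: list of (feature_vector, label) pairs.
    """
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


train = [
    ((0, 0), "rock"), ((0, 1), "rock"), ((1, 0), "rock"),
    ((5, 5), "paper"), ((5, 6), "paper"), ((6, 5), "paper"),
]
near_rock = knn_predict(train, (0.5, 0.5))
near_paper = knn_predict(train, (5.5, 5.5))
```

There is no training step beyond storing the labeled samples, which is why k-NN is often used as a first baseline in supervised learning.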

Lecture Slides: L6-Image.Classification.pdf

Video Link: https://youtu.be/bUwGY5sqZHU?si=GSxOPDWWQaSr0dw9

Paper Rock Scissors Game Demo: https://youtu.be/dGwou6Khvqo?si=zoMzRBObLU9FUXZr

Tutorial Tasks: T6-Image-Classification

Challenges: T6-Challenges

L7 Traditional Machine Learning to Deep Learning

Traditional machine learning vs. deep learning
Gradient descent
Neural networks
Deep neural networks
Convolutional neural networks (CNN)
Layers, pooling, and activations
AlexNet, VGG, and ResNet
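
The gradient descent item above can be illustrated without any neural network. The sketch below minimizes the toy function f(w) = (w - 3)^2, whose gradient is 2(w - 3); the function name, learning rate, and step count are arbitrary choices for the example.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from w0."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w


# Minimize f(w) = (w - 3)^2; its gradient is 2 * (w - 3), so the minimum is at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```

Training a deep network follows the same update rule, with the gradient of the loss with respect to every weight computed by backpropagation.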

Lecture Slides: L7-Machine.learning.Deep.learning.pdf

Video Link: https://youtu.be/xc5MKb8LNBo?si=MlCAFszzgy001A3e

Tutorial Tasks (Google CoLab): T7-Machine.learning.Deep.learning-tasks.ipynb

Tutorial Data: T7-data.zip

L8 Deep Image Retrieval

Deep image retrieval
Feature aggregation/embedding/fusion
Fine tuning (Siamese/Triplet networks)
R-MAC, VLAD, BoVW
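
For the fine-tuning item above, a triplet network is typically trained with a triplet loss that pulls an anchor toward a matching (positive) embedding and pushes it away from a non-matching (negative) one. The sketch below uses squared Euclidean distance and a margin of 1.0; both are common choices assumed here, and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero once the negative is at least `margin` farther from the anchor than the positive."""
    return max(
        0.0,
        squared_distance(anchor, positive) - squared_distance(anchor, negative) + margin,
    )


# Well-separated triplet: the positive is close, the negative is far, so the loss is zero.
easy = triplet_loss((0, 0), (0, 1), (3, 4))
# Violating triplet: the "positive" is farther than the "negative", so the loss is large.
hard = triplet_loss((0, 0), (3, 4), (0, 1))
```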

Lecture Slides: L8-Deep.image.retrieval.pdf

Video Link: https://youtu.be/klu6SHHoC2E?si=5vCc6-mbt-VzCOlN

Tutorial Answers (Google CoLab): T8-Deep.image.retrieval-answers.ipynb

Tutorial Data: T8-data.zip

PyTorch - Quick Start: T8-Pytorch-Quick-Start.ipynb

L9 CAM, Attention, and Transformers

Class Activation Mapping (CAM)
Attention
Self-attention and Transformers
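
As a rough sketch of the self-attention item above, the following computes scaled dot-product self-attention over a short sequence of token vectors. To keep it short it assumes identity projections (queries, keys, and values are all the input itself), which a real Transformer layer replaces with learned weight matrices; the function names are illustrative.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (no learned projections)."""
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this token's query to every token's key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


tokens = [[1.0, 0.0], [0.0, 1.0]]
attended = self_attention(tokens)
```

Each output row is a convex combination of the inputs, weighted toward the tokens most similar to the query token.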

Lecture Slides: L9-CAM.Attention.Transformer.pdf

Video Link: https://youtu.be/Ypi4F7nt2u4?si=9FDTkpZw3UIjwdvz

Tutorial Answers: T9-CAM and ViT

L10 Detection & Segmentation

Object detection and image segmentation
YOLO
U-Net
R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN

Lecture Slides: L10-Detection.Segmentation.pdf

Video Link: https://youtu.be/gdDDQtcttZA?si=LgCJqo5hs1vuT7Bg

Tutorial Answers (Google CoLab): T10-Detection.Segmentation-answers.ipynb

Tutorial Data: T10-Images

L11 Learning Paradigms

Multi-task learning
N-shot learning (Few-shot, Zero-shot)
Transfer learning, Metric learning, Meta-learning
Generative networks (VAE, GAN)
Reinforcement learning

Lecture Slides: L11-Learning.Paradigms.pdf

Video Link: https://youtu.be/_jyfvaiB4g4

Tutorial RNN: T11-RNN.ipynb

Tutorial Slides: T11-RNN-and-Network-Debug

L12 Large Models

RNN and Image Captioning
Transformers
Large Language Models

Lecture Slides: L12-Large.Models.pdf

Appendix: Image-Synthesis

lookwei/COMP4423