liyin2015/cv-ai-applications

All you need to grasp computer vision applications without being overwhelmed!

Apache-2.0

Awesome Computer Vision (AI) Applications

A comprehensive list of awesome computer vision applications, accompanied by introductions, landmark research papers, and demos. Applications are organized in two ways: (1) by generic application and (2) by enabling technique. Besides applications, we track cool AI orgs and startups. Additionally, (1) we organize datasets, models, and metrics in a dedicated section for some applications, and (2) we showcase some applications on a public Instagram account.

Contributing

Please feel free to send me pull requests or email (li.yin.gravity@gmail.com) to add links.

CV Applications organized by generic applications

Image Understanding
- Object Detection
- Face Recognition
Neural Rendering

CV applications organized by tech

Visual Language Models
Visual representation learning: toward a future of self-supervised learning

Introduction to Neural Rendering

A gentle introduction to neural rendering and how it relates to traditional computer graphics (classical rendering)

Pre-AI Neural Rendering

AI knowledge

Concurrently, progress in computer vision and machine learning has given rise to a new approach to image synthesis and editing, namely deep generative models, mainly GANs. Different GANs are able to synthesize images with controllable properties such as camera viewpoint and illumination conditions. Controllability: (1) latent space (2) inverse graphics

Applications

(1) Basic 2D (image to image rendering)

Coloring
Super-resolution

(2) Advanced 2D with more controllability

Sketch to image
Text to image
Segmentation to image
Image to animation
Gender exchange
Face swap
Aging
Disfiguration
Style transfer

Demo: NVIDIA GauGAN2, YouTube tutorial

Landmark papers:

Isola, Phillip, et al. "Image-to-image translation with conditional adversarial networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. #cite: 11845. This is the pioneering paper on image-to-image translation.
Liu, M. Y., Breuel, T., & Kautz, J. (2017). Unsupervised image-to-image translation networks. In Advances in Neural Information Processing Systems (pp. 700-708). #cite: 1980.
pix2pixHD: Wang, Ting-Chun, et al. "High-resolution image synthesis and semantic manipulation with conditional GANs." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. Segmentation to image.
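
For orientation, here is the training objective of the Isola et al. paper listed above, written from memory of the paper rather than taken from this list. The symbols are assumptions introduced for illustration: $x$ is the input image, $y$ the paired target image, $z$ a noise vector, $G$ the generator, $D$ the discriminator, and $\lambda$ the weight of the L1 term.

$$
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\left[\log D(x, y)\right] + \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x, z))\right)\right]
$$

$$
\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\left[\lVert y - G(x, z) \rVert_1\right]
$$

$$
G^* = \arg\min_G \max_D \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathcal{L}_{L1}(G)
$$

The discriminator sees the input together with the output, which is what makes the GAN conditional. The L1 term pulls the output toward the paired ground truth.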

Leveraging Vision Language models

StyleGAN-NADA: Gal, Rinon, et al. "StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators." (2021). GitHub. Similar to CycleGAN: unpaired $I \rightarrow I$ translation.

StyleCLIP: Patashnik, Or, et al. "StyleCLIP: Text-driven manipulation of StyleGAN imagery." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. #cite: 51. StyleCLIP Demo.

Controllability:

StyleGAN: Karras, Tero, Samuli Laine, and Timo Aila. "A style-based generator architecture for generative adversarial networks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019. $z \rightarrow I$, plus SOTA latent space control. #cite: 2993. Resources: blog, presentation, demos. (1) Disentangles semantic attributes better than a traditional latent space.

InterFaceGAN: Shen, Yujun, et al. "Interpreting the latent space of GANs for semantic face editing." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. Questions: what does a GAN actually learn with respect to the latent space, and how can the latent code be used for image editing? Solution: train an SVM on each attribute as a binary classification problem, then edit a latent code by moving it relative to the decision boundary.
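
The InterFaceGAN recipe above (fit a linear boundary per attribute, then move the latent code relative to it) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. The linear SVM is fit with plain subgradient descent in NumPy instead of a library solver, and the function names `train_linear_svm`, `edit_latent`, and `signed_distance`, along with all hyperparameters, are hypothetical. In practice the latent codes would come from a pretrained generator and the labels from an attribute classifier.

```python
import numpy as np


def train_linear_svm(latents, labels, lr=0.1, reg=1e-3, epochs=200):
    """Fit a linear SVM (hinge loss + L2 penalty) by full-batch subgradient descent.

    latents: (n, d) array of latent codes.
    labels:  (n,) array with values in {-1.0, +1.0} for one binary attribute.
    Returns the weight vector w (the boundary normal) and the bias b.
    """
    n, d = latents.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = labels * (latents @ w + b)
        active = margins < 1.0  # samples that violate the margin
        y_act = labels[active][:, None]
        grad_w = reg * w - (y_act * latents[active]).sum(axis=0) / n
        grad_b = -labels[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def signed_distance(z, w, b):
    """Signed distance from latent code z to the hyperplane w.z + b = 0."""
    return (z @ w + b) / np.linalg.norm(w)


def edit_latent(z, w, alpha):
    """Move z by alpha along the unit normal of the attribute boundary.

    Positive alpha pushes toward the +1 side of the attribute,
    negative alpha toward the -1 side.
    """
    normal = w / np.linalg.norm(w)
    return z + alpha * normal
```

A typical use would be to call `edit_latent(z, w, alpha)` for several values of `alpha` and feed each result to the generator, which gives a sequence of images where one attribute (for example age) changes gradually.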

(3) 2D to 3D

Tech: GANs with 3D control, papers: [photoApp][Controllability]

(4) Advanced

sketch to video with movements

Demo

NVIDIA AI Playground
NVIDIA Canvas
NVIDIA GauGAN2
StyleGAN-NADA
StyleCLIP
replicate.com: a model-hosting website where you can host your own demos too!

Datasets, Models, Metrics

Reference