Awesome Computer Vision (AI) Applications
A comprehensive list of awesome computer vision applications, accompanied by introductions, landmark research papers, and demos. Applications are organized in two ways: (1) by generic application and (2) by enabling technique. Besides applications, we track cool AI orgs and startups. Additionally, (1) we organize datasets, models, and metrics in a dedicated section for some applications, and (2) we showcase some applications on a public Instagram account.
Contributing
Please feel free to send me pull requests or email (li.yin.gravity@gmail.com) to add links.
CV Applications organized by generic applications
- Image Understanding
- Object Detection
- Face Recognition
- Neural Rendering
CV applications organized by tech
- Visual Language Models
- Visual representation learning: toward a future of self-supervised learning
Introduction to Neural Rendering
Pre-AI Neural Rendering
AI knowledge
Concurrently, progress in computer vision and machine learning has given rise to a new approach to image synthesis and editing, namely deep generative models, mainly GANs. Different GANs can synthesize images with controllable properties such as camera viewpoint and illumination conditions. Controllability comes from (1) the latent space and (2) inverse graphics.
- State of the Art on Neural Rendering
- Tutorial
- Introduction to GANs
- Neural Rendering and Its Applications in Computer Graphics (Presented by Lambda)
Applications
(1) Basic 2D (image to image rendering)
- Coloring
- Super-resolution
(2) Advanced 2D with more controllability
- Sketch to image
- Text to image
- Segmentation to image
- Image to animation
- Gender exchange
- Face swap
- Aging
- Disfiguration
- Style transfer
Demo: NVIDIA GauGAN2, YouTube tutorial
Landmark papers:
- pix2pix: Isola, Phillip, et al. "Image-to-image translation with conditional adversarial networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017. #cite: 11845. A pioneering paper for image-to-image translation.
- Liu, M. Y., Breuel, T., & Kautz, J. (2017). Unsupervised image-to-image translation networks. In Advances in neural information processing systems (pp. 700-708). #cite: 1980.
- pix2pixHD: Wang, Ting-Chun, et al. "High-resolution image synthesis and semantic manipulation with conditional gans." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018. Segmentation to image.
- SPADE: Park, Taesung, et al. "Semantic image synthesis with spatially-adaptive normalization." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019. Segmentation + style to image. code, GauGAN v1 demo
- StackGAN: Zhang, Han, et al. "Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks." Proceedings of the IEEE international conference on computer vision. 2017. #cite: 2045. Text to image.
- Ramesh, Aditya, et al. "Zero-shot text-to-image generation." arXiv preprint arXiv:2102.12092 (2021).
Leveraging Vision Language models
- StyleGAN-NADA: Gal, Rinon, et al. "StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators." (2021). github. Similar to CycleGAN: unpaired $I \rightarrow I$ translation.
- StyleCLIP: Patashnik, Or, et al. "Styleclip: Text-driven manipulation of stylegan imagery." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. #cite: 51. StyleCLIP Demo.
Controllability:
- StyleGAN: Karras, Tero, Samuli Laine, and Timo Aila. "A style-based generator architecture for generative adversarial networks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019. $z \rightarrow I$, plus SOTA latent space control. #cite: 2993. Resources: blog, presentation, demos. Disentangles semantic attributes better than a traditional latent space.
- InterFaceGAN: Shen, Yujun, et al. "Interpreting the latent space of gans for semantic face editing." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. Questions: What does a GAN actually learn with respect to the latent space? How can the latent code be used for image editing? Solution: train an SVM per attribute as a binary classifier on latent codes, then edit a latent code by moving it relative to the decision boundary.
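The InterFaceGAN idea summarized above (a linear SVM per attribute, then editing by moving a latent code relative to the decision boundary) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the latent codes and labels are synthetic stand-ins (in practice they would come from a GAN's latent space and an attribute classifier), the SVM is a tiny hand-written subgradient solver rather than a library one, and all names (`train_linear_svm`, `edit_latent`, `DIM`, `alpha`) are hypothetical.

```python
import math
import random


def score(z, w, b):
    """Unnormalized signed distance of latent code z from the hyperplane w.z + b = 0."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b


def train_linear_svm(latents, labels, epochs=50, lr=0.01, reg=0.01):
    """Fit a linear SVM by subgradient descent on the L2-regularized hinge loss.

    labels must be +1 (attribute present) or -1 (attribute absent).
    """
    w = [0.0] * len(latents[0])
    b = 0.0
    for _ in range(epochs):
        for z, y in zip(latents, labels):
            # True when the sample is misclassified or inside the margin.
            violated = y * score(z, w, b) < 1.0
            for i in range(len(w)):
                grad = reg * w[i] - (y * z[i] if violated else 0.0)
                w[i] -= lr * grad
            if violated:
                b += lr * y
    return w, b


def unit_normal(w):
    """Unit-length normal of the decision boundary, used as the editing direction."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


def edit_latent(z, normal, alpha):
    """Move z along the boundary normal; alpha > 0 strengthens the attribute."""
    return [zi + alpha * ni for zi, ni in zip(z, normal)]


# Synthetic stand-in data (assumption): the attribute is encoded along axis 0.
random.seed(0)
DIM = 8
latents, labels = [], []
for _ in range(200):
    y = random.choice([-1, 1])
    z = [random.gauss(0.0, 1.0) for _ in range(DIM)]
    z[0] = y * (0.5 + abs(z[0]))  # keep a margin so the classes are separable
    latents.append(z)
    labels.append(y)

w, b = train_linear_svm(latents, labels)
n = unit_normal(w)

z = latents[0]
z_more = edit_latent(z, n, alpha=3.0)   # push toward "attribute present"
z_less = edit_latent(z, n, alpha=-3.0)  # push toward "attribute absent"
```

In a real pipeline, `z_more` and `z_less` would be fed back through the generator to render the edited images; a library solver could replace the hand-written trainer without changing the editing step.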
(3) 2D to 3D
- Tech: GANs with 3D control, papers: [photoApp][Controllability]
(4) Advanced
- sketch to video with movements
Demo
- NVIDIA AI Playground
- NVIDIA Canvas
- NVIDIA GauGAN2
- StyleGAN-NADA
- StyleCLIP
- replicate.com: a model hosting website where you can host your own demos too!