A comprehensive list of awesome computer vision applications, accompanied by introductions, landmark research papers, and demos. Applications are organized in two ways: (1) by generic application, (2) by enabling technique. Besides applications, we track cool AI orgs and startups. Additionally, (1) we organize datasets, models, and metrics in a section for some applications, and (2) we showcase some applications on a public Instagram account.
Please feel free to send me pull requests or email (li.yin.gravity@gmail.com) to add links.
- Image Understanding
- Object Detection
- Face Recognition
- Neural Rendering
- Visual Language Models
- Visual Representation Learning: Toward a Future of Self-Supervised Learning
Concurrently, progress in computer vision and machine learning has given rise to a new approach to image synthesis and editing, namely deep generative models, mainly GANs. Different GANs are able to synthesize images with controllable properties such as camera viewpoint and illumination conditions. Controllability comes from (1) the latent space and (2) inverse graphics.
- State of the Art on Neural Rendering
- Tutorial
- Introduction to GANs
- Neural Rendering and Its Applications in Computer Graphics (Presented by Lambda)
- Coloring
- Super-resolution
- Sketch to image
- Text to image
- Segmentation to image
- Image to animation
- Gender exchange
- Face swap
- Aging
- Disfiguration
- Style transfer
Demo: NVIDIA GauGAN2, YouTube Tutorial
Landmark papers:
- Isola, Phillip, et al. "Image-to-image translation with conditional adversarial networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017. #cite: 11845. A pioneering paper for image-to-image translation.
- Liu, M. Y., Breuel, T., & Kautz, J. (2017). Unsupervised image-to-image translation networks. In Advances in neural information processing systems (pp. 700-708). #cite: 1980.
- pix2pixHD: Wang, Ting-Chun, et al. "High-resolution image synthesis and semantic manipulation with conditional gans." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018. Segmentation to image.
- SPADE: Park, Taesung, et al. "Semantic image synthesis with spatially-adaptive normalization." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019. Segmentation + style to image. code, GauGAN v1 demo
- StackGAN: Zhang, Han, et al. "Stackgan: Text to photo-realistic image synthesis with stacked generative adversarial networks." Proceedings of the IEEE international conference on computer vision. 2017. #cite: 2045. Text to image.
- Ramesh, Aditya, et al. "Zero-shot text-to-image generation." arXiv preprint arXiv:2102.12092 (2021).
Leveraging Vision Language models
- StyleGAN-NADA: Gal, Rinon, et al. "StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators." (2021). github. Similar to CycleGAN: unpaired $I \rightarrow I$ translation.
- Patashnik, Or, et al. "Styleclip: Text-driven manipulation of stylegan imagery." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021. #cite: 51. StyleCLIP Demo.
Controllability:
- StyleGAN: Karras, Tero, Samuli Laine, and Timo Aila. "A style-based generator architecture for generative adversarial networks." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019. $z \rightarrow I$, with SOTA latent space control. #cite: 2993. Resources: blog, presentation, demos. (1) Disentangles semantic attributes better than a traditional latent space.
- InterFaceGAN: Shen, Yujun, et al. "Interpreting the latent space of gans for semantic face editing." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020. Questions: What does a GAN actually learn with respect to the latent space? How can the latent code be used for image editing? Solution: train an SVM for each attribute as a binary classifier on latent codes, then edit an image by moving its latent code across the resulting decision boundary.
- Tech: GANs with 3D control, papers: [photoApp][Controllability]
- sketch to video with movements
- NVIDIA AI Playground
- NVIDIA Canvas
- NVIDIA GauGAN2
- StyleGAN-NADA
- StyleCLIP
- replicate.com: a model hosting website where you can host your own demos too!
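
The InterFaceGAN recipe listed under Controllability (fit a linear SVM per attribute on latent codes, then shift a code along the normal of the decision boundary) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: all function names are hypothetical, the latents are synthetic toy data standing in for codes sampled from a pretrained GAN with labels from an attribute classifier, and the SVM is a hand-rolled hinge-loss subgradient solver rather than a library one.

```python
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def make_toy_latents(n, dim, seed=0):
    """Toy stand-in for GAN latents: the attribute (+1/-1) depends only on z[0]."""
    rng = random.Random(seed)
    latents, labels = [], []
    for _ in range(n):
        y = rng.choice([-1, 1])
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        z[0] = y * (1.0 + abs(z[0]))  # separable along the first axis
        latents.append(z)
        labels.append(y)
    return latents, labels


def train_linear_svm(latents, labels, epochs=200, lr=0.01, reg=0.01, seed=0):
    """Linear SVM (hinge loss + L2 penalty) fit by stochastic subgradient descent."""
    rng = random.Random(seed)
    w = [0.0] * len(latents[0])
    b = 0.0
    order = list(range(len(latents)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = latents[i], labels[i]
            margin = y * (dot(w, x) + b)
            # The L2 shrinkage always applies; the hinge term only inside the margin.
            w = [wj * (1.0 - lr * reg) for wj in w]
            if margin < 1.0:
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def unit_normal(w):
    """Unit vector orthogonal to the SVM decision boundary."""
    norm = dot(w, w) ** 0.5
    return [wj / norm for wj in w]


def edit_latent(z, normal, alpha):
    """Move z by alpha along the boundary normal: z_edit = z + alpha * n."""
    return [zj + alpha * nj for zj, nj in zip(z, normal)]


if __name__ == "__main__":
    latents, labels = make_toy_latents(200, 4, seed=1)
    w, b = train_linear_svm(latents, labels)
    n = unit_normal(w)
    z_edit = edit_latent(latents[0], n, alpha=3.0)
    print("boundary normal:", n)
    print("attribute score before/after:", dot(latents[0], n), dot(z_edit, n))
```

In a real pipeline, the edited code would be fed back through the generator to render the modified image; a positive `alpha` pushes toward the positive side of the attribute and a negative one toward the other side.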