A list of papers and other resources on language-guided image editing.
Unpaired images
- Oxford-102: a flower dataset of 102 categories, chosen from flowers commonly occurring in the United Kingdom. Each class consists of between 40 and 258 images.
Paired images
- CUB Bird: an image dataset with 6,033 photos of 200 bird species (mostly North American).
- COCO: natural scenes.
- Multi-Modal-CelebA-HQ: a large-scale face image dataset with 30,000 high-resolution face images selected from the CelebA dataset by following CelebA-HQ. Each image comes with a high-quality segmentation mask, a sketch, descriptive text, and a version with a transparent background.
- GIER
- CoDraw
- i-CLEVR
- Semantic Image Synthesis via Adversarial Learning. paper code
  Hao Dong, Simiao Yu, Chao Wu, and Yike Guo. (Imperial College London)
- Language-Based Image Editing with Recurrent Attentive Models. CVPR2018. paper code
  Jianbo Chen, Yelong Shen, Jianfeng Gao, Jingjing Liu, and Xiaodong Liu. (UCB, MSR)
- Learning to Globally Edit Images with Textual Description. ArXiv2018. paper code
  Hai Wang, Jason D. Williams, and SingBing Kang. (TTIC, Apple, MSR)
- Text-Adaptive Generative Adversarial Networks: Manipulating Images with Natural Language. NeurIPS2018. paper code
  Seonghyeon Nam, Yunji Kim, and Seon Joo Kim. (Yonsei University)
- Bilinear Representation for Language-Based Image Editing Using Conditional Generative Adversarial Networks. ICASSP2019. paper code
  Xiaofeng Mao, Yuefeng Chen, Yuhong Li, Tao Xiong, Yuan He, and Hui Xue. (Alibaba)
- ManiGAN: Text-Guided Image Manipulation. CVPR2020. paper code
  Bowen Li, Xiaojuan Qi, Thomas Lukasiewicz, and Philip H. S. Torr. (Oxford, UHK)
- A Benchmark and Baseline for Language-Driven Image Editing. ACCV2020. paper code
  Jing Shi, Ning Xu, Trung Bui, Franck Dernoncourt, Zheng Wen, and Chenliang Xu. (University of Rochester, Adobe)
- Lightweight Generative Adversarial Networks for Text-Guided Image Manipulation. NeurIPS2020. paper code
  Bowen Li, Xiaojuan Qi, Philip H. S. Torr, Thomas Lukasiewicz. (Oxford, UHK)
- TediGAN: Text-Guided Diverse Image Generation and Manipulation. ArXiv2020. paper code Multi-Modal-CelebA-HQ
  Weihao Xia, Yujiu Yang, Jing-Hao Xue, Baoyuan Wu. (Tsinghua, UCL, CUHKSZ)
- A Multimodal Dialogue System for Conversational Image Editing. NeurIPSW2018. paper
  Tzu-Hsiang Lin, Trung Bui, Doo Soon Kim, and Jean Oh. (CMU, Adobe)
- Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction. ICCV2019. paper code
  Alaaeldin El-Nouby, Shikhar Sharma, Hannes Schulz, R Devon Hjelm, Layla El Asri, Samira Ebrahimi Kahou, Yoshua Bengio, and Graham Taylor. (University of Guelph, MSR, Vector Institute, University of Montreal)
- Sequential Attention GAN for Interactive Image Editing. MM2020. paper
  Yu Cheng, Zhe Gan, Yitong Li, Jingjing Liu, and Jianfeng Gao. (MSR, Duke University)
- SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning. EMNLP2020. paper code
  Tsu-Jui Fu, Xin Eric Wang, Scott Grafton, Miguel Eckstein, and William Yang Wang. (UCSB, UCSC)
- Text as Neural Operator: Image Manipulation by Text Instruction. CVPRW2020. paper
  Tianhao Zhang, Hung-Yu Tseng, Lu Jiang, Honglak Lee, Irfan Essa, and Weilong Yang. (Google)
Jing Shi
Any recommendations to add to the list are welcome :)
Feel free to make pull requests!