This repository is the PyTorch implementation of our paper "Maria: A Visual Experience Powered Conversational Agent" (ACL 2021).
In this paper, we present Maria, a neural conversational agent powered by visual world experiences retrieved from a large-scale image index. Maria consists of three flexible components: a text-to-image retriever, a visual concept detector, and a visual-knowledge-grounded response generator.
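To make the flow between the three components concrete, here is a purely illustrative sketch of the pipeline. Every function below is a hypothetical stub, not the repo's actual API; the real interfaces live in `retrieval_model/`, `detector_model/`, and `dialog_model/`.

```python
"""Illustrative end-to-end flow of Maria's three components.
All functions are hypothetical stubs, not the repo's actual API."""
from typing import List, Tuple

def retrieve_images(context: str, top_k: int = 1) -> List[str]:
    """Text-to-image retriever: return paths of the top-k candidate images."""
    return ["data/open_images/images/14928b4f367c217e.jpg"][:top_k]  # stub

def detect_concepts(image_path: str) -> Tuple[List[str], List[List[float]]]:
    """Visual concept detector: return concept tags and region features."""
    return ["person", "park"], [[0.0] * 2048]  # stub; 2048-d features assumed

def generate_response(context: str, concepts: List[str],
                      features: List[List[float]]) -> str:
    """Visual-knowledge-grounded response generator."""
    return f"(a response grounded on: {', '.join(concepts)})"  # stub

if __name__ == "__main__":
    ctx = "I spent the afternoon at the park."
    images = retrieve_images(ctx)
    concepts, features = detect_concepts(images[0])
    print(generate_response(ctx, concepts, features))
```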
Coming soon!
- Python 3.7
- PyTorch 1.4.0
- Ubuntu 18.04
Please download the Reddit data from Google Drive here.
We use the Open Images dataset as the candidate pool for retrieval. Refer here to download the images first. You can then build an image index of the appropriate size (500,000 images in our experiments) as needed.
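As a hedged illustration of what such an index can look like, the sketch below builds a simple nearest-neighbor index with FAISS over precomputed image embeddings. The embedding dimension, the feature source, and the use of FAISS are all assumptions made here for the example; the repo's actual index construction is described in `retrieval_model/README.md`.

```python
# A minimal sketch of a candidate-image index, assuming precomputed image
# embeddings and FAISS (pip install faiss-cpu); NOT the repo's own pipeline.
import numpy as np
import faiss

NUM_CANDIDATES = 10_000  # the paper uses a 500,000-image candidate pool
DIM = 512                # assumed embedding dimension

# Placeholder embeddings; in practice, load the real image features from disk.
image_features = np.random.rand(NUM_CANDIDATES, DIM).astype("float32")
faiss.normalize_L2(image_features)  # so inner product == cosine similarity

index = faiss.IndexFlatIP(DIM)
index.add(image_features)

# Retrieve the top-5 images for a (normalized) text-query embedding.
query = np.random.rand(1, DIM).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)
print(ids[0], scores[0])
```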
If you already have the Open Images dataset on disk, organize it as follows:
```
data
|-- open_images
    |-- images
        |-- 14928b4f367c217e.jpg
        |-- 289d643a8761aa83.jpg
        |-- ......
```
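For a quick check that the images landed in the expected place, a small snippet like the following will do (the path is taken from the layout above):

```python
# Sanity-check the layout above: count the .jpg files under the images folder.
from pathlib import Path

image_dir = Path("data/open_images/images")
num_images = sum(1 for _ in image_dir.glob("*.jpg"))
print(f"Found {num_images} images in {image_dir}")
```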
Please refer to `retrieval_model/README.md`.
Please refer to `detector_model/README.md`.
Please refer to `dialog_model/README.md`.
If you find this paper helpful for your research, please consider citing it in your publications.
```bibtex
@inproceedings{liang2021maria,
  title     = {Maria: A Visual Experience Powered Conversational Agent},
  author    = {Liang, Zujie and Hu, Huang and Xu, Can and Tao, Chongyang and
               Geng, Xiubo and Chen, Yining and Liang, Fan and Jiang, Daxin},
  booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL)},
  year      = {2021}
}
```
Special thanks to the authors of OSCAR, vokenization, and py-bottom-up-attention.