This repository is the PyTorch implementation of our paper "Maria: A Visual Experience Powered Conversational Agent" (ACL 2021).
In this paper, we present Maria, a neural conversational agent powered by visual world experiences retrieved from a large-scale image index. Maria consists of three flexible components: a text-to-image retriever, a visual concept detector, and a visual-knowledge-grounded response generator.
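To make the flow between the three components concrete, here is a purely illustrative sketch of the pipeline. Every function below is a hypothetical stub, not the repo's actual API; the real interfaces live in `retrieval_model/`, `detector_model/`, and `dialog_model/`.

```python
"""Illustrative end-to-end flow of Maria's three components.
All functions are hypothetical stubs, not the repo's actual API."""
from typing import List, Tuple

def retrieve_images(context: str, top_k: int = 1) -> List[str]:
    """Text-to-image retriever: return paths of the top-k candidate images."""
    return ["data/open_images/images/14928b4f367c217e.jpg"][:top_k]  # stub

def detect_concepts(image_path: str) -> Tuple[List[str], List[List[float]]]:
    """Visual concept detector: return concept tags and region features."""
    return ["person", "park"], [[0.0] * 2048]  # stub; 2048-d features assumed

def generate_response(context: str, concepts: List[str],
                      features: List[List[float]]) -> str:
    """Visual-knowledge-grounded response generator."""
    return f"(a response grounded on: {', '.join(concepts)})"  # stub

if __name__ == "__main__":
    ctx = "I spent the afternoon at the park."
    images = retrieve_images(ctx)
    concepts, features = detect_concepts(images[0])
    print(generate_response(ctx, concepts, features))
```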
Coming soon!
- Python 3.7
- PyTorch 1.4.0
- Ubuntu 18.04
Please download the Reddit data from Google Drive here.
We use the Open Images dataset as the candidate pool for retrieval. Refer here to download the images first. You can then build an image index of the appropriate size (500,000 images in our experiments) as needed.
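As a hedged illustration of what such an index can look like, the sketch below builds a simple nearest-neighbor index with FAISS over precomputed image embeddings. The embedding dimension, the feature source, and the use of FAISS are all assumptions made here for the example; the repo's actual index construction is described in `retrieval_model/README.md`.

```python
# A minimal sketch of a candidate-image index, assuming precomputed image
# embeddings and FAISS (pip install faiss-cpu); NOT the repo's own pipeline.
import numpy as np
import faiss

NUM_CANDIDATES = 10_000  # the paper uses a 500,000-image candidate pool
DIM = 512                # assumed embedding dimension

# Placeholder embeddings; in practice, load the real image features from disk.
image_features = np.random.rand(NUM_CANDIDATES, DIM).astype("float32")
faiss.normalize_L2(image_features)  # so inner product == cosine similarity

index = faiss.IndexFlatIP(DIM)
index.add(image_features)

# Retrieve the top-5 images for a (normalized) text-query embedding.
query = np.random.rand(1, DIM).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)
print(ids[0], scores[0])
```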
If you already have the Open Images dataset on disk, organize it as follows:
```
data
|-- open_images
    |-- images
        |-- 14928b4f367c217e.jpg
        |-- 289d643a8761aa83.jpg
        |-- ......
```
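For a quick check that the images landed in the expected place, a small snippet like the following will do (the path is taken from the layout above):

```python
# Sanity-check the layout above: count the .jpg files under the images folder.
from pathlib import Path

image_dir = Path("data/open_images/images")
num_images = sum(1 for _ in image_dir.glob("*.jpg"))
print(f"Found {num_images} images in {image_dir}")
```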
Please refer to `retrieval_model/README.md`.
Please refer to `detector_model/README.md`.
Please refer to `dialog_model/README.md`.
If you find this paper helpful for your research, please consider citing it in your publications.
```bibtex
@inproceedings{liang2021maria,
  title     = {Maria: A Visual Experience Powered Conversational Agent},
  author    = {Liang, Zujie and Hu, Huang and Xu, Can and Tao, Chongyang and
               Geng, Xiubo and Chen, Yining and Liang, Fan and Jiang, Daxin},
  booktitle = {Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL)},
  year      = {2021}
}
```
Special thanks to the authors of OSCAR, vokenization, and py-bottom-up-attention.