
CIKM2021-IMPChat

CIKM 2021: Learning Implicit User Profile for Personalized Retrieval-based Chatbot (pdf)

In this work, one of the datasets we use is the PchatbotW dataset; please refer to this link for details.

Datasets

In this paper, we evaluate IMPChat on two datasets, Weibo and Reddit:

Weibo

Dataset:

Baidu Disk: Link (4hv3)

Google Storage: Link

Embedding:

Baidu Disk: Link (fob2)

Google Storage: Link

Answer Relevance:

Baidu Disk: Link (v6pv)

Google Storage: Link

Reddit

Dataset:

Baidu Disk: Link (vg1j)

Google Storage: Link

Embedding:

Baidu Disk: Link (nnie)

Google Storage: Link

Answer Relevance:

Baidu Disk: Link (8mci)

Google Storage: Link

Note that the embeddings are trained on the corresponding dataset. The Answer Relevance file contains the relevance scores of the candidate responses.

Train

Download the datasets and put them in the dataset directory.

Run with:

ts=`date +%Y%m%d-%H%M`
dataset=weibo # or reddit

CUDA_VISIBLE_DEVICES=4,6 python run.py \
    --task ${dataset} \
    --batch_size 128 \
    --eval_steps 5000 \
    --emb_len 200 \
    --max_utterances 29 \
    --learning_rate 5e-4 \
    --max_words 50 \
    --n_gpu 2 \
    --epochs 10 \
    --n_layer 3 \
    --max_hop 2 \
    --score_file_path score_file.txt \
    --model_file_name ${dataset}_impchat.pt \
    --is_training True
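
For reference, the sketch below shows how run.py might parse these flags with argparse. The flag names and defaults mirror the command above; the parser itself is an assumption for illustration, not the repository's actual code.

# Hypothetical sketch of the argument parsing behind run.py; the flag names
# mirror the training command above, but the parser is an assumption.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Train or evaluate IMPChat")
    parser.add_argument("--task", type=str, default="weibo", help="weibo or reddit")
    parser.add_argument("--batch_size", type=int, default=128)
    parser.add_argument("--eval_steps", type=int, default=5000, help="evaluate every N steps")
    parser.add_argument("--emb_len", type=int, default=200, help="word embedding dimension")
    parser.add_argument("--max_utterances", type=int, default=29, help="max history utterances")
    parser.add_argument("--learning_rate", type=float, default=5e-4)
    parser.add_argument("--max_words", type=int, default=50, help="max words per utterance")
    parser.add_argument("--n_gpu", type=int, default=2)
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--n_layer", type=int, default=3)
    parser.add_argument("--max_hop", type=int, default=2)
    parser.add_argument("--score_file_path", type=str, default="score_file.txt")
    parser.add_argument("--model_file_name", type=str, default="impchat.pt")
    # Parse "True"/"False" strings explicitly; type=bool would treat any
    # non-empty string (including "False") as True.
    parser.add_argument("--is_training", type=lambda s: s.lower() == "true", default=False)
    return parser

if __name__ == "__main__":
    print(build_parser().parse_args())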

Reproduce the results

Download the checkpoint files and place them under the checkpoint directory:

Reddit

Baidu Disk: Link (koon)

Google Storage: Link

Weibo

Baidu Disk: Link (v7ck)

Google Storage: Link

Run with:

ts=`date +%Y%m%d-%H%M`
dataset=weibo # or reddit

CUDA_VISIBLE_DEVICES=4,6 python run.py \
    --task ${dataset} \
    --batch_size 128 \
    --eval_steps 5000 \
    --emb_len 200 \
    --max_utterances 29 \
    --learning_rate 5e-4 \
    --max_words 50 \
    --n_gpu 2 \
    --epochs 10 \
    --n_layer 3 \
    --max_hop 2 \
    --score_file_path score_file.txt \
    --model_file_name ${dataset}_impchat.pt
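
If you want to sanity-check a downloaded checkpoint before launching the evaluation run, the sketch below loads it and prints a few parameter shapes. It assumes the .pt file is a standard PyTorch checkpoint saved with torch.save; the exact contents depend on run.py.

# Minimal sketch for inspecting a downloaded checkpoint. It assumes the .pt
# file was produced with torch.save, which is an assumption about run.py.
import torch

checkpoint_path = "checkpoint/weibo_impchat.pt"  # or checkpoint/reddit_impchat.pt
state = torch.load(checkpoint_path, map_location="cpu")

# A state dict maps parameter names to tensors; print a few entries to verify
# the file loaded correctly before running the full evaluation.
if isinstance(state, dict):
    for name, value in list(state.items())[:5]:
        shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
        print(name, shape)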

Baseline models

You can download all score files of the baseline models we use from the following links:

Baidu Disk: Link (vof6)

Google Storage: Link

Each score file is named {model name}.{task} (e.g., imp.weibo). You can compute the metrics with:

python metrics.py
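
For reference, the sketch below shows how ranking metrics such as R10@1 and MRR are typically computed from a score file. It assumes each line holds a predicted score and a binary label and that every 10 consecutive lines form one candidate group; the actual format expected by metrics.py may differ.

# Hedged sketch of computing R10@1 and MRR from a score file. It assumes each
# line contains "score<whitespace>label" and that every 10 consecutive lines
# belong to one query; the repository's metrics.py and file format may differ.
def evaluate(score_file, group_size=10):
    scores, labels = [], []
    with open(score_file) as f:
        for line in f:
            score, label = line.strip().split()
            scores.append(float(score))
            labels.append(int(float(label)))

    recall_at_1, mrr, n_groups = 0.0, 0.0, 0
    for start in range(0, len(scores), group_size):
        group = list(zip(scores[start:start + group_size],
                         labels[start:start + group_size]))
        # Rank candidates by predicted score, highest first.
        ranked = sorted(group, key=lambda x: x[0], reverse=True)
        n_groups += 1
        if ranked[0][1] == 1:
            recall_at_1 += 1
        for rank, (_, label) in enumerate(ranked, start=1):
            if label == 1:
                mrr += 1.0 / rank
                break
    return recall_at_1 / n_groups, mrr / n_groups

if __name__ == "__main__":
    r1, mrr = evaluate("imp.weibo")
    print(f"R10@1: {r1:.4f}  MRR: {mrr:.4f}")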

Cite:

@inproceedings{qian2021impchat,
     author = {Hongjin Qian and Zhicheng Dou and Yutao Zhu and Yueyuan Ma and Ji-Rong Wen},
     title = {Learning Implicit User Profile for Personalized Retrieval-based Chatbot},
     booktitle = {Proceedings of the {CIKM} 2021},
     publisher = {{ACM}},
     year = {2021},
     url = {https://doi.org/10.1145/3459637.3482269},
     doi = {10.1145/3459637.3482269}}

@inproceedings{qian2021pchatbot,
     author = {Hongjin Qian and Xiaohe Li and Hanxun Zhong and Yu Guo and Yueyuan Ma and Yutao Zhu and Zhanliang Liu and Zhicheng Dou and Ji-Rong Wen},
     title = {Pchatbot: A Large-Scale Dataset for Personalized Chatbot},
     booktitle = {Proceedings of the {SIGIR} 2021},
     publisher = {{ACM}},
     year = {2021},
     url = {https://doi.org/10.1145/3404835.3463239},
     doi = {10.1145/3404835.3463239}}