Towards Knowledge-Based Recommender Dialog System.
Qibin Chen, Junyang Lin, Yichang Zhang, Ming Ding, Yukuo Cen, Hongxia Yang, Jie Tang.
In EMNLP-IJCNLP 2019
- New: The current SOTA on the ReDial dataset is https://arxiv.org/pdf/2007.04032.pdf. Don't miss out on that great work!
- New: code and README are improved.
- We curated a paper list for NLP + Recommender System at https://github.com/THUDM/NLP4Rec-Papers. Contributions are welcome.
- Linux
- Python 3.6
- PyTorch >= 1.2.0
Clone this repo.
git clone https://github.com/THUDM/KBRD
cd KBRD
Install the dependencies with:
pip install -r requirements.txt
The torch-geometric module must be installed carefully, following the instructions at https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html#installation-via-binaries.
As of September 3rd, on Compute Canada, the PyTorch version is 1.9.0 and the CUDA version is 10.0, as checked with:
python -c "import torch; print(torch.__version__)"
and
python -c "import torch; print(torch.version.cuda)"
For my Compute Canada setup, I needed to install the following modules:
pip install --no-index torch-sparse -f https://pytorch-geometric.com/whl/torch-${TORCH}+${CUDA}.html
pip install --no-index torch-geometric
where ${CUDA} and ${TORCH} should be replaced by the specific CUDA version (cpu, cu92, cu101, cu102, cu110, cu111) and PyTorch version (1.4.0, 1.5.0, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0), respectively. For example, for PyTorch 1.9.0 and CUDA 10.2, type:
pip install --no-index torch-sparse -f https://pytorch-geometric.com/whl/torch-1.9.0+cu102.html
pip install --no-index torch-geometric
Suggestion taken from this Stack Overflow post: https://stackoverflow.com/questions/65860764/pytorch-torch-sparse-installation-without-cuda
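The substitution of ${TORCH} and ${CUDA} described above can be sketched in Python. This is a minimal illustration, not part of the repository: the helper names cuda_tag and wheel_index_url are hypothetical, and the sketch assumes the wheel index follows the URL pattern shown in the pip commands above.

```python
from typing import Optional


def cuda_tag(cuda_version: Optional[str]) -> str:
    """Map a CUDA version string such as '10.2' to a wheel tag such as 'cu102'.

    A CPU-only PyTorch build reports torch.version.cuda as None, which maps to 'cpu'.
    """
    if cuda_version is None:
        return "cpu"
    return "cu" + cuda_version.replace(".", "")


def wheel_index_url(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build the URL passed to `pip install -f` (hypothetical helper)."""
    # torch.__version__ may carry a local suffix such as '1.9.0+cu102'; drop it.
    base_version = torch_version.split("+")[0]
    tag = cuda_tag(cuda_version)
    return f"https://pytorch-geometric.com/whl/torch-{base_version}+{tag}.html"


print(wheel_index_url("1.9.0", "10.2"))
```

With PyTorch installed, calling wheel_index_url(torch.__version__, torch.version.cuda) feeds in the two values printed by the version checks above. The resulting tag still has to be one of the CUDA versions listed above; the sketch does not check that a wheel index exists for it.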
- We use the ReDial dataset, which will be automatically downloaded by the script.
- Download the refined knowledge base (dbpedia) used in this paper [Google Drive] [Tsinghua Cloud]. Decompress it to get the <path/to/KBRD/dbpedia/> folder, which should contain two files: mappingbased_objects_en.ttl and short_abstracts_en.ttl.
- Download the preprocessed extracted entities set [Google Drive] [Tsinghua Cloud] and put it under <path/to/KBRD/data/redial/>.
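Before training, it can help to confirm that the downloads ended up in the expected places. The following is a small sketch of such a check; missing_data_paths is a hypothetical helper, not a script shipped with the repository, and it only looks for the paths named in the list above.

```python
from pathlib import Path
from typing import List

# File names taken from the dataset instructions above.
DBPEDIA_FILES = ("mappingbased_objects_en.ttl", "short_abstracts_en.ttl")


def missing_data_paths(repo_root: str) -> List[str]:
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    expected = [root / "dbpedia" / name for name in DBPEDIA_FILES]
    # The file names of the extracted-entities download are not listed above,
    # so only the target directory is checked here.
    expected.append(root / "data" / "redial")
    return [str(path) for path in expected if not path.exists()]
```

Running missing_data_paths(".") from the KBRD folder returns an empty list when both knowledge-base files and the data/redial/ directory are present.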
- To train the recommender part, run:
bash scripts/both.sh <num_exps> <gpu_id>
(optional) bash scripts/baseline.sh <num_exps> <gpu_id>
- To train the dialog part, run:
bash scripts/t2t_rec_rgcn.sh <num_exps> <gpu_id>
The test results are displayed at the end of training and can also be found in saved/<model_name>.test.
Training outputs, TensorBoard logs, and model files are saved in the saved/ folder.
scripts/score.py runs a hypothesis test for the significance of the improvement between different models. To use it, first run multiple experiments with num_exps > 1, for example:
bash scripts/both.sh 2 <gpu_id>
bash scripts/baseline.sh 2 <gpu_id>
Then run:
python scripts/score.py --name-1 saved/release_baseline --name-2 saved/both_rgcn --num 2 --metric recall@50
where you should remove the trailing _0, _1 automatically added to the model names, --num should be set to the same value as num_exps above, and recall@50 can be replaced with other evaluation metrics from the paper.
Sample output:
[0.298, 0.2918]
0.2949
0.0031
[0.3417, 0.3369]
0.3393
0.0024
Ttest_indResult(statistic=-11.325204070341204, pvalue=0.007706635327863829)
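To make the last line of the sample output easier to read: Ttest_indResult is the result type printed by SciPy's scipy.stats.ttest_ind, and the statistic shown is consistent with a two-sample Student's t-test with pooled variance applied to the two lists. The sketch below recomputes that statistic with the standard library only. It is an illustration under that assumption, not the code of scripts/score.py, and pooled_t_statistic is a hypothetical name. The p-value is omitted because it requires the t-distribution CDF (for example from SciPy).

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's two-sample t statistic (equal variances) and its degrees of freedom."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    # statistics.variance is the sample variance (it divides by n - 1).
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_err, df


runs_1 = [0.298, 0.2918]   # first list in the sample output
runs_2 = [0.3417, 0.3369]  # second list in the sample output
t_stat, df = pooled_t_statistic(runs_1, runs_2)
print(round(t_stat, 3), df)  # -11.325 2
```

The negative sign only reflects the order of the two arguments: the first list has the lower mean.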
scripts/display_model.py is used to generate responses:
python scripts/display_model.py -t redial -mf saved/transformer_rec_both_rgcn_0 -dt test
Example output ([TorchAgent] is our model output):
~~
[eval_labels_choice]: Oh, you like scary movies?
I recently watched __unk__
[movies]:
37993
[redial]:
Hello!
Hello!
What kind of movies do you like?
I am looking for a movie recommendation. When I was younger I really enjoyed the __unk__
[label_candidates: 3|37993|50395||Oh, you like scary movies?
I recently watched __unk__]
[eval_labels: Oh, you like scary movies?
I recently watched __unk__]
[TorchAgent]: have you seen "The Shining (1980)" ?
~~
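If the output is saved to a text file, the model responses can be pulled out by looking for the [TorchAgent] marker. The snippet below is a minimal sketch under the assumption that each response sits on a single line in the format shown above; model_responses is a hypothetical helper and is not part of the repository.

```python
from typing import List

PREFIX = "[TorchAgent]:"


def model_responses(log_text: str) -> List[str]:
    """Collect the text that follows each '[TorchAgent]:' marker."""
    responses = []
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(PREFIX):
            responses.append(stripped[len(PREFIX):].strip())
    return responses


sample = """[eval_labels: Oh, you like scary movies?
I recently watched __unk__]
[TorchAgent]: have you seen "The Shining (1980)" ?
~~"""
print(model_responses(sample))
```

A response that spans several lines would be cut off after its first line by this sketch.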
scripts/show_bias.py is used to show the vocabulary bias of a specific movie (like the qualitative analysis in Table 4):
python scripts/show_bias.py -mf saved/transformer_rec_both_rgcn_0
- Understanding model outputs. Please see THUDM#15 (comment).
- Adapting this code to other datasets. It is currently not straightforward to run this code on other datasets. The main reason is that we cached the entity linking process in KBRD for ReDial. Please see THUDM#10 (comment) for details.
- Why are the recommender and the dialog part trained separately? Please refer to THUDM#9 (comment) for a detailed explanation.
Since the dialog and recommender parts were tested separately, the responses generated from the repository can currently be split into two parts:
Generating conversations:
python scripts/display_model.py -t redial -mf saved/transformer_rec_both_rgcn_0 -dt test
The [TorchAgent] row is the response generated by the model.
Generating recommendations:
Before running the command below, add return Output(list(map(lambda x: str(self.movie_ids[x]), outputs.argmax(dim=1).tolist()))) to eval_step in https://github.com/THUDM/KBRD/blob/master/parlai/agents/kbrd/kbrd.py.
python scripts/display_model.py -t redial -mf saved/both_rgcn_0 -dt test
[KbrdAgent] is the item entity ID from the recommender system; the corresponding movie names can be found in data/redial/entity2entityID.pkl.
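Looking up an entity ID in that file could be done along the following lines. This is a sketch under a loud assumption: that the pickle holds a dictionary mapping each entity to its integer ID, as the file name suggests. The actual layout should be checked by inspecting the loaded object, and load_id_to_entity is a hypothetical helper.

```python
import pickle
from typing import Dict, Hashable


def load_id_to_entity(path: str) -> Dict[int, Hashable]:
    """Invert an entity -> id dictionary stored as a pickle (assumed layout)."""
    with open(path, "rb") as f:
        entity2id = pickle.load(f)
    # Assumption: the values of the dictionary are the integer entity ids.
    return {entity_id: entity for entity, entity_id in entity2id.items()}
```

For example, id_to_entity = load_id_to_entity("data/redial/entity2entityID.pkl") followed by id_to_entity.get(int(entity_id)) returns the entity stored for the ID printed in the [KbrdAgent] row, or None if the ID is absent. Only unpickle files from a source you trust.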
If you have additional questions, please let us know.
Please cite our paper if you use this code in your own work:
@article{chen2019towards,
title={Towards Knowledge-Based Recommender Dialog System},
author={Chen, Qibin and Lin, Junyang and Zhang, Yichang and Ding, Ming and Cen, Yukuo and Yang, Hongxia and Tang, Jie},
journal={arXiv preprint arXiv:1908.05391},
year={2019}
}