Towards KnOwledge-Based pErsonalized Product Description Generation in E-commerce.
Qibin Chen*, Junyang Lin*, Yichang Zhang, Hongxia Yang, Jingren Zhou, Jie Tang.
*Equal contribution.
In KDD 2019 (Applied Data Science Track)
- Linux or macOS
- Python 3.6
- PyTorch 1.0.1
- NVIDIA GPU + CUDA cuDNN
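The following is a small sketch, not part of the repository, that compares the running environment with the prerequisites listed above. The function name check_environment is hypothetical. It uses only the standard library, plus torch.__version__ and torch.cuda.is_available() when PyTorch happens to be installed.

```python
import platform
import sys


def check_environment():
    """Compare the running environment with the listed prerequisites.

    Returns a dict of findings instead of raising, so it can be run
    before PyTorch is installed. (Hypothetical helper, not in the repo.)
    """
    report = {
        # Linux or macOS: platform.system() returns "Linux" or "Darwin".
        "os_supported": platform.system() in ("Linux", "Darwin"),
        "python": "%d.%d" % sys.version_info[:2],
        "python_is_3_6": sys.version_info[:2] == (3, 6),
        "torch": None,
        "cuda_available": False,
    }
    try:
        import torch  # only importable after installing the dependencies
    except ImportError:
        return report
    report["torch"] = torch.__version__  # the README lists PyTorch 1.0.1
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print("%s: %s" % (key, value))
```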
Clone this repo.
git clone https://github.com/THUDM/KOBE
cd KOBE
Install the dependencies with:
pip install -r requirements.txt
- We use the TaoDescribe dataset, which contains 2,129,187 product titles and descriptions in Chinese.
- (optional) You can download the raw (unpreprocessed) dataset from here, or from here for users in China.
- First, download the preprocessed TaoDescribe dataset by running
python scripts/download_preprocessed_tao.py
- If you are in a region where Dropbox is blocked (e.g. Mainland China), run this instead:
python scripts/download_preprocessed_tao.py --cn
- (optional) You can peek into data/aspect-user/preprocessed/test.src.str and data/aspect-user/preprocessed/test.tgt.str, which contain the product titles and descriptions in the test set, respectively. In the src files, <x> <y> means that the product is intended to be shown with aspect <x> and user category <y>. Note: this differs slightly from the <A-1>, <U-1> format described in the paper, but the two are essentially equivalent. You can also peek into data/aspect-user/preprocessed/test.supporting_facts_str to see the knowledge we extracted from DBpedia for the corresponding product.
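As a minimal sketch of how these files could be read together, the helpers below split a src line into its aspect token, user-category token, and title tokens. They rest on two assumptions not confirmed above: that the two control tokens <x> <y> are the first two whitespace-separated tokens of each src line, and that the src, tgt, and supporting_facts_str files are line-aligned (one product per line). The names parse_src_line and iter_examples are hypothetical.

```python
import re
from itertools import zip_longest

# Assumption: each src line starts with "<x> <y>" (aspect, then user
# category), followed by the space-separated title tokens.
CONTROL_TOKEN = re.compile(r"^<([^<>\s]+)>$")


def parse_src_line(line):
    """Split one src line into (aspect, user_category, title_tokens)."""
    tokens = line.split()
    if len(tokens) < 2:
        raise ValueError("expected at least two control tokens: %r" % line)
    aspect_match = CONTROL_TOKEN.match(tokens[0])
    user_match = CONTROL_TOKEN.match(tokens[1])
    if aspect_match is None or user_match is None:
        raise ValueError("line does not start with '<x> <y>': %r" % line)
    return aspect_match.group(1), user_match.group(1), tokens[2:]


def iter_examples(src_path, tgt_path, facts_path):
    """Yield one dict per product, assuming the three files are line-aligned."""
    with open(src_path, encoding="utf-8") as src, \
            open(tgt_path, encoding="utf-8") as tgt, \
            open(facts_path, encoding="utf-8") as facts:
        for src_line, tgt_line, facts_line in zip_longest(src, tgt, facts):
            if None in (src_line, tgt_line, facts_line):
                raise ValueError("files have different numbers of lines")
            aspect, user, title = parse_src_line(src_line)
            yield {
                "aspect": aspect,
                "user": user,
                "title": title,
                "description": tgt_line.strip(),
                "facts": facts_line.strip(),
            }
```

For example, passing the three test-set paths above to iter_examples yields the aspect, user category, title, description, and extracted knowledge for each test product; a ValueError signals that a line does not match the assumed format.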
- Configurations for the models in the paper are stored under the configs/ directory. Launch a specific experiment with --config, which specifies the path to the desired model config, and --expname, which specifies the name/number of the experiment used in logging.
- We include three config files: the baseline, KOBE without external knowledge, and the full KOBE model.
- Baseline
python core/train.py --config configs/baseline.yaml --expname baseline
- KOBE without adding knowledge
python core/train.py --config configs/aspect_user.yaml --expname aspect-user
- KOBE
python core/train.py --config configs/aspect_user_knowledge.yaml --expname aspect-user-knowledge
The default batch size is 64. If you run into out-of-memory (OOM) errors, try decreasing it with the --batch-size flag.
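The three training commands above differ only in the config path and experiment name, so they can be generated from a table. The sketch below is a hypothetical launcher (EXPERIMENTS, build_train_command, and run_all are not part of the repository); it uses only the flags shown above (--config, --expname, --batch-size) and, by default, only prints the commands.

```python
import shlex
import subprocess

# Config paths and experiment names copied from the commands above.
EXPERIMENTS = {
    "baseline": "configs/baseline.yaml",
    "aspect-user": "configs/aspect_user.yaml",
    "aspect-user-knowledge": "configs/aspect_user_knowledge.yaml",
}


def build_train_command(expname, batch_size=None):
    """Return the argv list for one training run of core/train.py."""
    cmd = ["python", "core/train.py",
           "--config", EXPERIMENTS[expname],
           "--expname", expname]
    if batch_size is not None:
        # Lower this (default 64) if training runs out of GPU memory.
        cmd += ["--batch-size", str(batch_size)]
    return cmd


def run_all(batch_size=None, dry_run=True):
    """Print, and optionally run, the three experiments one after another."""
    for expname in EXPERIMENTS:
        cmd = build_train_command(expname, batch_size)
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failure


if __name__ == "__main__":
    run_all(batch_size=None, dry_run=True)
```

Calling run_all(batch_size=32, dry_run=False) would train all three models in sequence with a smaller batch; 32 is only an illustrative value.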
- Training takes roughly 12 hours to stop. To obtain results comparable to those in the paper, you need to train even longer (by increasing epoch in the config files). However, the current setting is enough to demonstrate the effectiveness of our model. You can monitor training with TensorBoard:
tensorboard --logdir experiments --port 6006
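To train longer, the epoch value in a config file has to be raised. The sketch below does this as a plain-text edit and assumes, without confirmation from the repository, that each YAML config contains exactly one line of the form "epoch: <integer>" (possibly indented, possibly with a trailing comment). set_epochs and bump_epochs_in_file are hypothetical helpers; editing the file by hand works just as well.

```python
import re

# Assumption: the config has a single "epoch: <integer>" line.
# This is a plain-text substitution, not a YAML parser.
EPOCH_LINE = re.compile(r"^([ \t]*epoch[ \t]*:[ \t]*)(\d+)([ \t]*(?:#.*)?)$",
                        re.MULTILINE)


def set_epochs(config_text, epochs):
    """Return config_text with the epoch value replaced by `epochs`."""
    new_text, count = EPOCH_LINE.subn(
        lambda m: "%s%d%s" % (m.group(1), epochs, m.group(3)), config_text)
    if count != 1:
        raise ValueError("expected exactly one 'epoch:' line, found %d" % count)
    return new_text


def bump_epochs_in_file(path, epochs):
    """Rewrite the epoch value of a config file in place."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(set_epochs(text, epochs))
```

Raising instead of silently doing nothing when no epoch line is found keeps a mismatched config format from going unnoticed.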
- During training, the descriptions generated for the test set are saved to experiments/<expname>/candidate.txt, and the ground truth is saved to reference.txt. These candidates are produced by greedy search to save time during training, and repeated terms are not blocked.
- To run beam search with a beam width of 10, run the following command:
python core/train.py --config configs/baseline.yaml --mode eval --restore experiments/finals-baseline/checkpoint.pt --expname eval-baseline --beam-size 10
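To illustrate why beam search at evaluation time can produce different output from the greedy search used during training, here is a toy sketch. It is a generic, simplified beam search (no length normalization, no repetition blocking), not KOBE's decoder; TOY_MODEL, toy_next_log_probs, and beam_search are hypothetical names, and the toy model conditions only on the last token.

```python
import math

# Hypothetical toy "model": next-token probabilities conditioned only on the
# last token. A real decoder conditions on the whole prefix and the input.
TOY_MODEL = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"c": 0.55, "d": 0.45},
    "b": {"</s>": 0.9, "e": 0.1},
    "c": {"</s>": 1.0},
    "d": {"</s>": 1.0},
    "e": {"</s>": 1.0},
}


def toy_next_log_probs(prefix):
    """Return {next_token: log-probability} given the tokens so far."""
    return {tok: math.log(p) for tok, p in TOY_MODEL[prefix[-1]].items()}


def beam_search(next_log_probs, bos, eos, beam_size, max_len):
    """Simplified beam search; beam_size=1 is greedy search.

    Hypotheses that emit eos move to `finished` and are not expanded further.
    Returns (best_tokens_without_bos, total_log_prob).
    """
    beams = [([bos], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = [
            (prefix + [tok], score + logp)
            for prefix, score in beams
            for tok, logp in next_log_probs(prefix).items()
        ]
        # Keep only the beam_size highest-scoring extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == eos:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    # Fall back to unfinished hypotheses if max_len was reached.
    best_seq, best_score = max(finished + beams, key=lambda c: c[1])
    return best_seq[1:], best_score


if __name__ == "__main__":
    for width in (1, 10):
        tokens, score = beam_search(toy_next_log_probs, "<s>", "</s>", width, 10)
        print(width, tokens, round(math.exp(score), 2))
```

With width 1, the greedy choice of "a" (0.6) commits to a → c → </s> with total probability 0.33; with width 10, beam search keeps "b" alive and finds b → </s> with probability 0.36.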
Evaluation metrics (TODO):
- BLEU
- DIVERSITY
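Until the evaluation instructions are written, the sketch below shows one way to score candidate.txt against reference.txt. It assumes one whitespace-tokenized sentence per line in both files with lines aligned. corpus_bleu is a standard corpus-level BLEU (single reference, uniform weights up to 4-grams, brevity penalty, no smoothing) and may not match the exact script used for the paper. distinct_n (unique n-grams over total n-grams) is one common diversity measure, used here as a stand-in; it is not necessarily the paper's definition of DIVERSITY. All function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def corpus_bleu(candidates, references, max_n=4):
    """Corpus-level BLEU with one reference per sentence and uniform weights.

    candidates, references: lists of token lists, aligned by index.
    No smoothing: returns 0.0 if any n-gram order has zero matches.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        cand_len += len(cand)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            cand_counts = Counter(ngrams(cand, n))
            ref_counts = Counter(ngrams(ref, n))
            # Clipped counts: a candidate n-gram is credited at most as
            # many times as it occurs in the reference.
            matches[n - 1] += sum((cand_counts & ref_counts).values())
            totals[n - 1] += sum(cand_counts.values())
    if cand_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return bp * math.exp(log_precision)


def distinct_n(candidates, n):
    """Ratio of unique n-grams to all n-grams across the generated texts."""
    all_ngrams = [g for cand in candidates for g in ngrams(cand, n)]
    return len(set(all_ngrams)) / len(all_ngrams) if all_ngrams else 0.0


def read_tokens(path):
    """Read one whitespace-tokenized sentence per line."""
    with open(path, encoding="utf-8") as f:
        return [line.split() for line in f]
```

For example, corpus_bleu(read_tokens(candidate_path), read_tokens(reference_path)) scores one experiment, where candidate_path and reference_path point to its candidate.txt and reference.txt; distinct_n(read_tokens(candidate_path), 2) gives the fraction of distinct bigrams in the generated descriptions.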
If you have any difficulty getting the steps above to work, feel free to open an issue. You can expect a reply within 24 hours.
Please cite our paper if you use this code in your own work:
@article{chen2019towards,
title={Towards Knowledge-Based Personalized Product Description Generation in E-commerce},
author={Chen, Qibin and Lin, Junyang and Zhang, Yichang and Yang, Hongxia and Zhou, Jingren and Tang, Jie},
journal={arXiv preprint arXiv:1903.12457},
year={2019}
}