Dataset | # Users | # POIs |
---|---|---|
Yelp2020 | 28038 | 15745 |
Yelp | 30887 | 18995 |
Gowalla | 18737 | 32510 |
Foursquare | 7642 | 28483 |
- If you already have the processed data (train_6.pkl and test_6.pkl), skip to 2. (Training).
- Otherwise, preprocess the original Yelp open review data (we used the 2020 version) with the following six steps, found in the albert folder:
- step 1: train the ALBERT model.
- step 2: filter out POIs with fewer than 10 visits and users who have visited fewer than 10 POIs, then obtain the numeric ids of the remaining POIs and users. This step gives you the number of POIs and users, so you may need to modify TYPE_NUMBER and USER_NUMBER in transformer.Constants accordingly.
- step 3: generate the dictionary that maps the numeric id of each POI to its top category.
- step 4: obtain the text visit sequence, Yelp_reviews_test.txt.
- step 5: compute the distance matrix of all POIs.
- step 6: generate the input data, train_6.pkl and test_6.pkl.
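
Steps 2 and 5 above can be sketched roughly as follows. This is only an illustrative sketch, not the code in the albert folder: the function names (`filter_and_index`, `haversine_km`, `distance_matrix`), the `(user, poi)` input format, the single-pass filtering order, the 0-based ids, and the use of haversine distance in kilometres are all assumptions made for the example.

```python
import math
from collections import Counter


def filter_and_index(visits, min_count=10):
    """Hypothetical helper: visits is a list of (user, poi) pairs.

    Drops POIs with fewer than min_count visits, then users who visited
    fewer than min_count distinct POIs, and assigns 0-based numeric ids
    (the id scheme is an assumption).
    """
    poi_visits = Counter(poi for _, poi in visits)
    kept = [(u, p) for u, p in visits if poi_visits[p] >= min_count]

    user_pois = {}
    for u, p in kept:
        user_pois.setdefault(u, set()).add(p)
    kept = [(u, p) for u, p in kept if len(user_pois[u]) >= min_count]

    user_ids = {u: i for i, u in enumerate(sorted({u for u, _ in kept}))}
    poi_ids = {p: i for i, p in enumerate(sorted({p for _, p in kept}))}
    indexed = [(user_ids[u], poi_ids[p]) for u, p in kept]
    return indexed, user_ids, poi_ids


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def distance_matrix(coords):
    """Symmetric pairwise distance matrix for a list of (lat, lon) tuples."""
    n = len(coords)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = haversine_km(*coords[i], *coords[j])
    return dist
```

Under these assumptions, `len(poi_ids)` and `len(user_ids)` are the counts you would compare against TYPE_NUMBER and USER_NUMBER in transformer.Constants. The nested-loop distance matrix is quadratic in the number of POIs, so for tens of thousands of POIs a vectorized implementation would be preferable.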
sh run.sh
- The code currently supports only single-GPU training, but extending it to multiple GPUs should be easy.
- If you run into any problems, please contact kaysen@hdu.edu.cn or fukumoto@yamanashi.ac.jp.
If this repository helps you, please cite:
@article{wang2023statrl,
title={STaTRL: Spatial-temporal and text representation learning for POI recommendation},
author={Wang, Xinfeng and Fukumoto, Fumiyo and Li, Jiyi and Yu, Dongjin and Sun, Xiaoxiao},
journal={Applied Intelligence},
volume={53},
number={7},
pages={8286--8301},
year={2023},
publisher={Springer}
}