Implementation of "CPMF: A Collective Pairwise Matrix Factorization Model for Upcoming Event Recommendation".
If you find this method helpful for your research, please cite this paper:
@inproceedings{LiuZWXHG17,
author = {Chun{-}Yi Liu and
Chuan Zhou and
Jia Wu and
Hongtao Xie and
Yue Hu and
Li Guo},
title = {{CPMF:} {A} collective pairwise matrix factorization model for upcoming event recommendation},
booktitle = {2017 International Joint Conference on Neural Networks, {IJCNN} 2017},
pages = {1532--1539},
year = {2017}
}
- python >= 3.4
- numpy
- tqdm
- sklearn
The dataset can be downloaded from large network. Or connect the author of the paper "Event-based Social Networks: Linking the Online and Offline Social Worlds".
We extract the dataset of a city from the original dataset as follows:
python main.py --mode=prepro --pre_config=xxxx
The pre_config
need to be specified. It is a .json
file.
keys | Description |
---|---|
save_base |
str . The path to save the preprocessed data. The real path will be the save_base/city_name . The city_name is pre-defined in the preprocess.py . |
radius |
int . The radius (kilometer) of a city. The city is treated as a cycle and the user/event within the cycle is treated as part of the city dataset. The centers of cities are pre-defined in the preprocess.py . |
wash |
int . The parameter is to filter the users who attended less than wash events. |
user_file |
The user location file. (user_lon_lat.csv ) |
event_file |
The event location file. (event_lon_lat.csv ) |
ue_file |
The file of user-event pairs. (user_event.csv ) |
ug_file |
The file of user-group pairs. (user_group.csv ) |
ef_file |
The file of group-event pairs. (event_group.csv ) |
The process will generate several file in the saved path, including user.loc
, event.loc
, ue.pair
, ug.pair
, eg.pair
and info.json
.
Then, we can train our model with:
python main.py --mode=run --city_info_path=xxxxx --train_path=yyy --dev_path=zzz
The city_info_path
is the path of the generated info.json
. The input total datasets will first divide into a training dataset and a evaluation dataset. The saved paths can be specified by train_path
and dev_path
respectively. The saved files are both .json
files. The ratio of the evaluation dataset is specified by test_ratio
.
If there exists both the train_path
and dev_path
, the pre-divided training and evaluation datasets will be loaded to avoid re-split.
Then, the model will be trained, and we can observe the performance of the training and evaluation dataset. And the model is auto saved in the training process. Note only one model will be saved, and the user should monitor the performance and manually conduct the early stopping.
There are also arguments to set the hyper-parameter of the model.and the introduction can be find in the main.py
.
Given the evaluation dataset, we can conduct the evaluation with:
python main.py --mode=eval --dev_path=xxxxx
The dev_path
is the path of the evaluation file.
The original codes were lost. I have attempted to re-create them as faithfully as possible but there may be some issues. If you find any bugs, please report them to me.