This repository contains the source code of the submitted paper "Faithful Path Language Modeling for Explainable Recommendation over Knowledge Graph".
If this repository is useful for your research, we would appreciate an acknowledgment by citing our paper:
"Faithful Path Language Modeling for Explainable Recommendation over Knowledge Graph." arXiv preprint arXiv:2310.16452 (2023).
- Python 3.8
Install the required packages:
pip install -r requirements.txt
Download the datasets and the embeddings (the latter are needed to run the PLM-Rec implementation) from the data.zip and embedding-weights.zip archives in the Google Drive folder: https://drive.google.com/drive/folders/1e0uFWb6iJ6MXHtslZsqV8qRYC0Pl_AR7?usp=sharing
Then extract both data.zip and embedding-weights.zip at the top level of the repository (i.e. the level at which setup.py is located).
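As a quick sanity check, after extraction the repository root (the folder containing setup.py) should contain something like the following; the exact subfolder contents are an assumption based on the archive names:

```bash
ls
# setup.py
# pathlm/
# data/                 # extracted from data.zip
# embedding-weights/    # extracted from embedding-weights.zip
```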
The experiments reported in the paper, both dataset generation and model training, can be run easily with the provided bash scripts. For lower-level control, the Python scripts called by those bash scripts can be used directly.
Note: all experiments were run with a fixed seed to ease reproducibility of the results.
From the top level (i.e. the folder that contains setup.py and the pathlm folder), run:
pip install .
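A quick way to verify the installation succeeded (the package name pathlm follows from the repository layout described above):

```bash
python -c "import pathlm; print('pathlm installed correctly')"
```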
To create the preprocessed/mapping folder needed by the random walk algorithm, run from the top level:
python pathlm/data_mappers/map_dataset.py --data <dataset_name> --model pearlm
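For example, assuming a dataset named ml1m is shipped in data.zip (the name is illustrative; use the dataset folders actually present there):

```bash
python pathlm/data_mappers/map_dataset.py --data ml1m --model pearlm
```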
To generate all datasets, run from the top level:
source build_datasets.sh
Each dataset is generated by the pipeline described in 'create_dataset.sh' (a conceptual sketch is given after this list), which is in charge of:
- Generating a dataset of at most X unique paths per user
- Concatenating the results into a single .txt file
- (Optional) Pruning the concatenated .txt file (only useful if a start entity other than the standard 'USER' is chosen)
- Moving the concatenated (and possibly pruned) .txt file into the 'data' folder, which is used to tokenize and train the models
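The following is a minimal bash sketch of the steps above; the folder layout, file names, and the pruning rule are assumptions for illustration, not necessarily what 'create_dataset.sh' actually does:

```bash
#!/usr/bin/env bash
# Conceptual sketch only -- names and locations below are assumptions.
set -e

DATASET=ml1m                        # hypothetical dataset name
CHUNK_DIR="paths/$DATASET"          # assumed folder holding per-chunk path files
OUT="paths_${DATASET}.txt"

# 1) Path generation (done by the repository's sampling scripts, which cap
#    the number of unique paths per user) is assumed to have filled CHUNK_DIR.

# 2) Concatenate the per-chunk results into a single .txt file
cat "$CHUNK_DIR"/*.txt > "$OUT"

# 3) (Optional) prune; only relevant when a start entity other than the
#    standard 'USER' was chosen. Illustrated here as keeping only the paths
#    that start with the chosen entity.
START_ENTITY=USER                   # hypothetical start entity
grep "^${START_ENTITY}" "$OUT" > "${OUT%.txt}_pruned.txt"

# 4) Move the result into the 'data' folder used for tokenization/training
mkdir -p "data/$DATASET"
mv "${OUT%.txt}_pruned.txt" "data/$DATASET/"
```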
From the top level (i.e. the folder that contains setup.py and the pathlm folder), install the repository with:
pip install .
Then proceed with the chosen experiment as described below. Each bash script can be customised to run alternative experiments.
To bulk train PEARLM, run from the top level:
CUDA_DEVICE_NUM=0
source run_perlm_experiments.sh $CUDA_DEVICE_NUM
To train PLM-Rec, run from the top level:
CUDA_DEVICE_NUM=0
source run_plm-rec_experiments.sh $CUDA_DEVICE_NUM
Before training a specific model, tokenize the dataset by running, from the top level:
python pathlm/models/lm/tokenize_dataset.py --data <dataset_name> --sample_size <sample_size>
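For example (both values are illustrative; the sample size must match the one used when generating the dataset):

```bash
python pathlm/models/lm/tokenize_dataset.py --data ml1m --sample_size 250
```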
To train a specific PEARLM model, run from the top level:
python pathlm/models/lm/pearlm_main.py --data <dataset_name> --model <base-clm-model> --sample_size <sample_size>
To train a specific PLM model, run from the top level:
python pathlm/models/lm/plm_main.py --data <dataset_name> --model <base-clm-model> --sample_size <sample_size>
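For example (the dataset name, base model, and sample size below are illustrative; <base-clm-model> is assumed to be a causal language model identifier such as distilgpt2):

```bash
# Hypothetical invocations; adjust the arguments to your setup
python pathlm/models/lm/pearlm_main.py --data ml1m --model distilgpt2 --sample_size 250
python pathlm/models/lm/plm_main.py --data ml1m --model distilgpt2 --sample_size 250
```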