Towards a Foundation Purchasing Model: Pretrained Generative Autoregression on Transaction Sequences
This repository is the official code implementation of the above manuscript.
It contains code to reproduce evaluations on public datasets and is distributed under a Creative Commons Attribution-NonCommercial 4.0 International license.
To run the code in this repository, install the benchmarker library inside a new virtual environment by running
$ pip install benchmark_public_datasets/benchmarker
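The setup above can be sketched end to end as follows (the environment name fpm-env is a hypothetical choice, not prescribed by the repository; any virtual environment works):

```shell
# Create and activate a fresh virtual environment
# ("fpm-env" is an illustrative name; pick any).
python3 -m venv fpm-env
source fpm-env/bin/activate
pip install --upgrade pip

# Install the benchmarker library from the repository checkout.
pip install benchmark_public_datasets/benchmarker
```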
You will also need to install and launch Slurm for job scheduling, and a ClickHouse server that will be used for storing the datasets.
Before running the code, prepare the datasets by following the instructions in public_datasets/README.md.
To benchmark hand-engineered features and embeddings extracted using different algorithms (Table 2 in the paper), run benchmark_public_datasets/1_benchmark_algorithms.sh.
To perform an ablation study comparing the performance of the NPPR method against the next-event-prediction and past-reconstruction tasks used in isolation (Table 3 in the paper), run benchmark_public_datasets/2_ablate_tasks_in_np_ne_method.sh.
To compare the performance of the "most recent" vs "average" embedding modes (Table 4 in the paper), run benchmark_public_datasets/3_avg_vs_most_recent_embeddings.sh.