
Repository for the results scripts from the paper "Towards a Foundation Purchasing Model: Pretrained Generative Autoregression on Transaction Sequences".


Towards a Foundation Purchasing Model: Pretrained Generative Autoregression on Transaction Sequences

This is the official code implementation of the following manuscript:

Skalski P., Sutton D., Burrell S., Perez I., Wong J. "Towards a Foundation Purchasing Model: Pretrained Generative Autoregression on Transaction Sequences".

It contains code to reproduce evaluations on public datasets and is distributed under a Creative Commons Attribution-NonCommercial 4.0 International license.

Running the code

Installation

To run the code in this repository, install the benchmarker library inside a new virtual environment by running:

$ pip install benchmark_public_datasets/benchmarker

You will also need to install and launch Slurm for job scheduling, and a ClickHouse server for storing the datasets.
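Before preparing data, it can help to confirm that both prerequisites are visible from the environment you will run in. The following is a minimal sketch, not part of the repository: it only checks whether common Slurm and ClickHouse executables are on PATH, and the executable names it looks for (`sbatch`, `sinfo`, `clickhouse-client`, `clickhouse`) are assumptions about a typical installation. It does not verify that the Slurm controller or the ClickHouse server is actually running.

```python
import shutil

# Executables assumed to indicate each prerequisite (hypothetical mapping;
# adjust to match how Slurm and ClickHouse are installed on your system).
REQUIRED_TOOLS = {
    "Slurm": ["sbatch", "sinfo"],
    "ClickHouse": ["clickhouse-client", "clickhouse"],
}


def find_prerequisites(tools=REQUIRED_TOOLS):
    """Map each prerequisite to the first matching executable on PATH, or None."""
    found = {}
    for name, candidates in tools.items():
        paths = (shutil.which(candidate) for candidate in candidates)
        found[name] = next((path for path in paths if path), None)
    return found


def report(found):
    """Format the lookup result as one human-readable line per prerequisite."""
    lines = []
    for name, path in found.items():
        status = f"found at {path}" if path else "not found on PATH"
        lines.append(f"{name}: {status}")
    return "\n".join(lines)


print(report(find_prerequisites()))
```

A "not found" line means only that the executable is not on PATH in the current shell; the service may still be installed elsewhere or reachable on another host.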

Data preparation

Before running the code, prepare the datasets by following the instructions in public_datasets/README.md.

Running evaluations

To benchmark hand-engineered features and embeddings extracted using different algorithms (Table 2 in the paper), run benchmark_public_datasets/1_benchmark_algorithms.sh.

To perform the ablation study comparing the performance of the NPPR method against the next event prediction and past reconstruction tasks used in isolation (Table 3 in the paper), run benchmark_public_datasets/2_ablate_tasks_in_np_ne_method.sh.

To compare the performance of the "most recent" and "average" embedding modes (Table 4 in the paper), run benchmark_public_datasets/3_avg_vs_most_recent_embeddings.sh.
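The three evaluation scripts can also be launched from a small wrapper. The sketch below is an illustration rather than part of the repository: the helper names `build_commands` and `run_all` are hypothetical, and it assumes the scripts can be invoked with `bash` from the repository root. The README does not say whether each script waits for its Slurm jobs to finish, so running them back to back may not suit your setup. By default the wrapper only prints the commands.

```python
import subprocess
from pathlib import Path

# Script names as given in this README, in table order (Tables 2, 3 and 4).
SCRIPTS = [
    "1_benchmark_algorithms.sh",
    "2_ablate_tasks_in_np_ne_method.sh",
    "3_avg_vs_most_recent_embeddings.sh",
]


def build_commands(repo_root="."):
    """Return one `bash <script>` command per evaluation script."""
    base = Path(repo_root) / "benchmark_public_datasets"
    return [["bash", str(base / name)] for name in SCRIPTS]


def run_all(repo_root=".", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in build_commands(repo_root):
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first script that exits with an error.
            subprocess.run(command, check=True)


# Dry run: lists the commands without executing anything.
run_all()
```

Calling `run_all(dry_run=False)` from the repository root executes the scripts in order and stops at the first one that fails.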