RQ-VAE Recommender

This is a PyTorch implementation of a generative retrieval model using semantic IDs based on RQ-VAE from "Recommender Systems with Generative Retrieval". The model has two stages:

  1. Items in the corpus are mapped to a tuple of semantic IDs by training an RQ-VAE (figure below).
  2. User item sequences are tokenized into semantic IDs using the frozen RQ-VAE, and a transformer-based model is trained on these sequences to generate the next IDs in the sequence.
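The two stages above can be illustrated with a small, dependency-free sketch. It is not the repository's code: the function names `semantic_ids` and `flatten_sequence`, the toy codebooks, and the per-level token offset are all assumptions made for illustration. The sketch takes an already-encoded item vector, quantizes it level by level against a stack of codebooks (each level quantizes the residual left by the previous one), and then flattens a user's sequence of semantic ID tuples into a single token sequence that a transformer could be trained on.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to vec (squared L2 distance)."""
    def sq_dist(code: Vector) -> float:
        return sum((v - c) ** 2 for v, c in zip(vec, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def semantic_ids(embedding: Vector, codebooks: Sequence[Sequence[Vector]]) -> Tuple[int, ...]:
    """Residual quantization: one codebook index per level."""
    residual = list(embedding)
    ids: List[int] = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        ids.append(idx)
        # The next level quantizes whatever this level failed to explain.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(ids)


def flatten_sequence(item_ids: Sequence[Tuple[int, ...]], codebook_size: int) -> List[int]:
    """Flatten per-item ID tuples into one token sequence.

    Assumption: each level gets its own token range (level * codebook_size + idx)
    so that equal indices at different levels map to different tokens.
    """
    return [level * codebook_size + idx
            for ids in item_ids
            for level, idx in enumerate(ids)]


# Toy setup: 2 levels, 2 codes per level, 2-dimensional latent vectors.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],    # level 0
    [[0.0, 0.0], [0.1, -0.1]],   # level 1
]
user_history = [[1.1, 0.9], [0.05, 0.0]]  # hypothetical encoded items
history_ids = [semantic_ids(item, codebooks) for item in user_history]
tokens = flatten_sequence(history_ids, codebook_size=2)

# Next-token training pairs for an autoregressive model.
inputs, targets = tokens[:-1], tokens[1:]
```

In a trained RQ-VAE the vectors being quantized are encoder outputs and the codebooks are learned. The sketch only shows the lookup that turns one vector into a tuple of IDs.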

Currently supports

  • Datasets: MovieLens 1M
  • RQ-VAE PyTorch model implementation + KMeans initialization + RQ-VAE training script.
  • Decoder-only retrieval model + training code with semantic ID user sequences from a randomly initialized or pretrained RQ-VAE.
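As a rough illustration of the KMeans initialization mentioned above, the sketch below runs a few Lloyd iterations over a set of vectors and returns the centroids, which could then seed a codebook. How the repository applies this is not described here, so the function name `kmeans_init`, the deterministic seeding from the first `k` points, and the fixed iteration count are all assumptions. It is written in plain Python rather than PyTorch to stay self-contained.

```python
from typing import List, Sequence


def kmeans_init(points: Sequence[Sequence[float]], k: int, iters: int = 10) -> List[List[float]]:
    """Return k centroids computed with Lloyd's algorithm.

    Assumption: centroids are seeded with the first k points so the result
    is deterministic; practical implementations usually sample at random.
    """
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters: List[List[Sequence[float]]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                dim = len(members[0])
                centroids[j] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centroids


# Hypothetical batch of encoded items forming two well-separated groups.
batch = [[0.0, 0.0], [10.0, 10.0], [0.0, 2.0], [10.0, 12.0]]
codebook = kmeans_init(batch, k=2)
```

Starting a codebook from centroids of real data, rather than from random vectors, is generally meant to reduce the number of codes that are never selected early in training.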

Executing

The RQ-VAE tokenizer model and the retrieval model are trained separately, using two separate training scripts.

  • RQ-VAE tokenizer model training: Trains the RQ-VAE tokenizer on the item corpus. Executed via python train_rqvae.py
  • Retrieval model training: Trains the retrieval model using a frozen RQ-VAE. Executed via python train_decoder.py

Next steps

  • ML1M timestamp-based train/test split.
  • Comparison of encoder-decoder model vs. decoder-only model.
  • Eval loops.

References