/rec-tool

A tool that simplifies running experiments with collaborative filtering models.

Primary language: Python. License: MIT.


  1. Install the tool:
chmod +x ./build_local.sh
./build_local.sh
  2. Set up a configuration. An example is provided in config_example.yaml.

  3. Run the experiments:

rec_tool --config ./config_example.yaml

Config options

Config example: this config launches several experiments with several models (embedding model, MLP, NCF, Matrix Factorization, SVD, etc.) on several datasets (MovieLens, Bookcrossing, Behance, Goodreads). Additional options include evaluating with several metrics at once and running a grid search over batch size, number of epochs, and learning rate.

  data:
    input_data:
      clear: true
      movielens:
        use: true
        type: ml-1m
        path: /tmp/rec_tool/dataset/movielens
        transformations: rec_tool.transformations.movielens.prepare_data

Additional datasets can be defined in the same way:

      goodreads:
        use: false
        type: goodreads
        transformations: rec_tool.transformations.goodreads.get_goodreads_data
      bookcrossing:
        use: false
        type: bookcrossing
        transformations: rec_tool.transformations.bookcrossing.bookcrossing_converting
      behance:
        use: false
        type: behance
        transformations: rec_tool.transformations.behance.behance_converting
        

Several models can be run in one experiment by listing them:

  model:
    model: [
      rec_tool.models.embedding.embedding_model,
      rec_tool.models.mlp.mlp,
      rec_tool.models.ncf.ncf_model,
      rec_tool.models.mf.mf,
      rec_tool.models.svd.svd,
    ]
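The model entries above are dotted Python paths. How rec_tool resolves them is not shown in this README; the following is a minimal sketch of one common approach, assuming each path has the form `package.module.attribute`. The helper name `resolve_dotted_path` is hypothetical, and standard-library paths stand in for the rec_tool ones so that the snippet runs without the tool installed.

```python
import importlib


def resolve_dotted_path(path):
    """Import the module part of `path` and return the named attribute.

    Hypothetical helper: one way to turn a config string such as
    "rec_tool.models.ncf.ncf_model" into a Python object.
    """
    module_name, _, attr_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attribute', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Standard-library stand-ins for the model paths in the config.
paths = ["math.sqrt", "os.path.join"]
resolved = [resolve_dotted_path(p) for p in paths]
print(resolved[0](16.0))
```

The same pattern would apply to the loss, metrics, and transformations entries, which are also given as dotted paths.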

Several metrics can be defined for model evaluation as well:

    loss: rec_tool.losses.mean_squared_error
    metrics: [
      rec_tool.metrics.accuracy,
      rec_tool.metrics.rmse,
      rec_tool.metrics.mae
    ]
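For reference, RMSE and MAE follow the textbook definitions sketched below. This is not the code in `rec_tool.metrics`, whose implementation is not shown here; the function names and the toy ratings are illustrative assumptions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error: mean(|y - y_hat|)."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


# Made-up toy ratings, for illustration only.
ratings = [4.0, 3.0, 5.0, 1.0]
predicted = [3.0, 3.0, 4.0, 3.0]
print(rmse(ratings, predicted), mae(ratings, predicted))
```

RMSE penalizes large errors more heavily than MAE, which is why reporting both together can be informative.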

To run a grid search over batch_size, epoch, and learning_rate, set grid_search to True and give the parameters you want to vary as lists of values:

    batch_size: [1024, 2048, 4096]
    epoch: [50, 100, 200]
    optimizers: adam
    grid_search: True
    learning_rate: 0.01
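A grid search of this kind amounts to running one experiment per combination of the listed values. The sketch below shows how such a grid could be expanded; it is an illustration, not rec_tool's own code. The function `expand_grid` is hypothetical, and treating a scalar as a single-element list is an assumption of this sketch.

```python
from itertools import product

# Values copied from the config fragment above.
model_config = {
    "batch_size": [1024, 2048, 4096],
    "epoch": [50, 100, 200],
    "learning_rate": 0.01,
}


def expand_grid(config, keys=("batch_size", "epoch", "learning_rate")):
    """Return one dict per combination of the listed hyperparameters.

    Scalars are treated as single-element lists (an assumption of this sketch).
    """
    value_lists = []
    for key in keys:
        value = config[key]
        value_lists.append(value if isinstance(value, list) else [value])
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


runs = expand_grid(model_config)
print(len(runs))  # 3 batch sizes x 3 epoch counts x 1 learning rate = 9
print(runs[0])
```

The number of runs grows multiplicatively with each list, and it is multiplied again by the number of models and datasets enabled.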

To save the trained model, define a directory for it. Results can also be written to a CSV file.

  result:
    model: /tmp/rec_tool/model/
    log: /tmp/rec_tool/log/
    results_csv: run_results.csv
    log_to_ml_flow: True
    clear: true
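Collecting one row per run in a CSV file can be sketched as follows. The column names, the helper `append_result`, and the placement of the file in a temporary directory are all assumptions made for illustration; the layout rec_tool actually writes is not documented here.

```python
import csv
import os
import tempfile


def append_result(csv_path, row, fieldnames):
    """Append one result row, writing the header only when the file is new."""
    is_new = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerow(row)


# Hypothetical columns; metric columns would follow in a real run.
fields = ["model", "dataset", "batch_size", "epoch"]
out_dir = tempfile.mkdtemp()  # stand-in for a real output directory
csv_path = os.path.join(out_dir, "run_results.csv")

append_result(csv_path, {"model": "ncf_model", "dataset": "goodreads",
                         "batch_size": 1024, "epoch": 50}, fields)
append_result(csv_path, {"model": "mf", "dataset": "goodreads",
                         "batch_size": 1024, "epoch": 50}, fields)

# Read the file back to confirm that both rows share a single header.
with open(csv_path, newline="") as handle:
    rows = list(csv.DictReader(handle))
print(len(rows))
```

Appending rather than overwriting keeps results from successive grid-search runs in one file.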

Example with one model, the Goodreads dataset, one batch size, and one epoch value:

config:
  data:
    input_data:
      clear: true
      movielens:
        use: false
        type: ml-1m
        path: /tmp/rec_tool/dataset/movielens
        transformations: rec_tool.transformations.movielens.prepare_data
      goodreads:
        use: true
        type: goodreads
        transformations: rec_tool.transformations.goodreads.get_goodreads_data
      bookcrossing:
        use: false
        type: bookcrossing
        transformations: rec_tool.transformations.bookcrossing.bookcrossing_converting
      behance:
        use: false
        type: behance
        transformations: rec_tool.transformations.behance.behance_converting
  model:
    model: [rec_tool.models.ncf.ncf_model]
    loss: rec_tool.losses.mean_squared_error
    metrics: [
      rec_tool.metrics.accuracy,
      rec_tool.metrics.rmse,
      rec_tool.metrics.mae
    ]
    batch_size: [1024]
    epoch: [50]
    optimizers: adam
    grid_search: True
    learning_rate: 0.01


  result:
    model: /tmp/rec_tool/model/
    log: /tmp/rec_tool/log/
    results_csv: run_results.csv
    log_to_ml_flow: True
    clear: true
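In the example above only the dataset whose `use` flag is true takes part in the run. A minimal sketch of that selection, assuming the `data.input_data` section has already been parsed into a Python dict by a YAML parser (the function `enabled_datasets` is hypothetical, and the transformation paths are omitted for brevity):

```python
# The data.input_data section of the example, as a YAML parser would return it.
input_data = {
    "clear": True,
    "movielens": {"use": False, "type": "ml-1m"},
    "goodreads": {"use": True, "type": "goodreads"},
    "bookcrossing": {"use": False, "type": "bookcrossing"},
    "behance": {"use": False, "type": "behance"},
}


def enabled_datasets(section):
    """Return the names of dataset entries whose `use` flag is true."""
    return [
        name
        for name, options in section.items()
        # Skip non-dataset keys such as the boolean `clear`.
        if isinstance(options, dict) and options.get("use")
    ]


print(enabled_datasets(input_data))
```

Switching datasets between runs then only requires flipping the `use` flags, with the rest of the config left unchanged.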

Collaborative Filtering models description

| Model and paper | Examples |
|---|---|
| Variational Autoencoder for Collaborative Filtering (VAECF) | vae.py |
| Singular Value Decomposition (SVD) | svd.py |
| Matrix Factorization (MF) | mf.py |
| Multi-Layer Perceptron (MLP) | mlp.py |
| Neural Matrix Factorization (NeuMF) / Neural Collaborative Filtering (NCF) | ncf.py |
| Bayesian Personalized Ranking (BPR) | bpr.py |
| Weighted Matrix Factorization (WMF) | wmf.py |
| SVD++ | svdpp.py |
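To give a feel for what the simplest of these models does, here is a minimal pure-Python sketch of matrix factorization trained with stochastic gradient descent on a squared-error loss. It is not the code in mf.py; the function names, hyperparameters, and toy ratings are all assumptions chosen for illustration.

```python
import math
import random


def train_mf(ratings, n_users, n_items, n_factors=2, lr=0.05, reg=0.02,
             epochs=200, seed=0):
    """Fit user and item factor vectors to (user, item, rating) triples."""
    rng = random.Random(seed)
    users = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
             for _ in range(n_users)]
    items = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
             for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(pu * qi for pu, qi in zip(users[u], items[i]))
            err = r - pred
            for f in range(n_factors):
                pu, qi = users[u][f], items[i][f]
                # Gradient step on squared error with L2 regularization.
                users[u][f] += lr * (err * qi - reg * pu)
                items[i][f] += lr * (err * pu - reg * qi)
    return users, items


def rmse(users, items, ratings):
    """RMSE of dot-product predictions over the given triples."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(pu * qi for pu, qi in zip(users[u], items[i]))
        total += (r - pred) ** 2
    return math.sqrt(total / len(ratings))


# Made-up toy data: (user index, item index, rating).
toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]

before = rmse(*train_mf(toy, 3, 3, epochs=0), toy)  # untrained factors
after = rmse(*train_mf(toy, 3, 3), toy)             # after 200 epochs
print(before, after)
```

Each rating is predicted as the dot product of a user vector and an item vector. The neural variants in the table replace or extend that dot product with learned layers.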