/giggle

Toy recommender system on jokes


A recommender system for jokes based on the Jester dataset.

Setup

Run the following commands to create a virtual environment, install the requirements and generate a command line interface, giggle:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python setup.py install

Populate database

Download the data:

mkdir -p data/jester && cd data/jester
for f in jester_ratings jester_items; do
    wget http://www.seas.harvard.edu/courses/cs281/data/${f}.tar.gz
    tar xvzf ${f}.tar.gz
done
cd -

Create a new user and the database:

sudo -u postgres createuser -s jester
createdb -U jester jester_db

Set up environment variables, the URL to the database and a secret key required by Flask:

export DATABASE_URL="postgresql://jester@localhost/jester_db"
export SECRET_KEY=???
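
The value of SECRET_KEY is left for you to choose. As a minimal sketch, assuming any sufficiently long random string is acceptable to the application, one can be generated with Python's standard secrets module. The helper name make_secret_key is hypothetical and not part of giggle:

```python
import secrets


def make_secret_key(num_bytes: int = 32) -> str:
    """Return a random hexadecimal string of 2 * num_bytes characters.

    Hypothetical helper, not part of giggle; it only wraps the standard
    library's cryptographically strong random generator.
    """
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Print a fresh key that can be pasted into the export command.
    print(make_secret_key())
```

The printed string can then be used as the value in the export SECRET_KEY command above. Generate it once and keep it private, since Flask uses the key to sign session data.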

Run the script to populate the database:

python -m giggle.models --todo init

Usage

The command line interface, giggle, exposes three sub-commands (see the next section for more details and examples):

  • train: Trains a predictive model
  • evaluate: Creates a report with the performance of the current model
  • web: Starts a web service that can be used for prediction

You can get more information about what arguments each sub-command accepts by running the help command:

giggle train --help
giggle evaluate --help
giggle web --help

Details and examples

Below are some examples for the three sub-commands mentioned above.

  • Evaluates a neighbourhood-based recommender algorithm, neigh, on the large setting of the dataset using K-fold cross-validation:
giggle evaluate -d large -r neigh

The command prints a report to standard output, consisting of the metric (root mean squared error) for each of the three folds, followed by its mean and standard error. Here is the output of running the previous command:

 0 4.4660
 1 4.4695
 2 4.4740
--------------
4.4699 ± 0.002
  • Trains a neighbourhood-based recommender algorithm, neigh, on the entire large dataset:
giggle train -d large -r neigh -v
  • Starts a web server using the neighbourhood-based recommender algorithm, neigh:
RECOMMENDER=neigh giggle web -v

To check that the web service is running properly, you can use the script examples/web_service_test.py. Here are some examples:

python examples/web_service_test.py predict -u 21
python examples/web_service_test.py add -u 21 -j 17 -r 7.3
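
The summary line of the evaluation report shown earlier in this section (mean ± standard error over the folds) can be reproduced by hand. The sketch below is an illustration and not giggle's own code: the function names rmse and summarize are hypothetical, and it assumes the standard error is the sample standard deviation (n - 1 in the denominator) divided by the square root of the number of folds.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def summarize(scores):
    """Return (mean, standard error of the mean) of per-fold scores.

    Assumption: the standard error uses the sample standard deviation;
    giggle may compute it differently.
    """
    mean = statistics.mean(scores)
    std_err = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, std_err


# Per-fold RMSE values copied from the report above.
fold_rmse = [4.4660, 4.4695, 4.4740]
mean, std_err = summarize(fold_rmse)
print(f"{mean:.4f} ± {std_err:.3f}")
```

Because the per-fold values in the report are already rounded to four decimals, the mean recomputed from them can differ from the reported one in the last digit.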

Development

To keep the code base and the project consistent, I have tried:

  • to keep the code PEP8 compliant:
find examples giggle scripts -name '*.py' | xargs pep8 --ignore E501
  • to type-check the code with mypy:
mypy --fast-parser --incremental -m giggle
  • to test the code with pytest:
pytest tests -v