A recommender system for jokes based on the Jester dataset.
Run the following commands to create a virtual environment, install the requirements, and generate the command line interface, giggle:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python setup.py install
Download the data:
mkdir -p data/jester && cd data/jester
for f in jester_ratings jester_items; do
wget http://www.seas.harvard.edu/courses/cs281/data/${f}.tar.gz
tar xvzf ${f}.tar.gz
done
cd -
Create a new user and the database:
sudo -u postgres createuser -s jester
createdb -U jester jester_db
Set up two environment variables: the URL of the database and the secret key required by Flask:
export DATABASE_URL="postgresql://jester@localhost/jester_db"
export SECRET_KEY=???
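The two variables above are typically read by the application at start-up. The following is a minimal sketch of that pattern, not the code giggle actually uses: the helper names load_settings and generate_secret_key are hypothetical. It fails early if a variable is missing and shows one common way to produce a random value for a secret key.

```python
import os
import secrets


def load_settings(env=os.environ):
    """Read the two settings from the environment, failing early if one is missing."""
    missing = [name for name in ("DATABASE_URL", "SECRET_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {"database_url": env["DATABASE_URL"], "secret_key": env["SECRET_KEY"]}


def generate_secret_key(num_bytes=32):
    """Return a random hex string (two characters per byte) usable as a secret key."""
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Print a fresh random key that could be exported as SECRET_KEY.
    print(generate_secret_key())
```

Passing a plain dictionary instead of os.environ makes the helper easy to exercise in tests.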
Run the script to populate the database:
python -m giggle.models --todo init
The command line interface, giggle, exposes three sub-commands (see the next section for more details and examples):
- train: Trains a predictive model
- evaluate: Creates a report with the performance of the current model
- web: Starts a web service that can be used for prediction
You can get more information about what arguments each sub-command accepts by running the help command:
giggle train --help
giggle evaluate --help
giggle web --help
Below are some examples for the three sub-commands mentioned above.
- Evaluates a neighbourhood-based recommender algorithm, neigh, on the large setting of the dataset using K-fold cross-validation:
giggle evaluate -d large -r neigh
The command prints a report to standard output, consisting of the metric (root mean squared error) for each of the three folds, followed by its mean and standard error. Here is the output of running the previous command:
0 4.4660
1 4.4695
2 4.4740
--------------
4.4699 ± 0.002
- Trains a neighbourhood-based recommender algorithm, neigh, on the entire large dataset:
giggle train -d large -r neigh -v
- Starts a web server using the neighbourhood-based recommender algorithm, neigh:
RECOMMENDER=neigh giggle web -v
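The evaluation report from the first example (per-fold root mean squared error, then mean and standard error) can be reproduced in spirit with a few lines of Python. This is only a sketch under stated assumptions: the data is synthetic, the predictor is a constant baseline (the training mean) rather than neigh, and the standard error is taken as the sample standard deviation divided by the square root of the number of folds, which may differ from what giggle computes. All function names are hypothetical.

```python
import math
import random


def rmse(true_ratings, predicted_ratings):
    """Root mean squared error between two equally long sequences."""
    squared = [(t - p) ** 2 for t, p in zip(true_ratings, predicted_ratings)]
    return math.sqrt(sum(squared) / len(squared))


def k_fold_indices(num_samples, num_folds, seed=0):
    """Shuffle the sample indices and split them into num_folds disjoint test sets."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::num_folds] for i in range(num_folds)]


def mean_and_standard_error(scores):
    """Mean and standard error (sample standard deviation divided by sqrt(n))."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((s - mean) ** 2 for s in scores) / (n - 1)
    return mean, math.sqrt(variance / n)


def evaluate(ratings, num_folds=3):
    """Cross-validate a constant predictor (the training mean); return per-fold RMSE."""
    scores = []
    for test_idx in k_fold_indices(len(ratings), num_folds):
        held_out = set(test_idx)
        train = [r for i, r in enumerate(ratings) if i not in held_out]
        baseline = sum(train) / len(train)
        test = [ratings[i] for i in test_idx]
        scores.append(rmse(test, [baseline] * len(test)))
    return scores


if __name__ == "__main__":
    # Synthetic ratings on the Jester scale [-10, 10]; not real data.
    rng = random.Random(42)
    ratings = [rng.uniform(-10, 10) for _ in range(300)]
    scores = evaluate(ratings)
    for fold, score in enumerate(scores):
        print(f"{fold} {score:.4f}")
    mean, stderr = mean_and_standard_error(scores)
    print("-" * 14)
    print(f"{mean:.4f} ± {stderr:.3f}")
```

Applied to the three fold values reported in the first example, mean_and_standard_error gives roughly 4.470 ± 0.002, in line with the report.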
To check that the web service is running properly, you can use the script examples/web_service_test.py. Here are some examples:
python examples/web_service_test.py predict -u 21
python examples/web_service_test.py add -u 21 -j 17 -r 7.3
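The two operations exercised above, predicting for a user and adding a rating, can be illustrated with a toy in-memory neighbourhood-based recommender. This is a generic textbook sketch (user-based, cosine similarity over co-rated jokes), not giggle's neigh implementation and not the web service API. The class and method names are hypothetical, and unlike the predict command above, predict_rating here takes an explicit joke id to keep the example small.

```python
import math
from typing import Dict, List, Tuple

Ratings = Dict[int, Dict[int, float]]  # user id -> {joke id -> rating}


def cosine_similarity(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Cosine similarity computed over the jokes both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[j] * b[j] for j in common)
    norm_a = math.sqrt(sum(a[j] ** 2 for j in common))
    norm_b = math.sqrt(sum(b[j] ** 2 for j in common))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyNeighbourhoodRecommender:
    """Keeps all ratings in memory and predicts from the most similar users."""

    def __init__(self, num_neighbours: int = 2) -> None:
        self.num_neighbours = num_neighbours
        self.ratings: Ratings = {}

    def add(self, user: int, joke: int, rating: float) -> None:
        """Store (or overwrite) the rating a user gave to a joke."""
        self.ratings.setdefault(user, {})[joke] = rating

    def predict_rating(self, user: int, joke: int) -> float:
        """Similarity-weighted average of the joke's ratings by the nearest users."""
        mine = self.ratings.get(user, {})
        candidates: List[Tuple[float, float]] = []
        for other, theirs in self.ratings.items():
            if other == user or joke not in theirs:
                continue
            sim = cosine_similarity(mine, theirs)
            if sim > 0.0:  # ignore dissimilar users
                candidates.append((sim, theirs[joke]))
        neighbours = sorted(candidates, reverse=True)[: self.num_neighbours]
        if not neighbours:
            return 0.0  # fallback: midpoint of the [-10, 10] rating scale
        total = sum(sim for sim, _ in neighbours)
        return sum(sim * rating for sim, rating in neighbours) / total


if __name__ == "__main__":
    model = ToyNeighbourhoodRecommender()
    model.add(1, 5, 6.0)
    model.add(1, 17, 8.0)
    model.add(2, 5, -4.0)
    model.add(2, 17, -9.0)
    model.add(21, 5, 7.0)
    print(model.predict_rating(21, 17))  # user 21 resembles user 1, not user 2
    model.add(21, 17, 7.3)
```

A real system would also normalise ratings per user and avoid scanning every user on each request; the sketch leaves both out for brevity.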
To keep the code base and the project standardized, I have tried:
- to keep the code PEP8 compliant:
find examples giggle scripts -name '*py' | xargs pep8 --ignore E501
- to add type annotations and make sure mypy accepts them:
mypy --fast-parser --incremental -m giggle
- to write tests using py.test:
pytest tests -v
- to keep a list of things to do
- to keep a list of ideas and resources
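As an illustration of the conventions listed above, a type-annotated helper with pytest-style tests might look like the following. The functions clip_rating and mean_rating are hypothetical examples and are not taken from the giggle code base; the default bounds assume the Jester rating scale of -10 to 10.

```python
from typing import List


def clip_rating(rating: float, low: float = -10.0, high: float = 10.0) -> float:
    """Clamp a rating to the closed interval [low, high]."""
    return max(low, min(high, rating))


def mean_rating(ratings: List[float]) -> float:
    """Average of a non-empty list of ratings."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    return sum(ratings) / len(ratings)


# pytest collects functions whose names start with `test_`.
def test_clip_rating() -> None:
    assert clip_rating(12.5) == 10.0
    assert clip_rating(-11.0) == -10.0
    assert clip_rating(7.3) == 7.3


def test_mean_rating() -> None:
    assert mean_rating([1.0, 2.0, 3.0]) == 2.0
```

The annotations let mypy check callers statically, while the tests cover the runtime behaviour.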