This task benchmarks recommendation with implicit feedback on the MovieLens 20 Million (ml-20m) dataset with a Neural Collaborative Filtering model. The model trains on binary information about whether or not a user interacted with a specific item.
Ubuntu 18.04, Python 3.5, MXNet 1.2.0, CUDA v9.0.176
- Install MXNet (CPU or GPU)
- Install unzip and curl
sudo apt-get install unzip curl
- Check out the johnsonkee repo
git clone http://github.com/johnsonkee/recommend.git
- Install the other Python packages
cd recommend
pip install -r requirements.txt
- Check out the johnsonkee repo
git clone http://github.com/johnsonkee/recommend.git
- Install CUDA and Docker
source reference/install_cuda_docker.sh
- Get the docker image for the recommendation task
# Pull from Docker Hub
docker pull mxnet/python:1.2.0_gpu_cuda9
You can download and verify the dataset by running the download_dataset.sh and verify_dataset.sh scripts in the parent directory. Before running the following commands, make sure you are in the recommend directory:
# Creates ml-20.zip
bash download_dataset.sh
# Confirms the MD5 checksum of ml-20.zip
bash verify_dataset.sh
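For readers who want to see what an MD5 check involves, here is a minimal Python sketch. It is not the content of verify_dataset.sh; the function names `md5_of_file` and `verify` are hypothetical, and the expected checksum must come from the dataset provider, since this README does not state it.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True when the file's MD5 matches the expected hex digest."""
    # Hypothetical helper: expected_md5 is supplied by the caller.
    return md5_of_file(path) == expected_md5.strip().lower()
```

A call would look like `verify("path/to/archive.zip", expected)`, where `expected` is the published digest string.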
After pulling the image mxnet/python:1.2.0_gpu_cuda9, continue with the following commands.
- Create a container from the image
nvidia-docker run --name johnsonkee_mxnet -ti \
mxnet/python:1.2.0_gpu_cuda9 /bin/bash
- Install unzip and curl
apt install unzip curl
- Change to a working directory
cd /home
- Check out the johnsonkee repo
git clone http://github.com/johnsonkee/recommend.git
- Install the other Python packages
pip install -r recommend/requirements.txt
- Download and verify the dataset
# Creates ml-20.zip
cd recommend
bash download_dataset.sh
# Confirms the MD5 checksum of ml-20.zip
bash verify_dataset.sh
Run the run_and_time.sh script with an integer seed value between 1 and 5:
bash run_and_time.sh SEED
Run the run_and_time.sh script with an integer seed value between 1 and 5:
# make sure you are in the `recommend` directory
bash run_and_time.sh SEED
Harper, F. M. & Konstan, J. A. (2015), 'The MovieLens Datasets: History and Context', ACM Trans. Interact. Intell. Syst. 5(4), 19:1-19:19.
- Unzip
- Remove users with fewer than 20 ratings
- Create the training and test split described below
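The user filter in the steps above can be sketched in plain Python. This is an illustration under assumptions, not the repository's preprocessing code: the function name `filter_active_users` and the `(user_id, item_id, timestamp)` tuple layout are hypothetical.

```python
from collections import Counter


def filter_active_users(ratings, min_ratings=20):
    """Keep only interactions from users with at least `min_ratings` ratings.

    `ratings` is an iterable of (user_id, item_id, timestamp) tuples
    (an assumed layout for this sketch).
    """
    ratings = list(ratings)
    # Count how many ratings each user contributed.
    counts = Counter(user for user, _, _ in ratings)
    return [r for r in ratings if counts[r[0]] >= min_ratings]
```

Counting first and filtering second keeps the pass order-independent, so the input does not need to be sorted by user.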
Positive training examples are all but the last item each user rated. Negative training examples are randomly selected from the unrated items for each user.
The last item each user rated is used as a positive example in the test set. A fixed set of 999 unrated items is also selected for each user to calculate hit rate at 10 for predicting the test item.
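A leave-one-out split of this kind might look like the sketch below. It assumes "last" means the rating with the largest timestamp and that item ids are the integers 0 to `num_items - 1`; both are assumptions of this example, and the function names are hypothetical rather than taken from the repository.

```python
import random
from collections import defaultdict


def leave_one_out_split(ratings):
    """Split (user, item, timestamp) tuples into train pairs and one test item per user."""
    by_user = defaultdict(list)
    for user, item, ts in ratings:
        by_user[user].append((ts, item))
    train, test = [], {}
    for user, events in by_user.items():
        events.sort()  # oldest first; ties are broken by item id
        test[user] = events[-1][1]  # most recent item is held out
        train.extend((user, item) for _, item in events[:-1])
    return train, test


def sample_test_negatives(user_items, num_items, num_negatives=999, seed=0):
    """Draw a fixed set of distinct unrated items for each user.

    `user_items` maps each user to the set of ALL items that user rated.
    """
    rng = random.Random(seed)  # fixed seed keeps the negative set stable
    negatives = {}
    for user, seen in user_items.items():
        candidates = [i for i in range(num_items) if i not in seen]
        # random.sample raises ValueError if fewer than num_negatives remain.
        negatives[user] = rng.sample(candidates, num_negatives)
    return negatives
```

Seeding the generator is what makes the negative set "fixed": every evaluation pass ranks the held-out item against the same 999 candidates.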
Data is traversed randomly with 4 negative examples selected on average for every positive example.
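One way to picture this traversal is the sketch below, which is a simplification and not the repository's sampler: it draws exactly 4 negatives per positive by rejection sampling, then shuffles the epoch. The name `build_training_epoch` is hypothetical, and item ids are assumed to be the integers 0 to `num_items - 1`.

```python
import random


def build_training_epoch(train_positives, user_items, num_items,
                         num_negatives=4, seed=0):
    """Return shuffled (user, item, label) examples for one epoch.

    `train_positives` is a list of (user, item) pairs; `user_items` maps each
    user to the set of all items that user rated, including the held-out one.
    """
    rng = random.Random(seed)
    examples = []
    for user, item in train_positives:
        examples.append((user, item, 1))
        seen = user_items[user]
        drawn = 0
        # Rejection sampling: redraw whenever the candidate was rated.
        # Assumes every user has left some items unrated.
        while drawn < num_negatives:
            candidate = rng.randrange(num_items)
            if candidate not in seen:
                examples.append((user, candidate, 0))
                drawn += 1
    rng.shuffle(examples)  # random traversal order
    return examples
```

Calling this once per epoch with a different seed gives each epoch a fresh set of negatives while the positives stay the same.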
Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu and Tat-Seng Chua (2017). Neural Collaborative Filtering. In Proceedings of WWW '17, Perth, Australia, April 03-07, 2017.
The authors' original code is available at hexiangnan/neural_collaborative_filtering.
Hit rate at 10 (HR@10) with 999 negative items.
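As an illustration of the metric, the sketch below computes a per-user hit and the averaged hit rate from model scores. The function names are hypothetical, and ties are resolved in favour of the held-out item, which is an assumption of this example rather than a documented property of the benchmark.

```python
def hit_at_k(test_score, negative_scores, k=10):
    """Return 1 if the held-out item ranks in the top k, else 0.

    The held-out item is ranked against the negatives by counting how many
    negatives score strictly higher (ties favour the held-out item).
    """
    higher = sum(1 for s in negative_scores if s > test_score)
    return 1 if higher < k else 0


def hit_rate(per_user_scores, k=10):
    """Average hit_at_k over users.

    `per_user_scores` is a list of (test_score, negative_scores) pairs,
    one per user.
    """
    hits = [hit_at_k(t, negs, k) for t, negs in per_user_scores]
    return sum(hits) / len(hits)
```

With 999 negatives per user, each user contributes either 0 or 1, so the reported figure is the fraction of users whose held-out item lands among the 10 highest-scored of 1000 candidates.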
HR@10: 0.6289
After every epoch through the training data.
Every user's last rated item, i.e. all held-out positive examples.
This project was rewritten from MLPerf's recommendation benchmark by Xianzhuo Wang when he was an intern at Cambricon.
The major difference between the two is that the original uses PyTorch as its framework while this one uses MXNet. In addition, this version supports two additional datasets:
- ml-latest-small
- ml-latest
If you have any questions, contact me at 876688461@qq.com or create an issue.