DeepRec

DeepRec is a portable, flexible and comprehensive library that includes a variety of state-of-the-art deep learning-based recommendation models.

Table of Contents

  • What is DeepRec
  • How to Use
  • Benchmark Results
  • References

What is DeepRec

DeepRec is a portable, flexible and comprehensive library that includes a variety of state-of-the-art deep learning-based recommendation models. It aims to solve the item ranking task. The current version of DeepRec supports two kinds of methods: feature-based methods and knowledge-enhanced methods. In feature-based methods, deep learning models are applied to feature files extracted in the specified format. In knowledge-enhanced methods, signals from a knowledge graph are leveraged to improve recommendation performance. The currently supported models are listed below; more methods are expected in the near future.

Currently Supported Models

model       type                data format                      configuration example
lr          feature-based       /data_format/libffm_format.md    /config/lr.yaml
fm          feature-based       /data_format/libffm_format.md    /config/fm.yaml
dnn         feature-based       /data_format/libffm_format.md    /config/dnn.yaml
ipnn        feature-based       /data_format/libffm_format.md    /config/ipnn.yaml
opnn        feature-based       /data_format/libffm_format.md    /config/opnn.yaml
deepWide    feature-based       /data_format/libffm_format.md    /config/deepWide.yaml
deepFM      feature-based       /data_format/libffm_format.md    /config/deepFM.yaml
deep&cross  feature-based       /data_format/libffm_format.md    /config/deepcross.yaml
din         feature-based       /data_format/din_format.md       /config/din.yaml
cccfnet     feature-based       /data_format/cccfnet_format.md   /config/cccfnet_classfy.yaml, /config/cccfnet_regress.yaml
dkn         knowledge-enhanced  /data_format/dkn_format.md       /config/dkn.yaml
exDeepFM    feature-based       /data_format/libffm_format.md    /config/exDeepFM.yaml
ripple      knowledge-enhanced  /data_format/ripple_format.md    /config/ripple.yaml
mkr         knowledge-enhanced  /data_format/mkr_format.md       /config/mkr.yaml
Table 1

How to Use

Requirements

  • Environment: Linux, Python 3
  • Dependencies: tensorflow (>=1.7.0), sklearn, yaml, numpy
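
Before running anything, the environment can be checked with a short script like the following (a convenience sketch, not part of DeepRec):

    # Sanity-check the dependencies listed above (not a DeepRec module).
    from distutils.version import LooseVersion

    import numpy
    import sklearn
    import yaml
    import tensorflow as tf

    assert LooseVersion(tf.__version__) >= LooseVersion("1.7.0"), \
        "DeepRec expects tensorflow >= 1.7.0, found %s" % tf.__version__
    print("numpy", numpy.__version__, "| sklearn", sklearn.__version__,
          "| tensorflow", tf.__version__)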

Usage

  1. For each method, prepare your data in the corresponding format listed in Table 1 (a toy illustration of the libffm layout is shown after this list).
  2. Edit the corresponding configuration file listed in Table 1 to set the parameters for your method, such as the training file name, the testing file name, and so on. The /wiki directory gives more explanation of each method's parameters in the configuration file.
  3. Run a command of the form "python mainArg.py [the chosen model name] train/infer".
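
For reference, the libffm layout used by most feature-based models is one sample per line: the label followed by space-separated field:feature:value triples. See /data_format/libffm_format.md for the exact variant DeepRec expects. A toy illustration (the indices and file name below are made up):

    # Write two toy samples in libffm-style "label field:feature:value" format.
    # The field/feature ids here are hypothetical; follow libffm_format.md for real data.
    toy_rows = [
        "1 1:3:1 2:17:1 3:42:0.5",   # positive sample
        "0 1:5:1 2:17:1 3:42:1.2",   # negative sample
    ]
    with open("data/toy.libffm", "w") as f:
        f.write("\n".join(toy_rows) + "\n")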

Examples

Here we give an example of running exDeepFM; more examples can be found here.

  1. Download the data and prepare it in the required format (libffm for exDeepFM). Assume you are at the repository root directory.
    cd data
    wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
    unzip ml-100k.zip
    python ML-100K2Libffm.py
    
  2. Edit the configuration file /config/exDeepFM.yaml for both training and testing, and then run the following command to copy it over the configuration file that is actually used (a small sanity check of the copied file is sketched after this example).
    cp config/exDeepFM.yaml config/network.yaml
    
  3. Train the model using the following command. The first argument ("exDeepFM_Model_1") is the directory name for the results: it creates /cache/exDeepFM_Model_1 to save your cache files, /checkpoint/exDeepFM_Model_1 to save your trained model, and /logs/exDeepFM_Model_1 to save your training logs. The second argument sets the mode: choose "train" to train a model, or "infer" to infer results.
    python mainArg.py exDeepFM_Model_1 train
    
  4. Infer the result. Given the trained model saved in /checkpoint/exDeepFM_Model_1 in step 3, run:
    python mainArg.py exDeepFM_Model_1 infer
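
As a small sanity check before a long training run (see step 2), the copied configuration file can be loaded with the yaml package to confirm it parses; this helper is not a DeepRec command, and the key names are whatever exDeepFM.yaml defines:

    # Load config/network.yaml and print its top-level settings (assumes the
    # top level of the YAML file is a mapping, as in the shipped config files).
    import yaml

    with open("config/network.yaml") as f:
        config = yaml.safe_load(f)

    print("Loaded %d top-level entries from config/network.yaml" % len(config))
    for key, value in config.items():
        print("  %s: %r" % (key, value))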
    

Benchmark Results

benchmark-1

We sample 3 million instances from the Criteo dataset, handling long-tail features and continuous features. The resulting dataset has about 260 thousand features and 3 million samples. We split the dataset randomly into three parts: 80% for training, 10% for validation, and 10% for testing.
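
A minimal sketch of such a random 80/10/10 split over sample indices (illustrative only; not the exact split behind the numbers below):

    # Randomly split 3,000,000 sample indices into 80% train / 10% valid / 10% test.
    import numpy as np

    n_samples = 3000000
    idx = np.random.RandomState(42).permutation(n_samples)
    n_train = int(0.8 * n_samples)
    n_valid = int(0.1 * n_samples)
    train_idx = idx[:n_train]
    valid_idx = idx[n_train:n_train + n_valid]
    test_idx = idx[n_train + n_valid:]
    print(len(train_idx), len(valid_idx), len(test_idx))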

model     auc     logloss  train time per epoch (s)
lr        0.7779  0.4692    20.4
fm        0.7895  0.4591    90.8
dnn       0.7939  0.4552   425.1
ipnn      0.7947  0.4546   413.3
opnn      0.7957  0.4539   417.6
deepWide  0.7936  0.4557   412.4
deepFM    0.7944  0.4549   680.8
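
The auc and logloss columns are the standard binary-classification metrics; they can be computed from predicted click probabilities with the sklearn dependency, as in this toy sketch (made-up labels and predictions, not the benchmark data):

    # Compute AUC and logloss from ground-truth labels and predicted probabilities.
    import numpy as np
    from sklearn.metrics import log_loss, roc_auc_score

    labels = np.array([1, 0, 1, 1, 0])           # toy click labels
    preds = np.array([0.9, 0.2, 0.7, 0.6, 0.4])  # toy predicted probabilities

    print("auc     %.4f" % roc_auc_score(labels, preds))
    print("logloss %.4f" % log_loss(labels, preds))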

benchmark-2

We conduct experiments on a Company* dataset. The dataset has 200 thousand samples and 190 thousand features.

model     auc     logloss  train time per epoch (s)
lr        0.6555  0.3914    21.9
fm        0.6873  0.39      58.4
dnn       0.7315  0.3711   201.7
ipnn      0.7297  0.3712   199.3
opnn      0.7332  0.3698   197.3
deepWide  0.7346  0.3721   202.1
deepFM    0.7324  0.3759   233.6
din       0.7401  0.3763   331.4

Note

  1. DeepRec supports the multi-hot data type by default; a sparse matrix is used to store the data (see the sketch below).
  2. DeepRec is currently designed only for academic experiments. If the number of samples is larger than 10 million or the number of features is larger than 1 million, it may suffer from efficiency issues; we are trying to improve efficiency.
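
To illustrate note 1, a multi-hot sample activates several feature ids at once, so a batch of samples is naturally stored as a sparse matrix rather than a dense array. A toy sketch using scipy (installed alongside sklearn; the feature ids below are made up and this is not DeepRec's internal code):

    # Two toy samples over a feature space of size 10, stored as a CSR sparse matrix.
    from scipy.sparse import csr_matrix

    rows = [0, 0, 0, 1, 1]      # sample index of each non-zero entry
    cols = [1, 3, 7, 2, 3]      # active feature ids (multi-hot)
    vals = [1.0] * 5            # feature values
    batch = csr_matrix((vals, (rows, cols)), shape=(2, 10))
    print(batch.toarray())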

References