DeepRec

DeepRec is a portable, flexible and comprehensive library that includes a variety of state-of-the-art deep learning-based recommendation models.

Table of Contents

  • What is DeepRec
  • How to Use
  • Benchmark Results
  • References

What is DeepRec

DeepRec is a portable, flexible and comprehensive library that includes a variety of state-of-the-art deep learning-based recommendation models. It aims to solve the item ranking task. The current version of DeepRec supports two kinds of methods: feature-based methods and knowledge-enhanced methods. In feature-based methods, deep learning models are applied to feature files extracted in the specified format. In knowledge-enhanced methods, signals from a knowledge graph are leveraged to improve recommendation performance. The currently supported models are listed below; more methods are expected in the near future.

Currently Supported Models

model       type                data format                      configuration example
lr          feature-based       /data_format/libffm_format.md    /config/lr.yaml
fm          feature-based       /data_format/libffm_format.md    /config/fm.yaml
dnn         feature-based       /data_format/libffm_format.md    /config/dnn.yaml
ipnn        feature-based       /data_format/libffm_format.md    /config/ipnn.yaml
opnn        feature-based       /data_format/libffm_format.md    /config/opnn.yaml
deepWide    feature-based       /data_format/libffm_format.md    /config/deepWide.yaml
deepFM      feature-based       /data_format/libffm_format.md    /config/deepFM.yaml
deep&cross  feature-based       /data_format/libffm_format.md    /config/deepcross.yaml
din         feature-based       /data_format/din_format.md       /config/din.yaml
cccfnet     feature-based       /data_format/cccfnet_format.md   /config/cccfnet_classfy.yaml, /config/cccfnet_regress.yaml
dkn         knowledge-enhanced  /data_format/dkn_format.md       /config/dkn.yaml
exDeepFM    feature-based       /data_format/libffm_format.md    /config/exDeepFM.yaml
ripple      knowledge-enhanced  /data_format/ripple_format.md    /config/ripple.yaml
mkr         knowledge-enhanced  /data_format/mkr_format.md       /config/mkr.yaml
Table 1

How to Use

Requirements

  • Environment: Linux, Python 3
  • Dependencies: tensorflow (>=1.7.0), sklearn, yaml, numpy
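
Before running anything, the environment can be checked with a short script like the following (a convenience sketch, not part of DeepRec):

    # Sanity-check the dependencies listed above (not a DeepRec module).
    from distutils.version import LooseVersion

    import numpy
    import sklearn
    import yaml
    import tensorflow as tf

    assert LooseVersion(tf.__version__) >= LooseVersion("1.7.0"), \
        "DeepRec expects tensorflow >= 1.7.0, found %s" % tf.__version__
    print("numpy", numpy.__version__, "| sklearn", sklearn.__version__,
          "| tensorflow", tf.__version__)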

Usage

  1. For each method, prepare your data in the corresponding format listed in Table 1 (a toy illustration of the libffm layout is shown after this list).
  2. Edit the corresponding configuration file listed in Table 1 to set the parameters for your method, such as the training file name, the testing file name, and so on. The /wiki directory gives more explanation of each method's parameters in the configuration file.
  3. Run a command of the form "python mainArg.py [the chosen model name] train/infer".
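
For reference, the libffm layout used by most feature-based models is one sample per line: the label followed by space-separated field:feature:value triples. See /data_format/libffm_format.md for the exact variant DeepRec expects. A toy illustration (the indices and file name below are made up):

    # Write two toy samples in libffm-style "label field:feature:value" format.
    # The field/feature ids here are hypothetical; follow libffm_format.md for real data.
    toy_rows = [
        "1 1:3:1 2:17:1 3:42:0.5",   # positive sample
        "0 1:5:1 2:17:1 3:42:1.2",   # negative sample
    ]
    with open("data/toy.libffm", "w") as f:
        f.write("\n".join(toy_rows) + "\n")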

Examples

Here we give an example of running exDeepFM; more examples can be found here.

  1. Download the data and prepare it in the required format (libffm for exDeepFM). Assume you are at the repository root directory.
    cd data
    wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
    unzip ml-100k.zip
    python ML-100K2Libffm.py
    
  2. Edit the configuration file /config/exDeepFM.yaml for both training and testing, and then run the following command to copy it over the configuration file that is actually used (a small sanity check of the copied file is sketched after this example).
    cp config/exDeepFM.yaml config/network.yaml
    
  3. Train the model using the following command. The first argument ("exDeepFM_Model_1") is the directory name for the results: it creates /cache/exDeepFM_Model_1 to save your cache files, /checkpoint/exDeepFM_Model_1 to save your trained model, and /logs/exDeepFM_Model_1 to save your training logs. The second argument sets the mode: choose "train" to train a model, or "infer" to infer results.
    python mainArg.py exDeepFM_Model_1 train
    
  4. Infer the result. Given the trained model saved in /checkpoint/exDeepFM_Model_1 in step 3, run:
    python mainArg.py exDeepFM_Model_1 infer
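
As a small sanity check before a long training run (see step 2), the copied configuration file can be loaded with the yaml package to confirm it parses; this helper is not a DeepRec command, and the key names are whatever exDeepFM.yaml defines:

    # Load config/network.yaml and print its top-level settings (assumes the
    # top level of the YAML file is a mapping, as in the shipped config files).
    import yaml

    with open("config/network.yaml") as f:
        config = yaml.safe_load(f)

    print("Loaded %d top-level entries from config/network.yaml" % len(config))
    for key, value in config.items():
        print("  %s: %r" % (key, value))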
    

Benchmark Results

benchmark-1

We sample 3 million instances from the Criteo dataset, handling long-tail features and continuous features. The resulting dataset has about 260 thousand features and 3 million samples. We split the dataset randomly into three parts: 80% for training, 10% for validation, and 10% for testing.
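
A minimal sketch of such a random 80/10/10 split over sample indices (illustrative only; not the exact split behind the numbers below):

    # Randomly split 3,000,000 sample indices into 80% train / 10% valid / 10% test.
    import numpy as np

    n_samples = 3000000
    idx = np.random.RandomState(42).permutation(n_samples)
    n_train = int(0.8 * n_samples)
    n_valid = int(0.1 * n_samples)
    train_idx = idx[:n_train]
    valid_idx = idx[n_train:n_train + n_valid]
    test_idx = idx[n_train + n_valid:]
    print(len(train_idx), len(valid_idx), len(test_idx))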

model     auc     logloss  train time per epoch (s)
lr        0.7779  0.4692    20.4
fm        0.7895  0.4591    90.8
dnn       0.7939  0.4552   425.1
ipnn      0.7947  0.4546   413.3
opnn      0.7957  0.4539   417.6
deepWide  0.7936  0.4557   412.4
deepFM    0.7944  0.4549   680.8
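
The auc and logloss columns are the standard binary-classification metrics; they can be computed from predicted click probabilities with the sklearn dependency, as in this toy sketch (made-up labels and predictions, not the benchmark data):

    # Compute AUC and logloss from ground-truth labels and predicted probabilities.
    import numpy as np
    from sklearn.metrics import log_loss, roc_auc_score

    labels = np.array([1, 0, 1, 1, 0])           # toy click labels
    preds = np.array([0.9, 0.2, 0.7, 0.6, 0.4])  # toy predicted probabilities

    print("auc     %.4f" % roc_auc_score(labels, preds))
    print("logloss %.4f" % log_loss(labels, preds))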

benchmark-2

We conduct experiments on a Company* dataset. The dataset has 200 thousand samples and 190 thousand features.

model     auc     logloss  train time per epoch (s)
lr        0.6555  0.3914    21.9
fm        0.6873  0.39      58.4
dnn       0.7315  0.3711   201.7
ipnn      0.7297  0.3712   199.3
opnn      0.7332  0.3698   197.3
deepWide  0.7346  0.3721   202.1
deepFM    0.7324  0.3759   233.6
din       0.7401  0.3763   331.4

Note

  1. DeepRec supports the multi-hot data type by default; a sparse matrix is used to store the data (see the sketch below).
  2. DeepRec is currently designed only for academic experiments. If the number of samples is larger than 10 million or the number of features is larger than 1 million, it may suffer from efficiency issues; we are trying to improve efficiency.
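
To illustrate note 1, a multi-hot sample activates several feature ids at once, so a batch of samples is naturally stored as a sparse matrix rather than a dense array. A toy sketch using scipy (installed alongside sklearn; the feature ids below are made up and this is not DeepRec's internal code):

    # Two toy samples over a feature space of size 10, stored as a CSR sparse matrix.
    from scipy.sparse import csr_matrix

    rows = [0, 0, 0, 1, 1]      # sample index of each non-zero entry
    cols = [1, 3, 7, 2, 3]      # active feature ids (multi-hot)
    vals = [1.0] * 5            # feature values
    batch = csr_matrix((vals, (rows, cols)), shape=(2, 10))
    print(batch.toarray())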

References