
DeePray (深度祈祷): A new Modular, Scalable, Configurable, Easy-to-Use and Extend infrastructure for Deep Learning based Recommendation.


Introduction

The DeePray library offers state-of-the-art algorithms for deep learning recommendation. DeePray is built on the latest TensorFlow 2 (https://tensorflow.org/) and designed with a modular structure, making it easy to discover patterns and answer questions about tabular-structured data.

The main goals of DeePray:

  • Easy to use: newcomers can get hands-on with deep learning quickly
  • Good performance on web-scale data
  • Easy to extend: the modular architecture lets you build your neural network like playing with LEGO!

Let's Get Started! Please refer to the official docs at https://deepray.readthedocs.io/en/latest/.

Installation

Install DeePray from PyPI:

To install the DeePray library from PyPI using pip, execute the following command:

pip install deepray

Install DeePray from GitHub source:

First, clone the DeePray repository using git:

git clone https://github.com/fuhailin/deepray.git

Then, cd to the deepray folder, and install the library by executing the following commands:

cd deepray
pip install .
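
To confirm that either installation route worked, one option is to query the installed package metadata from Python. This is a minimal sketch using only the standard library (Python 3.8+); `installed_version` is a helper name introduced here for illustration and is not part of DeePray.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("deepray")
print(f"deepray version: {version}" if version else "deepray is not installed")
```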

Tutorial

Census Adult Data Set

Data preparation

In your tabular data, mark continuous features as NUMERICAL, categorical features as CATEGORY, variable-length features as VARIABLE, and the label column as LABEL. Then convert the data to TFRecord format in order to get good performance on large-scale datasets.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, LabelEncoder

from deepray.utils.converter import CSV2TFRecord


# http://archive.ics.uci.edu/ml/datasets/Adult
train_data = 'DeePray/examples/census/data/raw_data/adult_data.csv'
df = pd.read_csv(train_data)
df['income_label'] = (df["income_bracket"].apply(lambda x: ">50K" in x)).astype(int)
df.pop('income_bracket')

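# LABEL must be defined before it is used to filter the categorical columns.
LABEL = ['income_label']
NUMERICAL_FEATURES = ['age', 'fnlwgt', 'hours_per_week', 'capital_gain', 'capital_loss', 'education_num']
# Every column that is neither the label nor numerical is treated as categorical.
CATEGORY_FEATURES = [col for col in df.columns if col not in LABEL and col not in NUMERICAL_FEATURES]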

for feat in CATEGORY_FEATURES:
    lbe = LabelEncoder()
    df[feat] = lbe.fit_transform(df[feat])
# Feature normalization
mms = MinMaxScaler(feature_range=(0, 1))
df[NUMERICAL_FEATURES] = mms.fit_transform(df[NUMERICAL_FEATURES])


prebatch = 1  # flags.prebatch
converter = CSV2TFRecord(LABEL, NUMERICAL_FEATURES, CATEGORY_FEATURES, VARIABLE_FEATURES=[], gzip=False)
converter.write_feature_map(df, './data/feature_map.csv')

train_df, valid_df = train_test_split(df, test_size=0.2)
converter(train_df, out_file='./data/train.tfrecord', prebatch=prebatch)
converter(valid_df, out_file='./data/valid.tfrecord', prebatch=prebatch)
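
To make the two preprocessing steps above concrete, here is a small dependency-free sketch of what label encoding and min-max scaling do to a column. It is an illustration of the transformations, not the scikit-learn implementation; the helper names and the toy values are assumptions made for this example.

```python
from typing import Dict, List, Sequence


def label_encode(values: Sequence[str]) -> List[int]:
    """Map each distinct value to an integer id, assigned in sorted order."""
    mapping: Dict[str, int] = {v: i for i, v in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values]


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy columns standing in for a categorical and a numerical feature.
workclass = ["Private", "State-gov", "Private", "Self-emp"]
age = [39, 50, 38, 53]

print(label_encode(workclass))  # [0, 2, 0, 1]
print(min_max_scale(age))       # smallest age becomes 0.0, largest becomes 1.0
```

After these steps, every categorical column holds integer ids and every numerical column lies in [0, 1], which is the form the converter receives in the script above.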

You will get a feature map file like this:

9,workclass,CATEGORICAL
16,education,CATEGORICAL
7,marital_status,CATEGORICAL
15,occupation,CATEGORICAL
6,relationship,CATEGORICAL
5,race,CATEGORICAL
2,gender,CATEGORICAL
42,native_country,CATEGORICAL
1,hours_per_week,NUMERICAL
1,capital_gain,NUMERICAL
1,age,NUMERICAL
1,fnlwgt,NUMERICAL
1,capital_loss,NUMERICAL
1,education_num,NUMERICAL
2,income_label,LABEL
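
Each row of the feature map has three comma-separated fields. The sketch below reads such a file and groups the features by type. It assumes the first field is a size (apparently the number of distinct values for categorical features and 1 for numerical ones); that reading is inferred from the sample above rather than from DeePray's documentation, and `read_feature_map` is a hypothetical helper, not a DeePray function.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

# Three rows copied from the sample feature map above.
SAMPLE = """\
9,workclass,CATEGORICAL
1,age,NUMERICAL
2,income_label,LABEL
"""


def read_feature_map(handle) -> Dict[str, List[Tuple[str, int]]]:
    """Group (name, size) pairs by the feature type in the third column."""
    groups = defaultdict(list)
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        size, name, ftype = row
        groups[ftype].append((name, int(size)))
    return dict(groups)


feature_map = read_feature_map(io.StringIO(SAMPLE))
print(feature_map["CATEGORICAL"])  # [('workclass', 9)]
```

For a real file, pass `open('./data/feature_map.csv', newline='')` instead of the in-memory sample.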

Choose your model, then train and evaluate

"""
build and train model
"""

import sys

from absl import app, flags

import deepray as dp
from deepray.base.trainer import train
from deepray.model.build_model import BuildModel

FLAGS = flags.FLAGS


def main(argv):
    # Parse the command-line style argument list; with known_only=True,
    # unrecognized flags are left unparsed instead of raising an error.
    FLAGS(argv, known_only=True)
    model = BuildModel(FLAGS)
    history = train(model)
    print(history)


# The first element is the program name, followed by the flags for this run.
argv = [
    sys.argv[0],
    '--model=lr',
    '--train_data=./census/data/train.tfrecord',
    '--valid_data=./census/data/valid.tfrecord',
    '--feature_map=./census/data/feature_map.csv',
    '--learning_rate=0.01',
    '--epochs=10',
    '--batch_size=64',
]
main(argv)
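
The flag list passed to `main` can also be generated from a dictionary, which helps when several runs differ only in a few hyperparameters. This is a minimal sketch; `build_argv` is a helper introduced here for illustration, and only flag names that appear in the script above are used.

```python
import sys
from typing import Dict, List, Union

FlagValue = Union[str, int, float]


def build_argv(options: Dict[str, FlagValue], program: str = sys.argv[0]) -> List[str]:
    """Turn {'epochs': 10} into ['<program>', '--epochs=10']."""
    return [program] + [f"--{name}={value}" for name, value in options.items()]


base = {
    "model": "lr",
    "train_data": "./census/data/train.tfrecord",
    "valid_data": "./census/data/valid.tfrecord",
    "feature_map": "./census/data/feature_map.csv",
    "epochs": 10,
    "batch_size": 64,
}

# Override a single hyperparameter while keeping the rest of the configuration.
argv = build_argv({**base, "learning_rate": 0.01})
print(argv[1:])
```

The resulting list has the same shape as the hand-written one and can be handed to `main` unchanged.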

Models List

| Title | Booktitle | Resources |
|---|---|---|
| FM: Factorization Machines | ICDM'2010 | [pdf] [code] |
| FFM: Field-aware Factorization Machines for CTR Prediction | RecSys'2016 | [pdf] [code] |
| FNN: Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction | ECIR'2016 | [pdf] [code] |
| PNN: Product-based Neural Networks for User Response Prediction | ICDM'2016 | [pdf] [code] |
| Wide&Deep: Wide & Deep Learning for Recommender Systems | DLRS'2016 | [pdf] [code] |
| AFM: Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks | IJCAI'2017 | [pdf] [code] |
| NFM: Neural Factorization Machines for Sparse Predictive Analytics | SIGIR'2017 | [pdf] [code] |
| DeepFM: A Factorization-Machine based Neural Network for CTR Prediction | IJCAI'2017 | [pdf] [code] |
| DCN: Deep & Cross Network for Ad Click Predictions | ADKDD'2017 | [pdf] [code] |
| xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems | KDD'2018 | [pdf] [code] |
| DIN: Deep Interest Network for Click-Through Rate Prediction | KDD'2018 | [pdf] [code] |
| DIEN: Deep Interest Evolution Network for Click-Through Rate Prediction | AAAI'2019 | [pdf] [code] |
| DSIN: Deep Session Interest Network for Click-Through Rate Prediction | IJCAI'2019 | [pdf] [code] |
| AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks | CIKM'2019 | [pdf] [code] |
| FLEN: Leveraging Field for Scalable CTR Prediction | AAAI'2020 | [pdf] [code] |
| DFN: Deep Feedback Network for Recommendation | IJCAI'2020 | [pdf] [code] |

How to build your own model with DeePray

Inherit from the BaseCTRModel class in deepray.model.model_ctr and implement your own build_network() method!
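
The pattern behind this is that the base class owns the shared pipeline and delegates only the model-specific part to `build_network()`. The sketch below illustrates that pattern with a simplified stand-in class so that it runs without DeePray installed. The real `BaseCTRModel` signature is not shown in this README, so the stand-in's constructor, the `predict` method, and the argument types are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


class BaseCTRModel:
    """Simplified stand-in for deepray.model.model_ctr.BaseCTRModel (hypothetical)."""

    def build_network(self, features: Sequence[float]) -> float:
        """Return a raw score (logit) for one example; subclasses override this."""
        raise NotImplementedError("subclasses define the network here")

    def predict(self, features: Sequence[float]) -> float:
        # Shared step: squash the subclass's logit into a click probability.
        logit = self.build_network(features)
        return 1.0 / (1.0 + math.exp(-logit))


class MyLinearModel(BaseCTRModel):
    """A custom model that only supplies the model-specific part."""

    def __init__(self, weights: List[float], bias: float = 0.0) -> None:
        self.weights = weights
        self.bias = bias

    def build_network(self, features: Sequence[float]) -> float:
        # Weighted sum of the inputs plus a bias term.
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


model = MyLinearModel(weights=[0.5, -0.25])
print(model.predict([2.0, 4.0]))  # logit is 0.5*2.0 - 0.25*4.0 = 0.0, so this prints 0.5
```

With the real library, the subclass would inherit from the imported `BaseCTRModel` and follow whatever signature that class defines for `build_network()`.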

Contribution

DeePray is still under development, and contributions are welcome!

* Hailin Fu (`Hailin <https://github.com/fuhailin>`)
* Call for contributions!

Making DeePray the new infrastructure for recommendation algorithms needs your contributions.

Citing

DeePray is designed, developed and supported by Hailin. If you use any part of this library in your research, please cite it using the following BibTeX entry:

@misc{DeePray,
  author = {Hailin Fu},
  title = {DeePray: A new Modular, Scalable, Configurable, Easy-to-Use and Extend infrastructure for Deep Learning based Recommendation},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/fuhailin/deepray}},
}

License

Copyright © 2020 The DeePray Authors. All Rights Reserved.

Licensed under the Apache License 2.0.

Reference

https://github.com/shenweichen/DeepCTR

https://github.com/aimetrics/jarvis

https://github.com/shichence/AutoInt

Contact

If you would like to cooperate or have any questions, please follow my WeChat official account:

WeChat official account ID: 【StateOfTheArt】