/RecAlgorithm

主流推荐系统Rank算法的实现

Primary LanguagePythonBSD 2-Clause "Simplified" LicenseBSD-2-Clause

主流推荐系统Rank算法的实现

Python TensorFlow Versions

项目简介

  • 实现推荐系统中主要使用的Rank算法,并使用公开数据集评测,所有算法均已跑通并完成完整的训练,最终生成saved_modelcheckpointtf-serving部署;
  • 使用微信视频号推荐算法比赛数据集,数据详情请见 ./dataset/README.md
  • 为了贴合工业界使用情况,使用TensorFlow Estimator框架,数据format为Tfrecord
  • 算法实现在./algrithm下,每个算法单独一个文件夹,名字为普遍接受的大写算法名称,训练入口为文件夹下对应的小写算法名称py文件,如DIN文件夹下的din.py文件为训练DIN模型的入口,具体请见末尾的示例部分;
  • 每个算法都实现了自己的model_fn,没有使用Keras高阶API,只使用TensorFlow的中低阶API构造静态图;
  • 算法超参数可由--parameter_name=parameter_value方式传入训练入口脚本,超参数定义请见训练入口脚本tf.app.flags部分;
  • 单任务模型使用数据集因变量中的read_comemnt评测,多任务模型使用read_commet like click_avatar三个任务评测;

单任务Models列表

Model Paper *Best_read_comment_Auc
FFM [2016] Field-aware Factorization Machines for CTR Prediction 0.8911285
DeepCrossing [2016] Deep Crossing - Web-Scale Modeling without Manually Crafted Combinatorial Features 0.9185908
PNN [2016] Product-based neural networks for user response prediction 0.9065931
Wide & Deep [2016] Wide & Deep Learning for Recommender Systems 0.9133482
DeepFM [2017] DeepFM: A Factorization-Machine based Neural Network for CTR Prediction 0.8529998
DCN [2017] Deep & Cross Network for Ad Click Predictions 0.9183242
AFM [2017] Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks 0.9117872
xDeepFM [2018] xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems 0.9152467
FwFM [2018] Field-weighted Factorization Machines for Click-Through Rate Prediction in Display Advertising 0.9118794
DIN [2018] Deep Interest Network for Click-Through Rate Prediction 0.9116896
DIEN [2018] Deep Interest Evolution Network for Click-Through Rate Prediction -
FiBiNet [2019] FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction 0.9149044
BST [2019] Behavior sequence transformer for e-commerce recommendation in Alibaba 0.9165866

*Best_read_comment_Auc为每个model各自调参后的测试集最大Auc,每个model各自的评测见每个model路径下的result.md
*DIEN不适用于微信视频号数据集,故只实现了静态图,并没有评测。

多任务Models列表

Model Paper *Best_read_commet_AUC *Best_like_AUC *Best_click_avatar_AUC
ESMM [2018] Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate - - -
MMOE [2018] Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts 0.91860557 0.8126400 0.8139362
PLE [2020] Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations 0.91965175 0.8136461 0.8154559

*Best_xx_AUC为所有超参数组合中的最高值,横向的三个AUC可能不在同一组超参数中。
*由于ESMM的结构特殊性,不适用于微信视频号数据集,故只实现了静态图,并没有评测。

示例

# 先执行以下命令确保生成了tfrecord
# cd ./dataset/wechat_algo_data1
# python DataGenerator.py && cd ..
cd ./DIN
# 训练时可自定义参数
python din.py --use_softmax=True 

To Do List

  • 增加多任务学习Trick: Uncertainty, GradNorm, PCGrad, etc.
  • 增加AutoInt, FLEN, etc.
  • 重构特征工程部分, 包括配置化输入等, 参考https://github.com/Shicoder/Deep_Rec

欢迎提issue,或直接勾搭

pic