Meituan-Dianping Tech Team: Online Learning Algorithms in Theory and Practice.
- FTRL: Follow-The-Regularised-Leader proximal (FTRL-proximal)
[McMahan et al., 2013] McMahan, H. B., Holt, G., Sculley, D., Young, M., Ebner, D., Grady, J., Nie, L., Phillips, T., Davydov, E., Golovin, D., et al. (2013). Ad click prediction: a view from the trenches. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1222–1230. ACM.
- BPR: Bayesian Probit Regression
[Graepel et al., 2010] Graepel, T., Candela, J. Q., Borchert, T., and Herbrich, R. (2010). Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft’s Bing search engine. In Proceedings of the 27th International Conference on Machine Learning, pages 13–20.
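Computational Advertising and Machine Learning: Chapter 09: ML Made Simple, the Factorization Family.
Zhou Zhihua, Machine Learning: Chapter 8, Ensemble Learning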
- Bagging
- GBDT
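- Li Hang, Statistical Learning Methods: Section 5.5, The CART Algorithm
- Blog: GBDT: Gradient Boosted Decision Trees.
- Blog: GBDT (MART) Iterative Decision Trees: An Introductory Tutorial | Overview.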
- Hybrid Models
- [He et al., 2014] He, X., Pan, J., Jin, O., Xu, T., Liu, B., Xu, T., Shi, Y., Atallah, A., Herbrich, R., Bowers, S., et al. (2014). Practical lessons from predicting clicks on ads at Facebook. In Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, pages 1–9. ACM.
- [Juan et al., 2016] Juan, Y., Zhuang, Y., Chin, W.-S., and Lin, C.-J. (2016). Field-aware factorization machines for CTR prediction. In Proceedings of the 9th ACM Conference on Recommender Systems.
- Factorisation Machine supported Neural Network (FNN) and Sampling-based Neural Network (SNN)
- Alibaba
- Tencent
From Display Advertising with Real-Time Bidding (RTB) and Behavioural Targeting by Jun Wang, Weinan Zhang and Shuai Yuan. ArXiv 2016.
As discussed in this chapter, there are various user response prediction models. From the modeling perspective, these models can be generally categorised as linear and non-linear models.
Linear models, including logistic regression [Lee et al., 2012, McMahan et al., 2013] and Bayesian probit regression (with diagonal covariance matrix) [Graepel et al., 2010], directly build the model based on the feature independence assumption. For linear models, feature interaction patterns are generally captured by building a large-scale feature space that combines multi-field features, which can consume much human effort. However, thanks to their high efficiency and high parallelization capability, linear models can be fed many more training instances (and higher-dimensional features) in the same training period, which keeps them highly competitive with non-linear models in many industrial environments.
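To make the point about combined multi-field features concrete, here is a minimal sketch in plain Python. It is not taken from the book or any cited paper: the field names (`city`, `slot`), the toy data, the helper names, and the SGD settings are all illustrative assumptions. The toy labels follow an XOR pattern across two fields, which a linear model on the individual fields cannot fit, while the same model with a hand-built cross feature can.

```python
import math

def featurize(record, cross_fields=()):
    """Turn a dict of categorical fields into a list of one-hot feature names.

    cross_fields lists pairs of fields to combine by hand (hypothetical helper).
    """
    feats = ["bias"] + [f"{k}={v}" for k, v in sorted(record.items())]
    for a, b in cross_fields:
        feats.append(f"{a}&{b}={record[a]}|{record[b]}")
    return feats

def predict(weights, feats):
    """Logistic regression prediction over the active (value 1) features."""
    z = sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-z))

def train(data, cross_fields=(), epochs=200, lr=0.5):
    """Plain SGD on log loss; weights are stored sparsely in a dict."""
    weights = {}
    for _ in range(epochs):
        for record, label in data:
            feats = featurize(record, cross_fields)
            grad = predict(weights, feats) - label  # d(log loss)/dz
            for f in feats:
                weights[f] = weights.get(f, 0.0) - lr * grad
    return weights

def n_correct(weights, data, cross_fields=()):
    """Count examples whose thresholded prediction matches the label."""
    return sum(
        (predict(weights, featurize(r, cross_fields)) > 0.5) == (y == 1)
        for r, y in data
    )

# Toy data (assumed): a click happens only for specific field combinations.
data = [
    ({"city": "A", "slot": "top"}, 1),
    ({"city": "A", "slot": "side"}, 0),
    ({"city": "B", "slot": "top"}, 0),
    ({"city": "B", "slot": "side"}, 1),
]
cross = (("city", "slot"),)

plain_model = train(data)
crossed_model = train(data, cross_fields=cross)
print("correct without cross feature:", n_correct(plain_model, data))
print("correct with cross feature:   ", n_correct(crossed_model, data, cross))
```

The cost the passage mentions shows up in `cross_fields`: someone has to decide which field pairs to combine, and the feature space grows with every pair added.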
Non-linear models, including factorisation machines [Rendle, 2010, Juan et al., 2016], tree models [He et al., 2014] and the recently emerged (deep) neural network models [Zhang et al., 2016b, Qu et al., 2016], provide the model capacity to learn feature interaction patterns automatically, without the need to design combined features. These non-linear models generally need much more computation resources than the linear ones, and some of them may require multiple stages of model training, as demonstrated in [He et al., 2014]. With the fast development of high performance computing (HPC) and the explosion of data volume, non-linear models are increasingly applied in commercial platforms for practical user response prediction.
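As one illustration of learning interactions without hand-built combinations, the sketch below scores an input with a second-order factorisation machine in the standard form from [Rendle, 2010]: a bias, a linear term, and pairwise terms weighted by the dot product of per-feature latent vectors. The function names and all parameter values are made up for illustration, and no training is shown. The second function uses the well-known reformulation that computes the pairwise term in time linear in the number of features.

```python
def fm_score_naive(x, w0, w, V):
    """w0 + sum_i w[i]*x[i] + sum_{i<j} <V[i], V[j]> * x[i] * x[j]."""
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(V[i][f] * V[j][f] for f in range(k))
            score += dot * x[i] * x[j]
    return score

def fm_score_fast(x, w0, w, V):
    """Same score, using
    sum_{i<j} <V[i], V[j]> x[i] x[j]
      = 0.5 * sum_f ((sum_i V[i][f] x[i])**2 - sum_i (V[i][f] x[i])**2).
    """
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        score += 0.5 * (s * s - s_sq)
    return score

# Illustrative parameters (assumed values, not learned from data).
x = [1.0, 0.0, 1.0, 1.0]           # one-hot style input with three active features
w0 = 0.1                           # global bias
w = [0.2, -0.1, 0.3, 0.0]          # linear weights
V = [[0.1, 0.2], [0.3, -0.1], [-0.2, 0.4], [0.5, 0.1]]  # latent vectors, k = 2

naive = fm_score_naive(x, w0, w, V)
fast = fm_score_fast(x, w0, w, V)
print(naive, fast)
```

Because every feature carries its own latent vector, the model can assign an interaction weight to a pair of features that never co-occurred in training, which a hand-built cross feature cannot do.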
Zhihu collection: RTB.
Zhihu collection: Research.