In online advertising, click-through rate (CTR) is a key metric for evaluating ad performance, so click prediction systems are essential and widely used in sponsored search and real-time bidding.

Each input record contains the following fields:
- label: 0 for non-click, 1 for click
- id
- hour: bucketed by hour of day into 1 (06:00 to 13:59), 2 (14:00 to 21:59) or 3 (22:00 to 05:59)
- banner_pos
- site_id
- site_category
- app_id
- app_category
- device_id
- device_ip
- device_model
- device_type
- C1, C14-C21: anonymized categorical variables
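
The hour bucketing above can be sketched in plain Scala. This sketch assumes the raw `hour` value is an 8-digit `YYMMDDHH` string (for example `14102100`); that layout and the names `HourBucket`, `hourOfDay`, `bucket` and `bucketOf` are illustrative assumptions, not the project's actual code.

```scala
object HourBucket {
  // Assumption: raw `hour` is an 8-digit YYMMDDHH string, e.g. "14102100"
  // (2014-10-21, 00:00). Only the last two digits (HH) are used.
  def hourOfDay(raw: String): Int = raw.takeRight(2).toInt

  // Map an hour of day (0 to 23) to the three buckets listed above:
  //   06..13 -> 1, 14..21 -> 2, 22..05 (wrapping past midnight) -> 3
  def bucket(hour: Int): Int = {
    require(hour >= 0 && hour <= 23, s"hour out of range: $hour")
    if (hour >= 6 && hour <= 13) 1
    else if (hour >= 14 && hour <= 21) 2
    else 3
  }

  // Convenience: raw YYMMDDHH string -> bucket id.
  def bucketOf(raw: String): Int = bucket(hourOfDay(raw))
}
```

Inside a Spark job the function could be applied per record, e.g. `rawRdd.map(r => HourBucket.bucketOf(r.hour))`, where `rawRdd` and `r.hour` are hypothetical names.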

Environment:

- Spark 1.6.1
- Scala 2.10.4
- SBT 0.13.8
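
One way to pin these versions in an SBT 0.13 build is sketched below. The project name is hypothetical, and the Spark modules are marked `provided` on the assumption that the cluster supplies them at runtime.

```scala
// build.sbt (sketch; project name is hypothetical)
// project/build.properties would contain: sbt.version=0.13.8
name := "ctr-prediction"

version := "0.1"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.6.1" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.6.1" % "provided"
)
```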

There are three models in total: SVM, Logistic Regression, and Random Forest.

Recursive Feature Elimination (RFE) is implemented with Spark to select a suitable subset of features.
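
The RFE loop itself is independent of Spark and can be sketched as below: score the current features, drop the `step` least important ones, and repeat until `nKeep` remain. The `importance` callback is a hypothetical stand-in for "train a model on this feature subset and return a score per feature" (for a linear model such as SVM or logistic regression, for example the absolute weight of each feature); how the project actually ranks features is not specified here.

```scala
object Rfe {
  // Recursive Feature Elimination (sketch).
  // `importance` is a hypothetical callback: given the current feature subset,
  // it trains a model and returns a non-negative importance score per feature.
  def select(
      features: Seq[String],
      nKeep: Int,
      step: Int,
      importance: Seq[String] => Map[String, Double]): Seq[String] = {
    require(nKeep >= 1 && step >= 1, "nKeep and step must be positive")
    var remaining = features
    while (remaining.size > nKeep) {
      val scores = importance(remaining)
      // Drop at most `step` features per round, never going below nKeep.
      val nDrop = math.min(step, remaining.size - nKeep)
      val dropped = remaining.sortBy(f => scores.getOrElse(f, 0.0)).take(nDrop).toSet
      // filterNot keeps the original order of the surviving features.
      remaining = remaining.filterNot(dropped.contains)
    }
    remaining
  }
}
```

Calling `importance` again each round (rather than ranking once) is what makes the elimination recursive: scores are recomputed after each removal, since features can become more or less useful once others are gone.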

Hyperparameter tuning: among the candidate hyperparameter settings, the model with the highest area under the ROC curve (AUC) is returned.
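
Selecting the model with the best AUC can be sketched as follows. The `train` and `score` callbacks and the names `Tuning`, `auc` and `bestByAuc` are hypothetical. The `auc` helper is a rank-based (Mann-Whitney) computation that is quadratic in the number of samples and meant only to illustrate the metric; on an RDD, Spark 1.6 provides `org.apache.spark.mllib.evaluation.BinaryClassificationMetrics` with `areaUnderROC()`.

```scala
object Tuning {
  // Rank-based AUC: the probability that a randomly chosen positive scores
  // higher than a randomly chosen negative, counting ties as 1/2.
  // Input pairs are (score, label) with label 1.0 = click, 0.0 = non-click.
  def auc(scoresAndLabels: Seq[(Double, Double)]): Double = {
    val pos = scoresAndLabels.filter(_._2 == 1.0).map(_._1)
    val neg = scoresAndLabels.filter(_._2 == 0.0).map(_._1)
    require(pos.nonEmpty && neg.nonEmpty, "need both classes")
    val wins = for (p <- pos; n <- neg)
      yield (if (p > n) 1.0 else if (p == n) 0.5 else 0.0)
    wins.sum / (pos.size.toDouble * neg.size)
  }

  // Train one candidate per hyperparameter setting, score it on validation
  // data, and return (params, model, auc) for the highest-AUC candidate.
  def bestByAuc[P, M](
      grid: Seq[P],
      train: P => M,
      score: M => Seq[(Double, Double)]): (P, M, Double) = {
    require(grid.nonEmpty, "empty hyperparameter grid")
    grid.map { p =>
      val m = train(p)
      (p, m, auc(score(m)))
    }.maxBy(_._3)
  }
}
```

Scoring on data held out from training keeps the comparison between settings from rewarding overfitting.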

When building each model, the labels are first generated by voting.
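
The README does not say what casts the votes, so the sketch below only assumes that each sample receives several 0/1 votes and takes the majority. The tie-breaking rule (`tieLabel`, defaulting to 0) and the names `Voting`, `majority` and `labels` are hypothetical choices for illustration.

```scala
object Voting {
  // Majority vote over 0/1 votes for a single sample.
  // Ties (equal numbers of 0s and 1s) resolve to `tieLabel`.
  def majority(votes: Seq[Int], tieLabel: Int = 0): Int = {
    require(votes.nonEmpty, "no votes")
    require(votes.forall(v => v == 0 || v == 1), "votes must be 0 or 1")
    val ones = votes.count(_ == 1)
    val zeros = votes.size - ones
    if (ones > zeros) 1 else if (zeros > ones) 0 else tieLabel
  }

  // Apply the vote per sample: votesPerSample(i) holds every vote for sample i.
  def labels(votesPerSample: Seq[Seq[Int]]): Seq[Int] =
    votesPerSample.map(v => majority(v))
}
```

Using an odd number of voters avoids ties entirely, which makes the `tieLabel` choice irrelevant.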