Course Project for Methods for Big Data Analytics @ Ecole polytechnique
Axa has 28 call centers, each with its own behavior over time. Predicting the number of incoming calls in advance is essential for workforce planning. From a business perspective, over-estimations are less damaging than under-estimations: over-estimation wastes labor costs, whereas under-estimation degrades the customer experience and can eventually lead to client churn, which is far more costly than the wasted labor.
Hence, we use an asymmetric loss function, LinEx:
$\mathrm{LinEx}(y, \hat{y}) = \exp\big(\alpha (y - \hat{y})\big) - \alpha (y - \hat{y}) - 1$
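with $y$ the true value, $\hat{y}$ the predicted value, and $\alpha > 0$ the parameter controlling the asymmetry: under-estimations ($y > \hat{y}$) are penalized exponentially, over-estimations only roughly linearly.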
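A minimal sketch of the loss, assuming the score is the mean LinEx over all predictions and using a hypothetical default `alpha = 0.1` (the value used in the project is not stated here):

```python
import math
from typing import Sequence


def linex(y_true: Sequence[float], y_pred: Sequence[float], alpha: float = 0.1) -> float:
    """Mean LinEx loss: exp(alpha*(y - y_hat)) - alpha*(y - y_hat) - 1.

    Assumption: per-sample losses are averaged. With alpha > 0,
    under-estimation (y > y_hat) grows exponentially, over-estimation
    roughly linearly.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total = 0.0
    for y, y_hat in zip(y_true, y_pred):
        d = alpha * (y - y_hat)  # positive when we under-estimate
        total += math.exp(d) - d - 1.0
    return total / len(y_true)


# Missing 2 calls costs more than over-staffing for 2 extra calls.
print(linex([10], [8]) > linex([10], [12]))  # True
```

The loss is zero for a perfect prediction and always non-negative, since $e^x \ge 1 + x$.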
Data preparation: handle missing values, group rows to merge possible duplicates, and keep only the relevant variables.
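The preparation steps can be sketched in plain Python as below. The column names `date`, `center` and `calls` are hypothetical, and the sketch assumes that duplicate rows for the same (date, center) slot are partial counts to be summed; if they were exact copies, they should be deduplicated instead.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical column names; the real dataset uses its own schema.
RELEVANT = ("date", "center", "calls")


def preprocess(rows: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    """Keep relevant variables, drop incomplete rows, aggregate duplicates."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for row in rows:
        # Keep only the relevant variables.
        rec = {k: row.get(k) for k in RELEVANT}
        # Drop rows with missing values.
        if any(v is None for v in rec.values()):
            continue
        # Group by (date, center): duplicate slots are summed into one entry.
        totals[(rec["date"], rec["center"])] += rec["calls"]
    return dict(totals)


rows = [
    {"date": "2012-01-01 00:00", "center": "A", "calls": 3, "extra": 1},
    {"date": "2012-01-01 00:00", "center": "A", "calls": 2},
    {"date": "2012-01-01 00:00", "center": "B", "calls": None},
]
print(preprocess(rows))  # {('2012-01-01 00:00', 'A'): 5.0}
```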
See report
Dimensionality reduction and feature selection: PCA, kernel PCA (k-PCA), Lasso.
Cross-validation specific to time series (beware of the temporal ordering: standard K-fold CV does not apply, since it would train on future data to predict the past).
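One common time-series scheme is expanding-window (forward-chaining) validation: each fold trains on all observations up to a cut-off and validates on the block immediately after it, so no future data leaks into training. The sketch below is an illustration under that assumption, not necessarily the exact scheme used in the project; `time_series_splits` and its parameters are hypothetical names. scikit-learn's `TimeSeriesSplit` implements a similar idea.

```python
from typing import Iterator, List, Tuple


def time_series_splits(
    n_samples: int, n_splits: int, min_train: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Expanding-window splits over chronologically sorted indices.

    Fold k trains on [0, train_end) and validates on the next block,
    so every validation index comes strictly after every training index.
    """
    if n_splits < 1 or min_train < 1 or min_train >= n_samples:
        raise ValueError("invalid split configuration")
    test_size = (n_samples - min_train) // n_splits
    if test_size == 0:
        raise ValueError("not enough samples for the requested number of splits")
    for k in range(n_splits):
        train_end = min_train + k * test_size
        yield list(range(train_end)), list(range(train_end, train_end + test_size))


for train_idx, test_idx in time_series_splits(10, 3, 4):
    print(len(train_idx), test_idx)
# 4 [4, 5]
# 6 [6, 7]
# 8 [8, 9]
```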
Models: ARMA (time series), Self-Training Lasso.
LinEx = 0.82