advanced_training

Advanced Scikit-learn training session

Primary language: Jupyter Notebook. License: BSD 2-Clause "Simplified" License (BSD-2-Clause).

Outline

1 Basic algorithms

  • Review of supervised learning
  • Linear models for classification and regression
  • Loss functions, regularization, empirical risk minimization
  • Path algorithms
  • Exercise: FIXME Regression
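
To make the "path algorithms" item concrete, here is a minimal sketch that computes a Lasso regularization path with scikit-learn's `lasso_path`. The synthetic dataset and its sizes are assumptions chosen for illustration, not material from the session.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path

# Synthetic data (an assumption for illustration): 10 features, 3 informative.
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=1.0, random_state=0)

# lasso_path does not fit an intercept, so center the data first.
X = X - X.mean(axis=0)
y = y - y.mean()

# Solve the Lasso along a decreasing grid of regularization strengths.
alphas, coefs, _ = lasso_path(X, y)

# coefs has shape (n_features, n_alphas); count active features per alpha.
n_nonzero = (coefs != 0).sum(axis=0)
print("strongest penalty:", alphas[0], "-> active features:", n_nonzero[0])
print("weakest penalty:  ", alphas[-1], "-> active features:", n_nonzero[-1])
```

Stronger regularization (larger `alpha`) tends to drive more coefficients to exactly zero, which is what makes the path useful for model selection.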

2 Basic tools

  • Cross-validation vs train/test split
  • GridSearchCV
  • Overfitting parameters
  • Scoring metrics
  • Exercise: FIXME
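
The tools in this section can be sketched together as follows. The example contrasts a single train/test split with cross-validation, then tunes one parameter with `GridSearchCV` under a non-default scoring metric. The dataset, the `LogisticRegression` estimator, and the grid of `C` values are assumptions picked for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic data stands in for a real dataset (an assumption for illustration).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hold out a test set that the parameter search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Single split: one accuracy number, sensitive to how the split fell.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
single_split_score = model.score(X_test, y_test)

# 5-fold cross-validation on the training portion: five numbers to average.
cv_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X_train, y_train, cv=5
)

# Grid search picks C by cross-validated ROC AUC, using training data only.
param_grid = {"C": [0.01, 0.1, 1, 10]}
grid = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid, cv=5, scoring="roc_auc"
)
grid.fit(X_train, y_train)

# Evaluating on the untouched test set guards against overfitting the grid.
test_auc = grid.score(X_test, y_test)
print(single_split_score, cv_scores.mean(), grid.best_params_, test_auc)
```

Keeping the test set out of the search is the design choice that matters here: `grid.best_score_` is optimistically biased because it was used to choose `C`.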

3 Preprocessing

  • Scaling and normalization

  • Feature selection:

    • Univariate
    • Model-based
    • RFE
    • Forward / backward selection
  • Polynomial and interaction features

  • Exercise: FIXME

4 Advanced tools

  • Pipelines
  • FeatureUnion
  • FunctionTransformer?
  • Exercise: FIXME
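
As a rough sketch of how these three tools compose, the example below chains a `FunctionTransformer`, a scaler, and a `FeatureUnion` inside one `Pipeline`. The helper `signed_log1p`, the step names, and the choice of `PCA` plus `SelectKBest` are hypothetical and only serve the illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


def signed_log1p(X):
    """Hypothetical stateless transform: compress large magnitudes, keep sign."""
    return np.sign(X) * np.log1p(np.abs(X))


# Synthetic data (an assumption for illustration).
X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)

# FeatureUnion concatenates the outputs of several transformers column-wise.
features = FeatureUnion([
    ("pca", PCA(n_components=3)),
    ("kbest", SelectKBest(f_regression, k=2)),
])

# Pipeline applies the steps in order; only the last step is a predictor.
pipe = Pipeline([
    ("squash", FunctionTransformer(signed_log1p)),
    ("scale", StandardScaler()),
    ("features", features),
    ("ridge", Ridge(alpha=1.0)),
])
pipe.fit(X, y)

# Slicing the pipeline exposes the transformed features fed to the model.
X_features = pipe[:-1].transform(X)
print(X_features.shape, pipe.score(X, y))
```

Because every preprocessing step lives inside the pipeline, passing `pipe` to cross-validation or a grid search refits the preprocessing on each training fold, which avoids leaking information from the held-out fold.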

5 Advanced Supervised Learning

  • Decision Tree Recap
  • Random Forests
  • Gradient Boosting / xgboost
  • Kernel SVMs
  • Kernel approximation
  • Neural Networks
  • Exercise: FIXME

6 Unsupervised feature extraction and visualization

  • PCA
  • NMF
  • Robust PCA?
  • t-SNE
  • Exercise: FIXME

7 Outlier Detection

  • Elliptic Envelope?
  • Isolation Forest?
  • What else?
  • KDE?
  • SVM?
  • Robust PCA?
  • Exercise: FIXME
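
Two of the candidate detectors listed above share the same scikit-learn interface, which the sketch below illustrates. The toy data (a Gaussian cloud plus ten points planted on a ring of radius 7) and the `contamination` value are assumptions for illustration.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)

# 200 inliers from a standard normal, plus 10 planted outliers on a ring.
inliers = rng.normal(size=(200, 2))
angles = np.linspace(0, 2 * np.pi, 10, endpoint=False)
outliers = 7 * np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([inliers, outliers])

# contamination is the assumed fraction of outliers; it sets the threshold.
contamination = 0.05
detectors = {
    "EllipticEnvelope": EllipticEnvelope(contamination=contamination,
                                         random_state=0),
    "IsolationForest": IsolationForest(contamination=contamination,
                                       random_state=0),
}

# fit_predict returns +1 for inliers and -1 for outliers.
labels = {name: det.fit_predict(X) for name, det in detectors.items()}

# The planted outliers are the last 10 rows of X.
caught = {name: int((pred[-10:] == -1).sum()) for name, pred in labels.items()}
for name, n_caught in caught.items():
    print(f"{name}: flagged {n_caught} of 10 planted outliers")
```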

8 Gaussian Processes

  • Non-iid data
  • Gaussian fit...
  • Covariance matrix is a kernel
  • Regression, outlier detection, time series modelling
  • Exercise: FIXME
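
The point that the covariance matrix is a kernel can be checked numerically. In this minimal sketch, the kernel object is evaluated on the inputs to obtain a covariance matrix, and the same kernel is then used for Gaussian process regression. The noisy sine data and the `RBF` plus `WhiteKernel` combination are assumptions for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.RandomState(0)

# Noisy samples of a sine curve (an assumption for illustration).
X = np.sort(rng.uniform(0, 5, size=(30, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=30)

# Calling a kernel on X returns the covariance matrix of the GP prior at X.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
K = kernel(X)

# A valid covariance matrix is symmetric with non-negative eigenvalues.
min_eig = np.linalg.eigvalsh(K).min()
print("symmetric:", np.allclose(K, K.T), "smallest eigenvalue:", min_eig)

# Regression: fitting tunes the kernel hyperparameters by marginal likelihood.
gpr = GaussianProcessRegressor(kernel=kernel, random_state=0).fit(X, y)

# Predictions come with a per-point standard deviation.
X_new = np.linspace(0, 5, 50).reshape(-1, 1)
mean, std = gpr.predict(X_new, return_std=True)
print("fitted kernel:", gpr.kernel_)
```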

9 More Neural Networks

10 Beyond standard sklearn

  • Warm starts
  • Out-of-core learning
  • Custom estimators
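
A brief sketch of all three items, under stated assumptions: the dataset is synthetic, the batch size of 100 is arbitrary, and `QuantileClipper` is a hypothetical transformer written for this example, not part of scikit-learn.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Warm start: refitting with more trees keeps the trees already built.
forest = RandomForestClassifier(n_estimators=10, warm_start=True,
                                random_state=0)
forest.fit(X, y)
forest.set_params(n_estimators=20)
forest.fit(X, y)  # only the 10 additional trees are trained

# Out of core: feed the data in mini-batches through partial_fit.
# The full list of classes must be given, since a batch may lack some.
sgd = SGDClassifier(random_state=0)
classes = np.unique(y)
for start in range(0, len(X), 100):
    batch = slice(start, start + 100)
    sgd.partial_fit(X[batch], y[batch], classes=classes)


# Custom estimator: store constructor arguments unchanged in __init__,
# learn state in fit, and mark learned attributes with a trailing underscore.
class QuantileClipper(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that clips each feature to learned quantiles."""

    def __init__(self, lower=0.01, upper=0.99):
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X)
        self.low_ = np.quantile(X, self.lower, axis=0)
        self.high_ = np.quantile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        return np.clip(np.asarray(X), self.low_, self.high_)


# Following the conventions gives get_params, set_params and clone for free.
clipper = QuantileClipper(lower=0.05, upper=0.95)
X_clipped = clone(clipper).fit_transform(X)
print(len(forest.estimators_), clipper.get_params(), X_clipped.shape)
```

Inheriting from `BaseEstimator` is what lets the custom class work inside `Pipeline` and `GridSearchCV`, since both rely on `get_params` and `clone`.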