mllite/caret2sql

Implementation Process / Evolution

antoinecarme opened this issue · 12 comments

The goal of this issue is to implement SQL generation for the building blocks of any caret model:

  1. Base classification models (GLMxx, naive Bayes, decision trees, SVMs, neural nets)
  2. Regressions (almost the same as above, except naive Bayes)
  3. Preprocessing: "center", "scale", "pca"
  4. Ensembles: boosting, bagging, random forests, XGBoost.
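To make "SQL generation" concrete for two of these building blocks, here is a minimal Python sketch. It assumes the fitted "center"/"scale" parameters (per-column means and standard deviations) and the coefficients of a linear regression have already been extracted from the caret model. The helper name `scaled_linear_sql`, the table `input_data`, the key column `id` and the output column `Estimator` are all hypothetical, and this is not the project's actual code generator.

```python
import sqlite3


def scaled_linear_sql(table, key, means, sds, coefs, intercept):
    """Hypothetical helper: SQL that centers/scales inputs, then applies a linear model."""
    # Step 1 (preprocessing): a CTE computing (x - mean) / sd for every model input.
    scaled_cols = ",\n    ".join(
        f"({col} - {means[col]!r}) / {sds[col]!r} AS {col}" for col in coefs
    )
    # Step 2 (model): intercept + sum(coef * scaled_x) over the scaled columns.
    terms = " + ".join(f"{coefs[col]!r} * {col}" for col in coefs)
    return (
        f"WITH scaled AS (\n  SELECT {key},\n    {scaled_cols}\n  FROM {table}\n)\n"
        f"SELECT {key}, {intercept!r} + {terms} AS Estimator\nFROM scaled"
    )


# Usage on a toy in-memory SQLite table (made-up data and coefficients).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input_data (id INTEGER, x1 REAL, x2 REAL)")
conn.executemany("INSERT INTO input_data VALUES (?, ?, ?)", [(1, 2.0, 3.0), (2, 4.0, 1.0)])
sql = scaled_linear_sql(
    "input_data", "id",
    means={"x1": 3.0, "x2": 2.0},
    sds={"x1": 2.0, "x2": 1.0},
    coefs={"x1": 0.5, "x2": -1.0},
    intercept=10.0,
)
scores = dict(conn.execute(sql).fetchall())
print(sql)
print(scores)
```

Keeping preprocessing in a CTE and the model in the outer `SELECT` mirrors how a caret pipeline chains steps. Each additional step could be added as another CTE layered on the previous one.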

These lists are expected to cover the models most commonly used in a data scientist's daily work. The initial selection comes from the original caret paper (http://www.jstatsoft.org/article/view/v028i05/v28i05.pdf), page 9.

Deliverables:

  1. Create a separate GitHub issue for each element of the four lists.
  2. Implementation: Jupyter R notebooks.
  3. Tests following the process defined in issue #1.
  4. Track the progress of these sub-issues in the comments of this issue.
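The actual test process is defined in issue #1 and is not reproduced here. As an assumed, simplified illustration of the general idea (executing the generated SQL and comparing it with reference predictions), the sketch below does this for a decision tree. The nested-tuple tree format, the helper names and the table/column names are hypothetical; this is not rpart's or caret's internal representation.

```python
import sqlite3

# Hypothetical tree format: an internal node is (feature, threshold, left, right),
# going left when row[feature] <= threshold; a leaf is the predicted value.
TREE = ("x1", 2.5, ("x2", 1.0, 10.0, 20.0), 30.0)


def tree_to_sql(node):
    """Translate the nested tree into a nested SQL CASE expression."""
    if not isinstance(node, tuple):
        return repr(float(node))
    feature, threshold, left, right = node
    return (f"CASE WHEN {feature} <= {threshold!r} "
            f"THEN {tree_to_sql(left)} ELSE {tree_to_sql(right)} END")


def tree_predict(node, row):
    """Reference prediction computed in Python, used as the expected value."""
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if row[feature] <= threshold else right
    return float(node)


def compare_sql_with_reference(tree, rows):
    """Run the generated SQL on an in-memory SQLite table and compare with Python."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE input_data (id INTEGER, x1 REAL, x2 REAL)")
    conn.executemany("INSERT INTO input_data VALUES (:id, :x1, :x2)", rows)
    sql = f"SELECT id, {tree_to_sql(tree)} AS Estimator FROM input_data"
    sql_scores = dict(conn.execute(sql).fetchall())
    conn.close()
    return all(abs(sql_scores[r["id"]] - tree_predict(tree, r)) < 1e-9 for r in rows)
```

A real test would take the reference predictions from the R model itself (e.g. `predict()` on the caret object) rather than from a Python re-implementation. The comparison step would stay the same.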

Closed the issue: Implementation Process - xgboost methods #9

Closed the issue: Implementation Process - rpart method #6

Closed the issue: Implementation Process - glmnet method #4

Closed the issue: naive_bayes method #5

Closed the issue: Implementation Process - nnet method #8

Closed the issue: Implementation Process - Data Preprocessing - center + scale method #10

Closed the issue: Implementation Process - Data Preprocessing - PCA method #11

Closed the issue: Implementation Process - svmRadial method (and other SVMxx) #7

Closed the issue: Implementation Process - Data Preprocessing - ICA method #12

Closed the issue: Implementation Process - Caret Pipeline Models #13

Closed the issue: Implementation Process - rf method #14

Closed the issue: Implementation Process - ctree method #15