dlab-berkeley/Python-Machine-Learning

Part 2. Regression

stemlock opened this issue · 1 comment

  1. Imputation for Categorical Variables: The np.unique() output in the imputation section could be confusing compared to the previous output, where the NaNs are shown in a DataFrame. Consider converting cp_imp back to a pandas DataFrame so the before/after comparison is clearer.
  2. Dummy Encoding: I believe dummy encoding can be done by passing drop='first' as an argument to the sklearn OneHotEncoder object. This would remove the need to create a separate DummyEncoding class.
  3. ColumnTransformer: Spelling mistakes in the "ColumntTransformer for Combined Preprocessing" opening description -> "ColumntTransformer" should be "ColumnTransformer", and "differntially" should be "differentially".
  4. Transform the Test Data: Spelling mistake after the data is saved -> "...everything else is just a matter of choosing your mdoel..." should be "model".
  5. GLM Ridge Regression: Spelling mistake in opening description -> "Ridge regression takes a hyerparameter..." should be "hyperparameter"
  6. GLM Ridge Regression: "Leave-One-Out Cross-Validation" (LOOCV) is not explained. A "see more" link might be useful.
  7. Non-Linear Models: It might be helpful to include a quick explainer comparing linear vs. non-linear models and their pros/cons. Currently they are introduced without explanation.
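On point 1, a minimal sketch of what wrapping the imputed array back in a DataFrame could look like (the cp data and column name here are illustrative stand-ins, not the notebook's actual variables):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy categorical column with a missing value (hypothetical data)
cp = pd.DataFrame({"cp": ["typical", np.nan, "atypical", "typical"]})

imputer = SimpleImputer(strategy="most_frequent")
cp_imp = imputer.fit_transform(cp)  # returns a plain NumPy array

# Converting back to a DataFrame keeps the output format consistent
# with the pre-imputation display, so the NaN -> value change is obvious
cp_imp_df = pd.DataFrame(cp_imp, columns=cp.columns)
print(cp_imp_df)
```

The side-by-side display of cp and cp_imp_df then shows the same structure before and after imputation.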
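On point 2, a quick sketch of drop='first' in action (the color data is made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical single categorical feature with three levels
X = np.array([["red"], ["green"], ["blue"], ["green"]])

# drop='first' drops the first (alphabetically sorted) category per feature,
# yielding k-1 dummy columns instead of k one-hot columns
enc = OneHotEncoder(drop="first")
X_dummy = enc.fit_transform(X).toarray()
print(enc.categories_)  # [array(['blue', 'green', 'red'], ...)]
print(X_dummy.shape)    # (4, 2): 'blue' becomes the all-zeros baseline
```

This gives the same k-1 encoding the custom DummyEncoding class was producing, with no extra code to maintain.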
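On point 6, besides a link, even a two-line demo could make LOOCV concrete: with cv=None (the default), RidgeCV scores each candidate alpha by holding out each sample once and predicting it from the remaining n-1. A sketch on synthetic data (the dataset and alpha grid are arbitrary choices for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

# Small synthetic regression problem (hypothetical, just for the demo)
X, y = make_regression(n_samples=30, n_features=4, noise=0.5, random_state=0)

# cv=None (default) uses efficient leave-one-out cross-validation:
# each of the 30 samples is left out once and predicted from the other 29
ridge = RidgeCV(alphas=[0.1, 1.0, 10.0]).fit(X, y)
print(ridge.alpha_)  # the alpha with the best leave-one-out score
```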

Closed by #39