This work was done as part of the DataCamp course: https://campus.datacamp.com/courses/supervised-learning-with-scikit-learn
This course gave me solid confidence in applying machine learning to datasets. All the .ipynb files uploaded to my Git repository are ready to run. I am uploading them for my own reference; the exercises may not make much sense unless you take the DataCamp course mentioned above.
The modules are in the following order:
- Supervised Learning, Exploratory Data Analysis, k-Nearest Neighbors, Measuring Model Performance, Train Test Split, Fit Predict Accuracy, Overfitting and Underfitting
- The Basics of Linear Regression, Cross-Validation, K-fold CV Comparison, Regularized Regression, Lasso, Ridge
- Logistic Regression and ROC Curve, Precision-Recall Curve, Area Under the ROC Curve, AUC Computation, Hyperparameter Tuning, GridSearchCV, RandomizedSearchCV, Hold-out Set
- Preprocessing data, Creating Dummy Variables, Regression with Categorical features, Handling missing data, Imputing missing data in ML pipeline, Centering and Scaling, Pipeline for Classification, Pipeline for Regression
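
To give a flavor of how several of these topics fit together, here is a minimal sketch that is not taken from the course notebooks. It assumes scikit-learn is installed and uses the bundled iris dataset as a stand-in; the function name `run_example`, the parameter grid, and the split sizes are illustrative choices. It combines a train/test split, an imputation and scaling pipeline, a k-Nearest Neighbors classifier, and hyperparameter tuning with GridSearchCV, then scores the tuned model on the hold-out set.

```python
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def run_example(random_state=42):
    """Tune a k-NN pipeline with cross-validation and score it on a hold-out set.

    The dataset, grid values, and split ratio are assumptions for illustration.
    """
    X, y = load_iris(return_X_y=True)

    # Keep 30% of the data aside as a hold-out set that tuning never sees.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state
    )

    # Impute missing values (iris has none, the step is shown for completeness),
    # then center and scale, then classify.
    pipeline = Pipeline(
        [
            ("imputer", SimpleImputer(strategy="mean")),
            ("scaler", StandardScaler()),
            ("knn", KNeighborsClassifier()),
        ]
    )

    # Step parameters are addressed as <step name>__<parameter name>.
    param_grid = {"knn__n_neighbors": [1, 3, 5, 7, 9]}

    # 5-fold cross-validation on the training data only.
    search = GridSearchCV(pipeline, param_grid, cv=5)
    search.fit(X_train, y_train)

    # Accuracy of the best pipeline on the hold-out set.
    return search.best_params_, search.score(X_test, y_test)


if __name__ == "__main__":
    best_params, test_accuracy = run_example()
    print("Best parameters:", best_params)
    print("Hold-out accuracy:", round(test_accuracy, 3))
```

Putting the imputer and scaler inside the pipeline means they are refit on each cross-validation training fold, so no information from the validation folds or the hold-out set leaks into preprocessing.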