This repo collects the notes I've taken while going through DataCamp courses. I'll also jot down some random project ideas.
- Parallel computing with Dask
- Supervised Learning with Scikit-learn
- Unsupervised Learning in Python
- Machine Learning with the Experts: School Budgets
- Introduction to PySpark
- Deep Learning in Python
- Introduction to Time Series Analysis in Python

A typical machine learning workflow:
- Preprocess the data: fill, drop, or impute missing values
- Standardize, normalize, or scale the features
- Split the dataset with train_test_split (test_size, random_state)
- Cross-validation (CV)
- Hyperparameter tuning (GridSearchCV, RandomizedSearchCV)
- model.fit(X_train, y_train)
- model.predict(X_test)
- Evaluate performance (R², F1, model.score, etc.)
- Put everything together using Pipeline.
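
A minimal sketch of the steps above in one place, assuming scikit-learn and NumPy are installed. The iris dataset, the artificially blanked-out values, the choice of LogisticRegression, and the parameter grid are all illustrative assumptions on my part, not something taken from a particular course.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (assumption): iris, with a few entries set to NaN to mimic missing data.
X, y = load_iris(return_X_y=True)
X = X.copy()
rng = np.random.default_rng(0)
rows = rng.integers(0, X.shape[0], size=10)
cols = rng.integers(0, X.shape[1], size=10)
X[rows, cols] = np.nan

# Hold out a test set; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Impute -> scale -> model. Chaining the steps in a Pipeline means the
# imputer and scaler are fit only on the training portion of each CV fold.
pipe = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Grid search with 5-fold cross-validation over the regularization strength C.
# The "clf__C" key addresses parameter C of the pipeline step named "clf".
param_grid = {"clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

# Predict and evaluate on the held-out test set.
y_pred = search.predict(X_test)
accuracy = search.score(X_test, y_test)  # default score for classifiers: accuracy
macro_f1 = f1_score(y_test, y_pred, average="macro")
print(search.best_params_, accuracy, macro_f1)
```

For a regression problem the same structure should carry over: swap in a regressor, and score then reports R² instead of accuracy.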