/datacamp_notebooks

Notes from DataCamp courses.

Primary language: Jupyter Notebook

DataCamp notes

This repo collects the notes I've taken while working through DataCamp courses. I'll also jot down some random project ideas.

Completed courses:

  • Parallel Computing with Dask
  • Supervised Learning with Scikit-learn
  • Unsupervised Learning in Python
  • Machine Learning with the Experts: School Budgets
  • Introduction to PySpark
  • Deep Learning in Python
  • Introduction to Time Series Analysis in Python

Scikit-learn supervised learning (Classification, Regression) basic workflow

  • Preprocess the data: fill, drop, or impute missing values
  • Standardize, normalize, or scale the features
  • Split the dataset with train_test_split (test_size, random_state)
  • Cross-validation (CV)
  • Hyperparameter tuning (GridSearchCV, RandomizedSearchCV)
  • model.fit(X_train,y_train)
  • model.predict(X_test)
  • Evaluate performance (R2, F1, model.score, etc.)
  • Put everything together using Pipeline.
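
The workflow above can be sketched roughly as follows. This is a minimal illustration, not code from the courses: the synthetic dataset from make_classification, the injected missing values, the choice of LogisticRegression, and the grid of C values are all assumptions made for the example. Putting the imputer and scaler inside the Pipeline means they are refit on each cross-validation training fold, so the held-out folds do not influence the preprocessing.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic classification data (an assumption for illustration only)
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

# Blank out roughly 5% of the values so the imputation step has work to do
rng = np.random.default_rng(42)
X[rng.random(X.shape) < 0.05] = np.nan

# Hold out a test set; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Chain imputation, scaling, and the model into one estimator
pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Tune the regularization strength C with 5-fold cross-validation;
# the "clf__" prefix routes the parameter to the "clf" pipeline step
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

# Predict on the held-out data and evaluate
y_pred = search.predict(X_test)
accuracy = search.score(X_test, y_test)  # default scorer for classifiers
f1 = f1_score(y_test, y_pred)

print("Best params:", search.best_params_)
print("Test accuracy:", accuracy)
print("Test F1:", f1)
```

For a regression problem the same structure applies with a regressor as the final step, in which case score returns R2 instead of accuracy. RandomizedSearchCV can be swapped in for GridSearchCV when the parameter grid is too large to search exhaustively.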