/ML_ronin

Data Science and Machine Learning for ronins

Primary language: Jupyter Notebook

This repo contains the code and resources for the 8-week ML for Ronins course.

It consists of 4 weeks of theory and practice followed by 4 weeks of hands-on projects. The course covers Python for Data Science (NumPy, Pandas), data preparation, preprocessing, data analysis, and visualization tools. The ML content focuses mainly on supervised methods that are commonly used in practice; unsupervised learning and cluster analysis methods are also covered.

Week 1

1. Python Introduction

1.1 Session 1

1.1.1 Introduction

1.1.2 Data Types

  • Numbers
  • Booleans
  • Strings

1.1.3 Collections

  • Lists
  • Dicts
  • Tuples
  • Sets

1.1.4 Conditional Statements

1.1.5 Functions and Classes

1.1.6 See also (HW)

  • Iterables and Iterators
  • Global and local variables
  • Intro to OOP in Python

1.2 Exercises 1

  • Write a function that uses different Data Types
  • Loop over an iterable
  • Create a Class with instance attributes
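As a rough illustration of the three exercises above (not the course's reference solution), the sketch below uses only the standard library. The names `describe` and `Ronin` and the sample values are hypothetical.

```python
def describe(name: str, age: int, active: bool) -> str:
    """Combine a string, a number, and a boolean into one message."""
    status = "active" if active else "inactive"
    return f"{name} is {age} years old and {status}"


class Ronin:
    """Hypothetical example class with two instance attributes."""

    def __init__(self, name: str, skills: list):
        self.name = name      # instance attribute
        self.skills = skills  # instance attribute


# Loop over an iterable (here a list of strings) and collect each length.
lengths = []
for skill in ["numpy", "pandas"]:
    lengths.append(len(skill))

message = describe("Aiko", 30, True)
ronin = Ronin("Aiko", ["numpy", "pandas"])
```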

1.3 Session 2

1.3.1 Introduction

1.3.2 Read Files

1.3.3 Lambda functions

  • Overview
  • Map function

1.3.4 NumPy

  • Arrays
  • Data Types
  • Mathematical Operations

1.3.5 See also (HW)

  • Filter function
  • NumPy broadcasting
  • Python connection to SQL database

1.4 Exercises 2

  • Read a file and create a list of line lengths
  • Convert functions to lambda functions
  • Evaluate NumPy array operations
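A minimal sketch of the first two exercises above, assuming an in-memory `io.StringIO` object as a stand-in for a real file (with a file on disk, `open(...)` would be used instead). The function `square` and the sample text are hypothetical; the NumPy exercise is not covered here.

```python
import io

# Stand-in for a file on disk, so the example runs without external data.
sample_file = io.StringIO("first line\nsecond\n\nlast")

# Length of each line, not counting the trailing newline character.
line_lengths = [len(line.rstrip("\n")) for line in sample_file]


# A regular function ...
def square(x):
    return x * x


# ... and an equivalent lambda function.
square_lambda = lambda x: x * x

# map applies the lambda to every element of the list.
squares = list(map(square_lambda, [1, 2, 3]))
```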

1.5 Evaluation

1.6 Project

Week 2

2. Data Wrangling

2.1 Session 1

2.1.1 Introduction

2.1.2 Web Scraping

  • Parsing HTML
  • Requests Examples

2.1.3 Pandas Basics

  • DataFrames
  • Series
  • Columns
  • Concatenate and Merge

2.1.4 See also (HW)

  • Web Scraping with Selenium
  • Read files to DFs
  • Reshape and pivoting

2.2 Exercises 1

  • Create a DF from a scraped URL

2.3 Session 2

2.3.1 Introduction

2.3.2 Data Preparation

  • Missing Values
  • Dtypes
  • Homogeneity
  • Duplicates

2.3.3 Encoding

  • Categorical Encoding
  • One-Hot Encoding
  • Text Representation
  • Feature Scaling

2.3.4 See also (HW)

  • sklearn transformation pipelines
  • Feature Engineering

2.4 Exercises 2

  • Clean a data source
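One possible shape for this exercise, touching the four data preparation topics from Session 2 (missing values, dtypes, homogeneity, duplicates). It is a sketch on a small made-up table, assuming pandas is installed; the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical raw data with a missing value, inconsistent text,
# numbers stored as strings, and duplicated rows.
raw = pd.DataFrame({
    "city": ["Lima", "lima ", "Quito", None, "Quito"],
    "price": ["10", "10", "7.5", "3", "7.5"],
})

clean = (
    raw.dropna(subset=["city"])  # missing values: drop rows without a city
    .assign(
        # homogeneity: trim whitespace and normalize capitalization
        city=lambda df: df["city"].str.strip().str.title(),
        # dtypes: convert numeric strings to floats
        price=lambda df: pd.to_numeric(df["price"]),
    )
    .drop_duplicates()           # duplicates: keep one copy of each row
    .reset_index(drop=True)
)
```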

2.5 Evaluation

2.6 Project

Week 3

3. EDA

3.1 Session 1

3.1.1 Introduction

3.1.2 Visualization

  • Matplotlib
  • Seaborn
  • Bokeh

3.1.3 Exploring the data

  • Descriptive Stats
  • Categorical Data

3.1.4 See also (HW)

  • Plotly
  • Tableau
  • Power BI

3.2 Exercises 1

  • Plotting exercise

3.3 Session 2

3.3.1 Introduction

3.3.2 Exploring the data II

  • Continuous data
  • Correlation Bi/Multivariate
  • Mutual Information
  • PCA

3.3.3 See also (HW)

  • t-SNE
  • UMAP

3.4 Exercises 2

  • Perform an EDA on a dataset
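A very small starting point for such an EDA might look like the sketch below, assuming pandas is installed. The dataset is invented for illustration; a real analysis would also include the plots covered in Session 1.

```python
import pandas as pd

# Hypothetical dataset with two continuous columns and one categorical column.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4],
    "score": [2, 4, 6, 8],
    "group": ["a", "b", "a", "b"],
})

summary = df.describe()                 # descriptive stats for numeric columns
corr = df[["hours", "score"]].corr()    # pairwise Pearson correlation
counts = df["group"].value_counts()     # frequency of each category
```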

3.5 Evaluation

3.6 Project

Week 4

4. Machine Learning

4.1 Session 1

4.1.1 Introduction

4.1.2 Supervised Learning

  • Linear Regression
  • Logistic Regression
  • Neural Networks
  • Random Forest
  • Boosting

4.1.3 Model Selection and Evaluation

  • Hyperparameters Search
  • Performance Metrics
  • Cross-Validation

4.1.4 See also (HW)

  • Naive Bayes
  • Support Vector Machines
  • Deep Neural Networks

4.2 Exercises 1

  • Train a classification model
  • Train a regression model
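As a hedged sketch of both exercises, the example below fits scikit-learn's `LinearRegression` and `LogisticRegression` on tiny synthetic data. The data is made up, and a real exercise would add a train/test split and the evaluation steps from this session.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# One feature, four samples (synthetic data for illustration only).
X = np.array([[0.0], [1.0], [2.0], [3.0]])

# Regression target follows y = 2x + 1 exactly.
y_reg = 2 * X.ravel() + 1
reg = LinearRegression().fit(X, y_reg)

# Classification target: class 0 for small x, class 1 for large x.
y_clf = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y_clf)
pred = clf.predict(X)
```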

4.3 Session 2

4.3.1 Introduction

4.3.2 Unsupervised Learning

  • K-means
  • Spectral Clustering
  • Gaussian Mixtures
  • DBSCAN
  • Model Evaluation

4.3.3 Association Rules

  • Introduction
  • Market Basket Analysis

4.3.4 Non-negative Matrix Factorization

4.3.5 Latent Dirichlet Allocation

4.3.6 See also (HW)

  • Agglomerative Clustering
  • BIRCH
  • HDBSCAN
  • SOM (Self-Organizing Maps)

4.4 Exercises 2

  • Perform a cluster analysis using at least two different methods
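For instance, K-means and DBSCAN can be run on the same points and their labelings compared. This is only a sketch on six hand-picked points, assuming scikit-learn is installed; the `eps`, `min_samples`, and `n_clusters` values are chosen for this toy data, not recommended defaults.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import adjusted_rand_score

# Two well-separated groups of three points each (toy data).
points = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
])

# Method 1: K-means with a fixed number of clusters.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)

# Method 2: DBSCAN, which infers the number of clusters from point density.
dbscan_labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(points)

# Adjusted Rand index: 1.0 means both methods produced the same partition.
agreement = adjusted_rand_score(kmeans_labels, dbscan_labels)
```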

4.5 Evaluation

4.6 Project

Weeks 5-8

ML Project 1: Churn Prediction

ML Project 2: Customer Segmentation