Machine Learning Resources

A selection of machine learning resources

AI as a product

AI as a product

Libraries

Numpy

Pandas

Introduction to pandas

Matplotlib

Introduction to matplotlib

Books online (or free downloads)

Topics

Feature Engineering

Feature Engineering

Clustering

Uses

  • Additionally, Vincent Warmerdam suggests training an autoencoder and clustering its latent space. This gives outlier detection and sampling in the latent space for free.

Resource | Comment
Youtube: 4 Basic Types of Cluster Analysis used in Data Analytics (Decisive Data) | TBD
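A minimal sketch of the autoencoder idea from the Uses bullet. It assumes scikit-learn's MLPRegressor as a stand-in autoencoder; a real setup would more likely use a deep learning library. The toy data, the layer sizes, and the helpers encode/decode are hypothetical. The sketch clusters the 2-D bottleneck with KMeans, scores outliers by reconstruction error, and samples new points by decoding perturbed latent vectors.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical): 3 blobs in 10 dimensions, standardized
X, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=0)
X = StandardScaler().fit_transform(X)

# Tiny autoencoder 10 -> 2 -> 10: an MLP trained to reproduce its own input
ae = MLPRegressor(hidden_layer_sizes=(2,), activation="tanh",
                  max_iter=2000, random_state=0)
ae.fit(X, X)

def encode(model, data):
    """Input -> 2-D bottleneck (first layer plus tanh activation)."""
    return np.tanh(data @ model.coefs_[0] + model.intercepts_[0])

def decode(model, z):
    """Bottleneck -> input space (MLPRegressor's output layer is linear)."""
    return z @ model.coefs_[1] + model.intercepts_[1]

latent = encode(ae, X)  # shape (300, 2)

# 1) Clustering in the latent space
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(latent)
labels = km.labels_

# 2) Outlier detection: the points the autoencoder reconstructs worst
recon_error = ((ae.predict(X) - X) ** 2).mean(axis=1)
outliers = np.argsort(recon_error)[-5:]  # indices of the 5 largest errors

# 3) Sampling: perturb a latent cluster center and decode back to input space
rng = np.random.default_rng(0)
z_new = km.cluster_centers_[0] + rng.normal(scale=0.05, size=(5, 2))
new_points = decode(ae, z_new)  # shape (5, 10)
```

The 2-unit bottleneck keeps the latent space plottable. A wider bottleneck usually reconstructs better but is harder to inspect.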

Metrics for clustering

Resource | Comment
Youtube: Assessing the quality of a clustering (Christian Hennig @ PyData) | TBD

Library | Comment
sklearn.metrics.homogeneity_score | TBD
sklearn.metrics.completeness_score | TBD
sklearn.metrics.v_measure_score | TBD
sklearn.metrics.adjusted_rand_score | TBD
sklearn.metrics.silhouette_score | Average silhouette width (ASW): the mean silhouette coefficient over all samples
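A quick comparison of the listed metrics on toy data. The make_blobs data and the KMeans clustering are illustrative assumptions. The first four are external metrics and need ground-truth labels. The silhouette score is internal and needs only the data and the predicted labels.

```python
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, y_true = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=42)
y_pred = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

scores = {
    # External metrics: compare against ground truth, invariant to label names
    "homogeneity": metrics.homogeneity_score(y_true, y_pred),    # each cluster holds one class
    "completeness": metrics.completeness_score(y_true, y_pred),  # each class sits in one cluster
    "v_measure": metrics.v_measure_score(y_true, y_pred),        # harmonic mean of the two above
    "adjusted_rand": metrics.adjusted_rand_score(y_true, y_pred),
    # Internal metric: average silhouette width, in [-1, 1]
    "silhouette": metrics.silhouette_score(X, y_pred),
}
for name, value in scores.items():
    print(f"{name:>14}: {value:.3f}")

# A relabeled but otherwise identical clustering scores perfectly
perfect_ari = metrics.adjusted_rand_score([0, 0, 1, 1], [1, 1, 0, 0])
print(perfect_ari)  # 1.0
```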

K Means Clustering

Resource | Comment
Youtube: K Means Clustering (Siraj Raval) | TBD
Youtube: K Means Clustering (StatQuest) | TBD
Youtube: K Means Clustering (Andrew Ng) | TBD
Youtube: K Means Clustering (Luis Serrano) | Covers hierarchical clustering too
Youtube: K Means Clustering (Batool Arhamna Haider) | Within-cluster and between-cluster distances
TowardsDataScience: Understanding K-Means | Also covers K-Medoids

Library | Comment
sklearn.cluster.KMeans | Standard k-means clustering (initialization can be "random" or "k-means++")
sklearn.cluster.MiniBatchKMeans | Mini-batch k-means clustering
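A sketch comparing the two listed estimators and both initializations on toy blobs. The data, k=5, and the helper functions within_cluster_ss/between_cluster_ss are illustrative assumptions. The helpers spell out the within-cluster and between-cluster distances mentioned in the resource list. KMeans exposes the within-cluster sum of squares as inertia_.

```python
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=1000, centers=5, random_state=0)

# init="k-means++" spreads the initial centroids apart; "random" picks k rows at random
km_pp = KMeans(n_clusters=5, init="k-means++", n_init=10, random_state=0).fit(X)
km_rand = KMeans(n_clusters=5, init="random", n_init=10, random_state=0).fit(X)

# MiniBatchKMeans updates centroids from small random batches: faster on big data,
# usually at the cost of a slightly higher inertia
mbk = MiniBatchKMeans(n_clusters=5, batch_size=256, n_init=3, random_state=0).fit(X)

def within_cluster_ss(data, labels, centers):
    """Sum of squared distances from each point to its assigned centroid."""
    return float(((data - centers[labels]) ** 2).sum())

def between_cluster_ss(data, labels, centers):
    """Size-weighted squared distance of each centroid from the global mean."""
    counts = np.bincount(labels, minlength=len(centers))
    diffs = centers - data.mean(axis=0)
    return float((counts * (diffs ** 2).sum(axis=1)).sum())

wcss = within_cluster_ss(X, km_pp.labels_, km_pp.cluster_centers_)
bcss = between_cluster_ss(X, km_pp.labels_, km_pp.cluster_centers_)

for name, model in [("k-means++", km_pp), ("random", km_rand), ("mini-batch", mbk)]:
    print(f"{name:>10}: inertia = {model.inertia_:.1f}")
print(f"WCSS = {wcss:.1f}, BCSS = {bcss:.1f}")
```

A good clustering has a small WCSS (tight clusters) and a large BCSS (well-separated centroids).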
Questions

Hierarchical Clustering

Resource | Comment
Youtube: Hierarchical Clustering (Luis Serrano) | Covers k-means clustering too
Youtube: Hierarchical Clustering (StatQuest) | TBD
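This section lists no library. As an assumption, here is a sketch with sklearn.cluster.AgglomerativeClustering and SciPy's scipy.cluster.hierarchy, neither of which appears above. Both do bottom-up (agglomerative) clustering with Ward linkage: every point starts as its own cluster, and the pair whose merge least increases within-cluster variance is joined repeatedly. SciPy also returns the full merge history, which is what a dendrogram plots. The toy data is illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=60, centers=3, random_state=1)

# scikit-learn: run the merges until 3 clusters remain
agg = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = agg.fit_predict(X)  # labels 0..2

# SciPy: keep the whole merge tree, then cut it
Z = linkage(X, method="ward")  # shape (n - 1, 4): merged ids, distance, new size
labels_scipy = fcluster(Z, t=3, criterion="maxclust")  # at most 3 clusters, labels from 1

print(np.bincount(labels), np.bincount(labels_scipy)[1:])
```

scipy.cluster.hierarchy.dendrogram(Z) can draw the tree with matplotlib. Cutting the same tree at a different height gives a different number of clusters without refitting.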

Density Based Clustering (TBD)

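DBSCAN typically has two parameters: eps and minPoints (called min_samples in sklearn.cluster.DBSCAN).

  • eps is the neighborhood radius: two points within distance eps of each other are neighbors
  • minPoints is how many points a neighborhood must contain (the point itself included) for that point to be a core point; clusters grow from core points, and points that are neither core points nor neighbors of one are labeled noise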
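A minimal DBSCAN run showing how the two parameters map onto scikit-learn. The two-moons data and the values eps=0.3 and min_samples=5 are illustrative assumptions, not tuned settings. Unlike k-means, DBSCAN does not need the number of clusters up front and marks sparse points as noise (label -1).

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
X = StandardScaler().fit_transform(X)  # eps is a distance, so scale matters

# eps       -> neighborhood radius
# minPoints -> min_samples (neighbors incl. the point itself needed for a core point)
db = DBSCAN(eps=0.3, min_samples=5).fit(X)

labels = db.labels_  # -1 marks noise
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int((labels == -1).sum())

core_mask = np.zeros(len(labels), dtype=bool)
core_mask[db.core_sample_indices_] = True

print(f"clusters: {n_clusters}, noise points: {n_noise}, core points: {core_mask.sum()}")
```

Raising eps merges clusters and absorbs noise. Raising min_samples makes core points rarer, so more points end up as noise.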

Resource | Comment
Youtube: Density Based Clustering (Brian Kent @ PyData) | TBD
Youtube: HDBSCAN (John Healy @ PyData) | TBD

Library | Comment
sklearn.cluster.DBSCAN | TBD
scikit-learn-contrib.hdbscan | Hierarchical density-based clustering
debacl | Uses level set trees

Gaussian Mixture Model (TBD)

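Each cluster is modeled as a Gaussian distribution with its own mean and covariance, and the data as a weighted mixture of these Gaussians (usually fit with expectation-maximization). Points get soft assignments: a probability of belonging to each cluster.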
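A minimal sketch with sklearn.mixture.GaussianMixture, which this section does not list and is used here as an assumption. The toy blobs with unequal spreads are illustrative. The sketch shows the hard labels, the soft (probabilistic) assignments, and the fitted mixture parameters.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Three blobs with different spreads, which k-means handles less naturally
X, _ = make_blobs(n_samples=400, centers=3, cluster_std=[1.0, 2.0, 0.5], random_state=7)

# "full": each component gets its own full covariance matrix
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(X)

hard = gmm.predict(X)        # most likely component per point
soft = gmm.predict_proba(X)  # shape (400, 3); each row sums to 1

print(gmm.weights_)  # mixing weights, sum to 1
print(gmm.means_)    # one mean vector per component
print(gmm.bic(X))    # lower BIC is better when comparing different n_components
```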

Classification

Resource | Comment
Hands-On Machine Learning (Aurélien Géron): Classification notebook | MNIST, precision, recall, confusion matrix, ROC
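A short sketch of the metrics the notebook covers. It is not the notebook's code and assumes scikit-learn's small load_digits dataset as a stand-in for MNIST, with an SGDClassifier trained as a binary "is it a 5?" detector. Out-of-fold predictions from cross_val_predict keep the metrics honest.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             roc_auc_score, roc_curve)
from sklearn.model_selection import cross_val_predict

# load_digits: 8x8 digit images, a small stand-in for MNIST
digits = load_digits()
X, y = digits.data, (digits.target == 5).astype(int)  # 1 = "is a 5"

clf = SGDClassifier(random_state=42)

# Out-of-fold predictions: every sample is predicted by a model that never saw it
y_pred = cross_val_predict(clf, X, y, cv=3)
y_scores = cross_val_predict(clf, X, y, cv=3, method="decision_function")

cm = confusion_matrix(y, y_pred)        # [[TN, FP], [FN, TP]]
precision = precision_score(y, y_pred)  # TP / (TP + FP)
recall = recall_score(y, y_pred)        # TP / (TP + FN)

# ROC: true-positive rate vs false-positive rate across all score thresholds
fpr, tpr, thresholds = roc_curve(y, y_scores)
auc = roc_auc_score(y, y_scores)

print(cm)
print(f"precision={precision:.3f} recall={recall:.3f} AUC={auc:.3f}")
```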

Pipelines

Resource | Comment
Youtube: Deploying Machine Learning using sklearn pipelines (Kevin Goetsch @ PyData) | TBD
Youtube: Pandas, Pipelines, and Custom Transformers (Julie Michelman @ PyData) | TBD
Youtube: How you SHOULD code Machine Learning (CodeEmporium) | TBD
Youtube: Creating Pipelines Using SKlearn- Machine Learning Tutorial (Krish Naik) | TBD
Youtube: Use cross_val_score and GridSearchCV on a Pipeline (Data School) | TBD
Github: Pandas in sklearn | TBD
Github: sklearn transformers (by oegedijk) | TBD
Gist: Transformer Howto | TBD
Blog: A simple guide to pipelines (by Rebecca Vickery) | TBD
Blog: Step by step tutorial on pipelines (by James Ho) | TBD
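A sketch that ties the listed topics together: a pandas DataFrame fed into an sklearn Pipeline with a ColumnTransformer, a custom transformer, and cross_val_score / GridSearchCV run on the whole pipeline. The DataFrame, its column names, the target, and the LogTransformer class are all hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 70, n).astype(float),
    "income": rng.normal(50_000, 15_000, n),
    "city": rng.choice(["A", "B", "C"], n),
})
df.loc[::17, "income"] = np.nan  # some missing values
y = (df["age"] + rng.normal(0, 10, n) > 45).astype(int)

class LogTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical custom transformer: log1p of (clipped) non-negative values."""

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        return np.log1p(np.clip(X, 0, None))

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("log", LogTransformer()),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
pipe = Pipeline([
    ("prep", preprocess),
    ("model", LogisticRegression(max_iter=1000)),
])

# Preprocessing is re-fit inside each fold, so nothing leaks from the held-out fold
scores = cross_val_score(pipe, df, y, cv=5)

# Parameters of inner steps are addressed as <step>__<param>
grid = GridSearchCV(pipe, {"model__C": [0.1, 1.0, 10.0]}, cv=5)
grid.fit(df, y)

print(f"CV accuracy: {scores.mean():.3f}, best params: {grid.best_params_}")
```

Keeping every preprocessing step inside the pipeline means the fitted grid.best_estimator_ can be applied to raw DataFrames at prediction time, with the same transformations as in training.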