Machine Learning Practice
Practice implementations of statistical machine learning techniques, each based on a dataset.
For more details and examples on deep learning, check out my Deep Learning repository.
Because GitHub does not support LaTeX for now, you can use the Google Chrome extension TeX All the Things (github) to read the notes.
Environment
- Using Python 3
(Most relative-path links are relative to the repository root.)
Dependencies
- numpy: For low-level math operations
- pandas: For data manipulation
- sklearn - Scikit Learn: For evaluation metrics, some data preprocessing
For comparison purposes
- sklearn: For machine learning models
- cvxopt: For convex optimization problems (for SVM)
- For gradient boosting
For visualization
- Mlxtend
- matplotlib
  - matplotlib.pyplot
  - mpl_toolkits.mplot3d
For evaluation
- surprise: A Python scikit for building and analyzing recommender systems
NLP related
- gensim: Topic Modelling
- hmmlearn: Hidden Markov Models in Python, with scikit-learn like API
- jieba: Chinese text segmentation library
- pyHanLP: Chinese NLP library (Python API)
- nltk: Natural Language Toolkit
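A quick way to see which of the packages listed above are available in the current environment is sketched below. It uses only the standard library. The import names in `PACKAGES` (for example `pyhanlp` for pyHanLP) and the helper `missing_packages` are assumptions made for illustration, not something this repository ships.

```python
from importlib.util import find_spec

# Import names are assumptions inferred from the dependency list above;
# adjust them if a package is imported under a different name.
PACKAGES = [
    "numpy", "pandas", "sklearn", "cvxopt", "mlxtend", "matplotlib",
    "surprise", "gensim", "hmmlearn", "jieba", "pyhanlp", "nltk",
]


def missing_packages(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(PACKAGES)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Anything reported as missing can then be installed with pip under its distribution name, which may differ from the import name.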
Projects
Machine Learning Categories
Consider the learning task
- Supervised Learning
  - Classification - Discrete
  - Regression - Continuous
- Unsupervised Learning
  - Clustering - Discrete
  - Dimensionality Reduction - Continuous
  - Association Rule Learning
- Semi-supervised Learning
  - Semi-Clustering
  - Semi-Classification
- Reinforcement Learning
Consider the learning model
- Discriminative Model
  - Discriminative Function
  - Probabilistic Discriminative Model
- Generative Model
Consider the desired output of a ML system
- Classification
  - Logistic Regression (optimization algo.)
  - k-Nearest Neighbors (kNN)
  - Support Vector Machine (SVM) - Derivation (optimization algo.)
  - Naive Bayes
  - Decision Tree (ID3, C4.5, CART)
- Regression
  - Linear Regression - Derivation (optimization algo.)
  - Tree (CART)
- Clustering
  - k-Means
  - Hierarchical Clustering
  - DBSCAN
- Association Rule Learning
- Dimensionality Reduction
  - Principal Component Analysis (PCA)
  - Singular Value Decomposition (SVD) - LSA, LSI, Recommendation System
  - Canonical Correlation Analysis (CCA)
  - Isomap (nonlinear)
  - Locally Linear Embedding (LLE) (nonlinear)
  - Laplacian Eigenmaps (nonlinear)
Ensemble Method (Meta-algorithm)
- Bagging
  - Random Forests
- Boosting
  - AdaBoost (with some basic boosting notes)
  - Gradient Boosting
    - Gradient Boosting Decision Tree (GBDT) (aka. Multiple Additive Regression Tree (MART))
    - XGBoost
    - LightGBM
NLP Related
- Hidden Markov Model (HMM) - Sequential Labeling Problem
- Conditional Random Field (CRF) - Classification Problem (e.g. Sentiment Analysis)
Backbone
- Maximum Entropy Model (MEM)
- Bayesian Network (aka. Probabilistic Directed Acyclic Graphical Model)
Others
- Probabilistic Latent Semantic Analysis (PLSA)
- Latent Dirichlet Allocation (LDA)
- Vector Space Model (VSM)
- Radial Basis Function (RBF) Network
- Isolation Forest
- One-Class SVM
Heuristic Algorithm (Optimization Method)
Machine Learning Concepts
General Case
Categorized
- Classification
  - Data Preprocessing
  - Real-world Problem
  - Evaluation Metrics
  - Binary to Multi-class Extension
- Regression
  - Evaluation Metrics
- Clustering
  - Evaluation Metrics
Specific Field
- Data Mining - Knowledge Discovery
  - Feature Engineering
  - Training optimization
    - Memory usage
    - Evaluation time complexity
- Recommendation System
  - Collaborative Filtering (CF)
- Information Retrieval - Topic Modelling
  - Latent Semantic Analysis (LSA/LSI/SVD)
  - Latent Dirichlet Allocation (LDA)
  - Random Projections (RP)
  - Hierarchical Dirichlet Process (HDP)
  - word2vec
Machine Learning Mathematics
Topic
- Kernel Usages
- Convex Optimization
- Distance/Similarity Measurement - basis of clustering and recommendation system
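As a rough illustration of the distance/similarity measurements mentioned above, here is a minimal standard-library sketch of two common choices: Euclidean distance (often used in clustering) and cosine similarity (often used in recommendation). The function names are hypothetical and the notes in this repository may use different measures or a NumPy implementation.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero, equal-length vectors.

    Raises ZeroDivisionError if either vector has zero length.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(euclidean_distance([0, 0], [3, 4]))
    print(cosine_similarity([1, 2], [2, 4]))
```

Euclidean distance is sensitive to vector magnitude, while cosine similarity only compares direction, which is one reason the latter is popular for sparse rating or term vectors.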
Categories
- Linear Algebra
  - Orthogonality
  - Eigenvalues
  - Hessian Matrix
  - Quadratic Form
  - Markov Chain - HMM
- Calculus
  - Multivariable Derivatives
    - Quadratic Approximations
    - Lagrange Multipliers and Constrained Optimization - SVM SMO
    - Lagrange Duality
- Probability and Statistics
  - Statistical Estimation
Basics
- Algebra
- Trigonometry
Application
(from A to Z)
- Decision Tree
  - Entropy
- HMM
  - Markov Chain
- Naive Bayes
  - Bayes' Theorem
- PCA
  - Orthogonal Transformations
  - Eigenvalues
- SVD
  - Eigenvalues
- SVM
  - Convex Optimization
  - Constrained Optimization
  - Lagrange Multipliers
  - Kernel
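The list above pairs Decision Tree with entropy. As a small, hedged illustration of that link, the sketch below computes the Shannon entropy (in bits) of a list of class labels, the quantity that ID3-style trees try to reduce at each split. The function name `entropy` is chosen for illustration only and is not taken from this repository's code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of the class labels in `labels`."""
    total = len(labels)
    counts = Counter(labels)
    # Sum -p * log2(p) over the observed classes; an empty list yields 0.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    print(entropy(["yes", "yes", "no", "no"]))   # evenly mixed node
    print(entropy(["yes", "yes", "yes", "yes"]))  # pure node
```

A pure node has zero entropy and an evenly split binary node has one bit, so a split is useful when the weighted entropy of the child nodes is lower than that of the parent.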
Books Recommendation
Machine Learning
- Machine Learning in Action
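- Statistical Learning Methods (統計學習方法) by Li Hang (李航)
- Machine Learning (機器學習) by Zhou Zhihua (周志華), also known as the "Watermelon Book" (西瓜書)
- Python Machine Learning
- Introduction to Machine Learning, 3rd edition
  - Solution Manual
  - Previous editions: 1st, 2nd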
- Automated Machine Learning: Methods, Systems, Challenges (AutoML)
Mathematics
- Linear Algebra with Applications (Steven Leon)
- Convex Optimization (Stephen Boyd & Lieven Vandenberghe)
- Numerical Linear Algebra (L. Trefethen & D. Bau III)
Resources
Tutorial
Videos
- Google - Machine Learning Recipes with Josh Gordon
- YouTube - Machine Learning Fun and Easy
- Siraj Raval - The Math of Intelligence
- bilibili - Machine Learning: Whiteboard Derivation Series (機器學習 - 白板推導系列)
- bilibili - Machine Learning, Upgraded Edition (機器學習升級版)
Documentation
Interactive Learning
- Google Machine Learning Crash Course
- Learn with Google AI
- Kaggle Learn Machine Learning
- Microsoft Professional Program - Artificial Intelligence track
- Intel AI Developer Program - AI Courses
MOOC
GitHub
- Machine Learning from Scratch (eriklindernoren/ML-From-Scratch)
- Avik-Jain/100-Days-Of-ML-Code - 100 Days of ML Coding
- ddbourgin/numpy-ml - Machine learning, in numpy
- Machine learning Resources
- microsoft/recommenders - Best Practices on Recommendation Systems
- dformoso/machine-learning-mindmap
Textbook Implementation
- Machine Learning in Action
- Learning From Data (Hsuan-Tien Lin, 林軒田)
- Statistical Learning Methods (統計學習方法) by Li Hang (李航)
- Stanford Andrew Ng CS229
- NTU Hung-Yi Lee
Datasets
- UCI Machine Learning Repository
- Awesome Public Datasets
- Kaggle Datasets
- The MNIST Database of handwritten digits
- Dataset Platform (資料集平台) Data Market
- AI Challenger Datasets
- Peking University Open Research Data
- Open Images Dataset
- Alibaba Cloud Tianchi Data Lab
- biendata
Competition
Global
Taiwan
China
Machine Learning Platform
Machine Learning Tool
- AutoML
- Optuna - A hyperparameter optimization framework
- Hyperopt - Distributed Asynchronous Hyper-parameter Optimization
(Online) Development Environment
- Extension plugin - pip install jupyter_contrib_nbextensions
  - VIM binding
  - Codefolding
  - ExecuteTime
  - Notify
- Jupyter Theme - pip install --upgrade jupyterthemes