Some practice with statistical machine learning techniques, based on several datasets.
For more details or examples about deep learning, check out my Deep Learning repository.
Because GitHub doesn't support LaTeX for now, you can use the Google Chrome extension TeX All the Things (github) to read the notes.
- Using Python 3

(Most of the relative path links are relative to the repository root.)
- numpy: For low-level math operations
- pandas: For data manipulation
- sklearn - Scikit Learn: For evaluation metrics and some data preprocessing
For comparison purposes
- sklearn: For machine learning models
- cvxopt: For convex optimization problems (for SVM)
- For gradient boosting
For visualization
- Mlxtend
- matplotlib
  - matplotlib.pyplot
  - mpl_toolkits.mplot3d
For evaluation
- surprise: A Python scikit for building and analyzing recommender systems
NLP related
- gensim: Topic Modelling
- hmmlearn: Hidden Markov Models in Python, with a scikit-learn-like API
- jieba: Chinese text segmentation library
- pyHanLP: Chinese NLP library (Python API)
- nltk: Natural Language Toolkit
- Supervised Learning
  - Classification - Discrete
  - Regression - Continuous
- Unsupervised Learning
  - Clustering - Discrete
  - Dimensionality Reduction - Continuous
  - Association Rule Learning
- Semi-supervised Learning
  - Semi-Clustering
  - Semi-Classification
- Reinforcement Learning
Considering the learning model:
- Discriminative Model
  - Discriminative Function
  - Probabilistic Discriminative Model
- Generative Model
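To make the distinction concrete, here is a minimal sketch using scikit-learn (the comparison library listed above). It assumes a synthetic dataset from `make_classification`: `LogisticRegression` models P(y | x) directly (probabilistic discriminative), while `GaussianNB` fits P(x | y) and P(y) and then applies Bayes' theorem (generative). Both expose `predict_proba`, so they can be compared on equal footing.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification data (an assumption, for illustration only)
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Discriminative: learns P(y | x) directly through a logistic link
disc = LogisticRegression().fit(X_train, y_train)

# Generative: learns P(x | y) (per-class Gaussians) and P(y),
# then combines them with Bayes' theorem to get P(y | x)
gen = GaussianNB().fit(X_train, y_train)

for name, model in [("LogisticRegression", disc), ("GaussianNB", gen)]:
    proba = model.predict_proba(X_test)  # shape (n_test, n_classes)
    acc = model.score(X_test, y_test)    # mean accuracy on the test split
    print(f"{name}: accuracy={acc:.3f}, P(y|x) of first sample={proba[0].round(3)}")

# Only the generative model stores a model of the inputs: per-class feature means
print("GaussianNB class means (part of P(x | y)):")
print(gen.theta_)
```

The design point is that the generative model can describe what each class "looks like" (`theta_`), whereas the discriminative model only learns the decision boundary.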
- Classification
  - Logistic Regression (optimization algo.)
  - k-Nearest Neighbors (kNN)
  - Support Vector Machine (SVM)
    - Derivation (optimization algo.)
  - Naive Bayes
  - Decision Tree (ID3, C4.5, CART)
- Regression
  - Linear Regression (optimization algo.)
  - Tree (CART)
- Clustering
  - k-Means
  - Hierarchical Clustering
  - DBSCAN
- Association Rule Learning
- Dimensionality Reduction
  - Principal Component Analysis (PCA)
  - Singular Value Decomposition (SVD)
    - LSA, LSI, Recommendation System
  - Canonical Correlation Analysis (CCA)
  - Isomap (nonlinear)
  - Locally Linear Embedding (LLE) (nonlinear)
  - Laplacian Eigenmaps (nonlinear)
- Bagging
  - Random Forests
- Boosting
  - AdaBoost (with some basic boosting notes)
  - Gradient Boosting
  - Gradient Boosting Decision Tree (GBDT) (aka. Multiple Additive Regression Tree (MART))
  - XGBoost
  - LightGBM
- Hidden Markov Model (HMM)
  - Sequential Labeling Problem
- Conditional Random Field (CRF)
  - Classification Problem (e.g. Sentiment Analysis)
- Maximum Entropy Model (MEM)
- Bayesian Network (aka. Probabilistic Directed Acyclic Graphical Model)
- Probabilistic Latent Semantic Analysis (PLSA)
- Latent Dirichlet Allocation (LDA)
- Vector Space Model (VSM)
- Radial Basis Function (RBF) Network
- Isolation Forest
- One-Class SVM
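Since the algorithms above are implemented with numpy and checked against sklearn "for comparison purposes", here is a minimal sketch of that workflow for kNN. The helper `knn_predict` is hypothetical (not the repository's code) and assumes Euclidean distance with an unweighted majority vote; the `make_blobs` data is likewise an assumption for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier


def knn_predict(X_train, y_train, X_query, k=5):
    """Hypothetical from-scratch kNN: Euclidean distance + majority vote."""
    # Pairwise squared Euclidean distances, shape (n_query, n_train)
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest training points for each query point
    nn_idx = np.argsort(d2, axis=1)[:, :k]
    nn_labels = y_train[nn_idx]
    # Majority vote; bincount(...).argmax() breaks ties toward the smaller label
    return np.array([np.bincount(row).argmax() for row in nn_labels])


# Three Gaussian blobs with integer labels 0, 1, 2
X, y = make_blobs(n_samples=300, centers=3, cluster_std=2.0, random_state=42)
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

ours = knn_predict(X_train, y_train, X_test, k=5)
ref = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train).predict(X_test)

print("agreement with sklearn:", (ours == ref).mean())
print("accuracy (from scratch):", (ours == y_test).mean())
```

The brute-force distance matrix keeps the sketch short; sklearn may use a KD-tree or ball tree internally, but with continuous data the selected neighbors (and therefore the predictions) should match.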
- Classification
  - Data Preprocessing
  - Real-world Problem
  - Evaluation Metrics
  - Binary to Multi-class Expansion
- Regression
  - Evaluation Metrics
- Clustering
  - Evaluation Metrics
- Data Mining - Knowledge Discovery
  - Feature Engineering
  - Training optimization
    - Memory usage
    - Evaluation time complexity
  - Training optimization
- Recommendation System
  - Collaborative Filtering (CF)
- Information Retrieval - Topic Modelling
  - Latent Semantic Analysis (LSA/LSI/SVD)
  - Latent Dirichlet Allocation (LDA)
  - Random Projections (RP)
  - Hierarchical Dirichlet Process (HDP)
  - word2vec
- Kernel Usages
- Convex Optimization
- Distance/Similarity Measurement - the basis of clustering and recommendation systems
- Linear Algebra
  - Orthogonality
  - Eigenvalues
  - Hessian Matrix
  - Quadratic Form
- Markov Chain - HMM
- Calculus
  - Multivariable Derivatives
    - Quadratic Approximations
    - Lagrange Multipliers and Constrained Optimization - SVM SMO
    - Lagrange Duality
  - Multivariable Derivatives
- Probability and Statistics
  - Statistical Estimation
- Algebra
- Trigonometry
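Distance and similarity measures underpin both clustering and recommendation systems. Below is a minimal numpy sketch of three common ones: Euclidean distance, cosine similarity, and Pearson correlation (written as cosine similarity of mean-centered vectors). The two rating vectors are made-up illustrative data, not from any dataset in this repository.

```python
import numpy as np


def euclidean(a, b):
    # L2 distance: smaller means more similar (used by k-Means and kNN)
    return np.sqrt(np.sum((a - b) ** 2))


def cosine_similarity(a, b):
    # Cosine of the angle between vectors, in [-1, 1]; ignores magnitude
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))


def pearson(a, b):
    # Cosine similarity of mean-centered vectors; removes each user's rating bias
    return cosine_similarity(a - a.mean(), b - b.mean())


# Hypothetical ratings of the same 4 items by two users
u = np.array([5.0, 3.0, 4.0, 4.0])
v = np.array([3.0, 1.0, 2.0, 3.0])

print("euclidean:", euclidean(u, v))          # sqrt(13)
print("cosine:   ", cosine_similarity(u, v))
print("pearson:  ", pearson(u, v))
```

User `v` rates everything lower than `u`, so the Euclidean distance is fairly large while the Pearson correlation is still high; this is why mean-centered similarity is popular in user-based Collaborative Filtering. The `pearson` result agrees with `np.corrcoef(u, v)[0, 1]`.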
(from A to Z)
- Decision Tree
  - Entropy
- HMM
  - Markov Chain
- Naive Bayes
  - Bayes' Theorem
- PCA
  - Orthogonal Transformations
  - Eigenvalues
- SVD
  - Eigenvalues
- SVM
  - Convex Optimization
  - Constrained Optimization
  - Lagrange Multipliers
  - Kernel
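As a concrete taste of two of these prerequisites, here is a small numpy sketch: the Shannon entropy that ID3/C4.5 use to score Decision Tree splits, and PCA as an orthogonal transformation onto the eigenvectors of the covariance matrix. The function names and the toy data are assumptions for illustration, not the repository's implementation.

```python
import numpy as np


def entropy(labels):
    # H(D) = -sum_k p_k * log2(p_k), where p_k are the class proportions
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))


def pca(X, n_components):
    # Center the data, then eigendecompose the (symmetric) covariance matrix
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]  # orthonormal columns: an orthogonal transformation
    return Xc @ W, eigvals[order]


print(entropy([0, 0, 1, 1]))  # 1.0 (a maximally impure binary node)

# Toy 3-D data stretched mostly along the first axis
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0, 0], [0, 1.0, 0], [0, 0, 0.2]])
Z, var = pca(X, n_components=2)
print(Z.shape, var)  # projected data and the variance along each component
```

Because the projection uses eigenvectors of the covariance matrix, the projected components are uncorrelated and their variances equal the returned eigenvalues, which is exactly the link between PCA and eigenvalues listed above.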
- Machine Learning in Action
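- 統計學習方法 / Statistical Learning Methods (李航, Li Hang)
- 機器學習 / Machine Learning (周志華, Zhou Zhihua) (a.k.a. 西瓜書, the "Watermelon Book")
- Python Machine Learning
- Introduction to Machine Learning, 3rd edition
  - Solution Manual
  - Previous editions: 1st, 2nd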
- Automated Machine Learning: Methods, Systems, Challenges (AutoML)
- Linear Algebra with Applications (Steven Leon)
- Convex Optimization (Stephen Boyd & Lieven Vandenberghe)
- Numerical Linear Algebra (L. Trefethen & D. Bau III)
- Google - Machine Learning Recipes with Josh Gordon
- YouTube - Machine Learning Fun and Easy
- Siraj Raval - The Math of Intelligence
- bilibili - 機器學習 - 白板推導系列 (Machine Learning: Whiteboard Derivation Series)
- bilibili - 機器學習升級版 (Machine Learning, Upgraded Edition)
- Google Machine Learning Crash Course
- Learn with Google AI
- Kaggle Learn Machine Learning
- Microsoft Professional Program - Artificial Intelligence track
- Intel AI Developer Program - AI Courses
- Machine Learning from Scratch (eriklindernoren/ML-From-Scratch)
- Avik-Jain/100-Days-Of-ML-Code - 100 Days of ML Coding
- ddbourgin/numpy-ml - Machine learning, in numpy
- Machine learning Resources
- microsoft/recommenders - Best Practices on Recommendation Systems
- dformoso/machine-learning-mindmap
Textbook Implementation
- Machine Learning in Action
- Learning From Data (林軒田, Hsuan-Tien Lin)
- 統計學習方法 / Statistical Learning Methods (李航, Li Hang)
- Stanford Andrew Ng CS229
- NTU Hung-Yi Lee
- UCI Machine Learning Repository
- Awesome Public Datasets
- Kaggle Datasets
- The MNIST Database of handwritten digits
- 資料集平台 (Dataset Platform) Data Market
- AI Challenger Datasets
- Peking University Open Research Data
- Open Images Dataset
- Alibaba Cloud Tianchi Data Lab
- AutoML
  - Optuna - A hyperparameter optimization framework
  - Hyperopt - Distributed Asynchronous Hyper-parameter Optimization
- Extension plugin - pip install jupyter_contrib_nbextensions
  - Vim binding
  - Codefolding
  - ExecuteTime
  - Notify
- Jupyter Theme - pip install --upgrade jupyterthemes