This is a plate of machine learning algorithm tastings, implemented in Python and NumPy. Enjoy!
- Linear Regression: BGD, SGD, MBGD (batch, stochastic, and mini-batch gradient descent)
- Logistic Regression: BGD, SGD, MBGD
- MaxEnt (Maximum Entropy Method)
- Perceptron
- KNN (K-Nearest Neighbor)
- Naive Bayes Classifier: Multinomial, Bernoulli, Gaussian
- HMM (Hidden Markov Model): Forward, Backward, Viterbi, Baum-Welch algorithms
- EM (Expectation Maximization): 3-coin problem
- Decision Tree
- AdaBoost
- more algorithms on the way... (a minimal sketch of the code style appears after this list)
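To give a taste of the style before diving in, here is a minimal sketch of what a linear-regression-with-SGD implementation looks like. This is an illustrative rewrite in the same spirit, not the repo's exact code; the function name and defaults are made up:

```python
import numpy as np

def linear_regression_sgd(X, Y, alpha=0.01, epochs=100):
    """Fit y ~ w.x + b by stochastic gradient descent on squared loss."""
    num_x, num_features = X.shape
    w = np.zeros(num_features)  # feature weight vector
    b = 0.0                     # bias
    for _ in range(epochs):
        for i in np.random.permutation(num_x):  # one record at a time
            z = np.dot(w, X[i]) + b             # current prediction
            error = z - Y[i]                    # residual for this record
            w -= alpha * error * X[i]           # w <- w - α(z - y)x
            b -= alpha * error                  # b <- b - α(z - y)
    return w, b
```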
- Independence: each algorithm file is self-contained: no class inheritance, no function overloading, no shared utility functions, even though many of them look exactly the same. We deliberately break this rule of thumb of programming so you can clone this repo, grab any single implementation you want, and embed it directly in your own code!
- Plainness: we use very plain code to illustrate each algorithm, preferring straightforward function calls over high-level syntactic sugar from third-party libraries, whose purpose can be hard to tell from their names at a glance (see the comparison after this list).
- Cheatsheet: it is not enough to understand an algorithm merely through its implementation code, since much of it follows from theory, e.g., the weight update formula. We pair every algorithm with a math cheatsheet so you can pick up the math quickly! (A sample derivation follows the comparison below.)
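As an illustration of what "plainness" means here (a made-up comparison, not code from the repo), compare a softmax written as explicit named steps with the third-party shortcut it replaces:

```python
import numpy as np
# from scipy.special import softmax  # the kind of third-party one-liner we avoid

def softmax_plain(z):
    """Softmax written as explicit, named steps in plain NumPy."""
    z_shifted = z - np.max(z)  # shift scores for numerical stability
    exp_z = np.exp(z_shifted)  # exponentiate each score
    Z = np.sum(exp_z)          # normalization term, the sum over all labels
    return exp_z / Z           # probability of each label
```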
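For example, a cheatsheet entry for gradient descent on squared loss might derive the weight update rule like this (a generic sample in the cheatsheets' spirit, not copied from any one of them):

```latex
% Squared loss over the data matrix X and label vector Y
L(w, b) = \frac{1}{2N} \sum_{i=1}^{N} \left( w \cdot X[i] + b - Y[i] \right)^2

% Gradient descent update with learning rate \alpha
w \leftarrow w - \alpha \frac{\partial L}{\partial w}
  = w - \frac{\alpha}{N} \sum_{i=1}^{N} \left( w \cdot X[i] + b - Y[i] \right) X[i]
```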
- blue: something you should pay attention to in the current step
- red: something that will be removed in the next step
- green: something that is added to the math in the current step
- X: a data matrix, shaped as (|num_x|, |num_features|)
- Y: a label vector, corresponding to X
- W: a weight matrix, shaped as (|num_labels|, |num_features|)
- Z: a normalization term: either the sum of probabilities over all labels, or a vector of sums over weights dot x features (see the NumPy sketch after this list)
- X[i]: the ith data record
- Y[i]: the ith label record, corresponding to X[i]
- W[i]: the weight vector for label i
- x: a feature vector, x = X[i]
- y: a label scalar, corresponding to x, y = Y[i]
- w: a feature weight vector
- z: a scalar: the dot product of a weight vector and x's features
- b: a bias
- α: the learning rate for gradient descent
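Here is a small NumPy sketch tying the notation together (toy dimensions, purely illustrative):

```python
import numpy as np

num_x, num_features, num_labels = 5, 3, 4

X = np.random.rand(num_x, num_features)        # data matrix, (|num_x|, |num_features|)
Y = np.random.randint(num_labels, size=num_x)  # label vector, corresponding to X
W = np.random.rand(num_labels, num_features)   # weight matrix, (|num_labels|, |num_features|)
b = 0.0                                        # bias
alpha = 0.01                                   # learning rate α

x = X[0]              # a feature vector: the 0th data record
y = Y[0]              # its label scalar
w = W[y]              # the weight vector for label y
z = np.dot(w, x) + b  # a scalar: weights dot x's features, plus bias
Z = X @ W.T           # (|num_x|, |num_labels|): sums over weights dot x features
```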
You may see a mess in the math Markdown cheatsheets, since GitHub doesn't support LaTeX math formatting. If you would like to view the LaTeX-formatted math cheatsheets in Markdown, please do the following:
- Local: In VS Code, install the Markdown All in One and Markdown Preview Enhanced extensions.
- Online (GitHub): In Chrome, install the MathJax Plugin for Github extension.