/coursera-ML

Use numpy, scipy, and tensorflow to implement these basic ML models and learning algorithms.

Primary language: Jupyter Notebook. License: MIT.

Coursera ML MOOC

Andrew's class is probably common ground among ML practitioners.

I don't want to fool myself.
Even though I have read some of the sklearn API docs and know how to call the functions, I don't know the soul of machine learning. I have to get the basics right. So I implemented every exercise of the Coursera ML class using numpy, scipy, and tensorflow.

I chose Python over Matlab for purely practical reasons. This cs224d Intro to TensorFlow (video) gives a very good explanation of why Python may be the right choice for ML.

All this learning about theory and coding is preparation for real-world applications. Although learning is its own reward, I also want to create useful applications that solve real-world problems and create value for the community. This project is a tiny step toward that goal. I learned so much.

The more I learn, the more I respect all the great scientific adventures before me that paved the way I walk now. Andrew's class is a very good overview of general ML. Its hands-on approach encourages newcomers like me to keep moving, even though some details are purposefully left out. On the other hand, I found it very useful to pick up theory while doing these exercises. The book Learning from Data gave me so many aha moments about learning theory. It is my feeble foundation in ML theory.

Generally, Andrew's class mostly shows me what to do and how to do it. The book shows me why. Theory and practice go hand in hand. I can't express how happy I am when I read something in the book and suddenly understand the reason behind what I was coding the night before. Eureka!

Project structure

  • Each exercise has its own folder. In each folder you will find:
    1. a pdf that guides you through the project
    2. a series of Jupyter notebooks
    3. data
  • Each notebook basically follows the logic flow of the project pdf. I didn't put all the code in the notebooks because I personally think that gets very messy. So in the notebooks you will only see visualizations, project logic flow, simple experiments, equations, and results.
  • The helper folder has modules for different topics. This is where you can find the details of model implementations, learning algorithms, and supporting functions.

Go solo with python or go with built-in Matlab project?

The Matlab project guides students toward the overall project goal, be it implementing logistic regression or a backprop NN. It includes many supporting functions that help you with visualization, gradient checking, and so on.
My approach is to focus on the pdf, which tells you what the project is about, and then figure out how to achieve those objectives using the SciPy stack. Most of the time I don't even bother looking into the original .m files. I just need their data.
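Gradient checking, one of the supports mentioned above, is easy to recreate in numpy. The sketch below is a minimal, hypothetical version: the names `numerical_gradient`, `f`, and `analytic_gradient` are illustrative and not taken from this repo's helper modules. It compares an analytic gradient against a central-difference estimate on a toy function.

```python
import numpy as np


def numerical_gradient(f, theta, eps=1e-5):
    """Estimate the gradient of f at theta with central differences."""
    grad = np.zeros_like(theta, dtype=float)
    for i in range(theta.size):
        step = np.zeros_like(theta, dtype=float)
        step[i] = eps
        # Perturb one parameter at a time in both directions.
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad


# Toy objective (an assumption for illustration): f(theta) = sum(theta^2),
# whose exact gradient is 2 * theta.
def f(theta):
    return np.sum(theta ** 2)


def analytic_gradient(theta):
    return 2 * theta


theta = np.array([1.0, -2.0, 0.5])
difference = np.max(np.abs(numerical_gradient(f, theta) - analytic_gradient(theta)))
print("largest gap between numerical and analytic gradient:", difference)
```

If the gap is tiny, the analytic gradient is probably right. The loop is slow, so this kind of check is meant for debugging on small inputs, not for training.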

Without that support, I have to do the following myself:

  1. visualization: seaborn and matplotlib are very handy
  2. vectorized implementation of ML models and gradient functions: use numpy's power to manipulate ndarrays
  3. optimization: figure out how to use scipy optimizers to fit your parameters
  4. support functions: nobody is loading, parsing, or normalizing data for you now, so DIY
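To make the steps above concrete, here is a minimal sketch of how they can fit together for logistic regression. It is not the code from the helper folder: the function names, the synthetic data, and the choice of the BFGS method are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def normalize_features(X):
    """Support function: scale each column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std, mean, std


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cost(theta, X, y):
    """Mean cross-entropy of logistic regression, vectorized over all samples."""
    h = sigmoid(X @ theta)
    eps = 1e-12  # guards against log(0)
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))


def gradient(theta, X, y):
    """Gradient of the cost with respect to theta, with no Python loops."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)


# Synthetic, noisy two-feature data (made up for this example).
rng = np.random.default_rng(0)
raw = rng.normal(loc=5.0, scale=2.0, size=(200, 2))
noise = rng.normal(scale=1.0, size=200)
y = (raw[:, 0] + raw[:, 1] + noise > 10.0).astype(float)

X_norm, mean, std = normalize_features(raw)
X = np.column_stack([np.ones(len(X_norm)), X_norm])  # add intercept column

# Let a scipy optimizer fit the parameters, using our own gradient function.
result = minimize(fun=cost, x0=np.zeros(X.shape[1]), args=(X, y),
                  jac=gradient, method="BFGS")
theta = result.x
accuracy = np.mean((sigmoid(X @ theta) >= 0.5) == y)
print("fitted theta:", theta)
print("training accuracy:", accuracy)
```

Passing `jac=gradient` lets the optimizer use the analytic gradient instead of estimating it numerically, which is the main reason to write the gradient function yourself.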

By doing those, I learn more, which is even better.

Supporting materials

I am learning by doing, not hoarding tools. Here is the list of materials that helped me along the way.

Run locally

If you find bugs, flawed logic, or anything that could be better, please do me a favor and create an issue. I would love to see constructive negative feedback.

  • Acknowledgement: Thank you, John Wittenauer! I shamelessly stole lots of your code and ideas. here
  • If you want to run the notebooks locally, refer to requirement.txt for the libraries I've been using.
  • I'm using Python 3.5.2 for these notebooks. You will need Python 3.5 or later because I use the @ operator for matrix multiplication extensively.
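For anyone who hasn't met the @ operator yet, here is a tiny example of what it does with numpy arrays. The arrays are made up for illustration.

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])     # 3 samples, 2 features
theta = np.array([0.5, -1.0])  # one weight per feature

predictions = X @ theta        # matrix-vector product, like np.dot(X, theta)
gram = X.T @ X                 # 2x2 matrix product

print(predictions)
print(gram)
```

On Python versions before 3.5 the same expressions are a syntax error, so you would have to fall back to `np.dot`.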

tensorflow (64-bit linux only) is now available on https://t.co/292ZKEfpjQ Use conda install tensorflow to get it!

— Continuum Analytics (@ContinuumIO) September 19, 2016

Read notebook with nbviewer, and references for each exercise

  • One special thing I did in this project is implement the linear regression model in TensorFlow. This is my first tf experience. Looking forward to learning more when I move into Deep Learning. code: linear_regression.py
  • The Elements of Statistical Learning pg.64 has a very good explanation of singular value decomposition, which is used to find the principal components in our PCA. The book is free to download.
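As a rough illustration of the SVD and PCA connection mentioned above, here is a minimal numpy sketch. It is an assumption about one common way to do it (SVD of the centered data matrix), not necessarily the exact code in the notebooks, and `pca_svd` is a hypothetical name.

```python
import numpy as np


def pca_svd(X, k):
    """Project X onto its first k principal components using SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]
    # Squared singular values, scaled by n - 1, give the variance per component.
    explained_variance = S[:k] ** 2 / (len(X) - 1)
    projected = X_centered @ components.T
    return projected, components, explained_variance


# Synthetic correlated data (made up for this example).
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.5, 0.2]])
X = rng.normal(size=(100, 3)) @ mixing

projected, components, explained_variance = pca_svd(X, k=2)
print("projected shape:", projected.shape)
print("variance along each component:", explained_variance)
```

The variances returned here should match the largest eigenvalues of the sample covariance matrix, which is a handy sanity check when comparing against an implementation that decomposes the covariance matrix instead.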