Summer program: Machine Learning - From handwritten digit classification to a Kaggle contest

The aim of this project is to help a high school student gain a basic knowledge of machine learning techniques and apply them to a real-world dataset.
Here is a real-world problem: we often have handwritten numbers, such as phone numbers or credit card numbers, and typing them into a computer by hand is tedious. We want to build a machine learning model that can recognize handwritten digits, even messy ones, and automatically convert them into digital text.
This is the kind of problem machine learning can solve. Many real-world problems are like this: as long as we have a suitable dataset, we can train a machine learning model to solve them.

Learning Goals

  1. Learn the basic concepts of machine learning: datasets, models, training, prediction, and the metrics used to evaluate the performance of a machine learning model.
  2. Learn how to train a machine learning model using Python and scikit-learn.
  3. Learn the basic principles of several classical machine learning algorithms, such as SVM, KNN, Naive Bayes, Decision Tree, and Random Forest.
  4. Learn how to compare classification results with human-readable plots using matplotlib.
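As a small illustration of the evaluation metrics in goal 1, the sketch below scores a made-up set of predictions with scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, and `confusion_matrix`. The labels are hypothetical toy values chosen so the numbers can be checked by hand, and macro averaging is just one assumed choice among the averaging options scikit-learn offers.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Hypothetical true labels and model predictions for a 3-class problem (digits 0, 1, 2).
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

# Accuracy: fraction of predictions that match the true label (6 of 8 here).
acc = accuracy_score(y_true, y_pred)

# Macro-averaged precision and recall: computed per class, then averaged without weights.
prec = precision_score(y_true, y_pred, average="macro")
rec = recall_score(y_true, y_pred, average="macro")

# Confusion matrix: row i is the true class i, column j is the predicted class j.
cm = confusion_matrix(y_true, y_pred)

print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
# accuracy=0.750 precision=0.778 recall=0.778
print(cm)
# [[2 0 0]
#  [0 2 1]
#  [0 1 2]]
```

Reading the confusion matrix row by row shows which digits get mistaken for which, something a single accuracy number hides.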

Expected Results

  1. Using the handwritten digits dataset from https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits, split the data into a training set and a test set, train a machine learning model on the training set, and make predictions on the test set.
  2. Compare the results of different machine learning algorithms on this task. Which one is the most accurate? Which one is the fastest to train, and which one is the fastest at prediction?
  3. Use matplotlib to draw the above comparisons as graphs/plots.
  4. Pick one Kaggle contest and achieve a result by the end of the project, using the skills learned in the three steps above.
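One way to approach results 2 and 3 is sketched below, assuming scikit-learn and matplotlib are installed. It trains the five algorithms from the Learning Goals on the same train/test split, times `fit` and `predict` with `time.perf_counter`, and saves three bar charts to a hypothetical file named `comparison.png`. The hyperparameters are untuned starting points, and the timings will differ from machine to machine.

```python
import time

import matplotlib

matplotlib.use("Agg")  # draw to a file instead of opening a window
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Untuned starting settings for each algorithm (assumptions, not recommendations).
models = {
    "SVM": SVC(gamma=0.001),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # Time the training step.
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_time = time.perf_counter() - start

    # Time the prediction step on the whole test set.
    start = time.perf_counter()
    y_pred = model.predict(X_test)
    predict_time = time.perf_counter() - start

    acc = accuracy_score(y_test, y_pred)
    results[name] = {"accuracy": acc, "train_time": train_time, "predict_time": predict_time}
    print(f"{name:14s} accuracy={acc:.3f} train={train_time:.4f}s predict={predict_time:.4f}s")

# One bar chart per metric, side by side.
names = list(results)
metrics = [
    ("accuracy", "Test accuracy"),
    ("train_time", "Training time (s)"),
    ("predict_time", "Prediction time (s)"),
]
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, (key, title) in zip(axes, metrics):
    ax.bar(names, [results[n][key] for n in names])
    ax.set_title(title)
    ax.tick_params(axis="x", labelrotation=30)
fig.tight_layout()
fig.savefig("comparison.png")
```

A single timing run is noisy; repeating each measurement a few times and averaging gives steadier numbers for the comparison.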

References

  1. https://www.jetbrains.com/pycharm/download/#section=mac
  2. https://brew.sh/
  3. https://scikit-learn.org/stable/
  4. https://www.youtube.com/watch?v=KTeVOb8gaD4
  5. https://www.youtube.com/watch?v=q7Bo_J8x_dw&list=PLQVvvaa0QuDfefDfXb9Yf0la1fPDKluPF
  6. https://scikit-learn.org/stable/auto_examples/index.html#classification
  7. https://www.kaggle.com/

Timeline

Week1: Install a Python IDE on the MacBook, install the dependency libraries with pip, and learn the basics of machine learning. Implement a machine learning classification solution, train it on the handwritten digits (a toy dataset), and use it to predict the digits.
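A minimal sketch of the Week1 classification task, assuming scikit-learn is installed (for example with `pip install scikit-learn`). It loads the digits with `load_digits`, holds out a test set with `train_test_split`, and fits a support vector classifier. The 25% test split, `random_state=42`, and `gamma=0.001` are illustrative assumptions rather than tuned values.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the 8x8 handwritten digits: each row of X holds 64 pixel values, y holds the digit 0-9.
X, y = load_digits(return_X_y=True)

# Hold out 25% of the images as a test set that the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Train (fit) a support vector classifier on the training set only.
model = SVC(gamma=0.001)
model.fit(X_train, y_train)

# Predict digits for the held-out images and measure how many are correct.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Test accuracy: {accuracy:.3f}")
print("First 10 predictions:", y_pred[:10])
print("First 10 true labels:", y_test[:10])
```

Keeping the test images out of training is what makes the accuracy an honest estimate of how the model will handle new handwriting.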

Week2: Explore the Kaggle website and pick one contest. Learn how to deal with real-world datasets: how to load data in .csv format, and how to clean up and normalize the data before training the model. Evaluate the performance of each machine learning model, and learn the metrics used to evaluate a machine learning model, such as accuracy and precision.
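The Week2 loading, clean-up, and normalization steps might look like the sketch below, using pandas and scikit-learn. The CSV content, the column names (`id`, `age`, `income`, `label`), and the choice of median imputation are hypothetical; a real Kaggle contest provides its own files and may call for different clean-up.

```python
import io

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for a contest's train.csv (one duplicated row, two missing values).
csv_text = """id,age,income,label
1,25,50000,0
2,,62000,1
3,47,,1
3,47,,1
4,33,58000,0
"""

# pd.read_csv also accepts a file path such as "train.csv".
df = pd.read_csv(io.StringIO(csv_text))

# Clean up: drop exact duplicate rows, then fill missing values with each column's median.
df = df.drop_duplicates()
feature_cols = ["age", "income"]
df[feature_cols] = df[feature_cols].fillna(df[feature_cols].median())

# Normalize: rescale each feature to zero mean and unit variance.
scaler = StandardScaler()
X = scaler.fit_transform(df[feature_cols])
y = df["label"].to_numpy()

print(df)
print(X.round(3))
```

In a real project, fit the `StandardScaler` on the training data only and reuse that fitted scaler to transform the test data, so no information from the test set leaks into training.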

Week3: Try different machine learning solutions and try to beat the other competitors on Kaggle.
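When trying different solutions in Week3, a local cross-validation score is a steadier guide than a single train/test split. This sketch, assuming scikit-learn, compares a few candidate models on the digits data with 5-fold `cross_val_score`; the candidates and their settings are illustrative, and for a Kaggle contest the contest's own training data would take the place of the digits.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Illustrative candidates; swap in whatever solutions you are experimenting with.
candidates = {
    "SVM": SVC(gamma=0.001),
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# 5-fold cross-validation: train on 4/5 of the data, score on the remaining 1/5, five times over.
scores = {}
for name, model in candidates.items():
    fold_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    scores[name] = fold_scores.mean()
    print(f"{name:14s} mean={fold_scores.mean():.3f} std={fold_scores.std():.3f}")

best = max(scores, key=scores.get)
print("Best candidate:", best)
```

The standard deviation across folds hints at how much of a gap between two models is just noise.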

Week4: Improve the machine learning model through feature engineering, or try to find the best-fitting model. (To be determined)
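For Week4, one possible sketch combines a simple feature-engineering step with a model search: a `Pipeline` that reduces the 64 pixel features with `PCA` before an `SVC`, tuned by `GridSearchCV`. PCA is a swapped-in example of a feature transformation, not a prescribed method, and the parameter grid is a small assumed one chosen so the search runs quickly.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Feature step (PCA) followed by the classifier; both are tuned together.
pipe = Pipeline([
    ("pca", PCA()),
    ("svc", SVC()),
])

# Parameter names use the "<step name>__<parameter>" convention.
param_grid = {
    "pca__n_components": [20, 40],
    "svc__C": [1, 10],
}

# 3-fold cross-validation on the training set for every combination in the grid.
search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("Best parameters:", search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
print(f"Held-out test accuracy: {test_accuracy:.3f}")
```

The test set is scored only once, after the search has finished, so it plays no part in choosing the parameters.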