Machine Learning Recipes is a series of introductory videos covering code and insights about machine learning with Python and open-source libraries (TensorFlow and scikit-learn).
Six lines of Python is all it takes to write a simple machine learning program. We will collect some simple data and, with a decision tree, predict whether an input represents an orange or an apple based on its weight and texture.
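A minimal sketch of that program, in the spirit of the episode (texture encoded as 1 = smooth, 0 = bumpy; labels 0 = apple, 1 = orange; the example weights are illustrative):

```python
from sklearn import tree

# Training data: [weight in grams, texture], with texture 1 = smooth, 0 = bumpy
features = [[140, 1], [130, 1], [150, 0], [170, 0]]
labels = [0, 0, 1, 1]  # 0 = apple, 1 = orange

clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)

print(clf.predict([[160, 0]]))  # a heavy, bumpy fruit -> likely an orange
```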
We'll build a decision tree on a real dataset, add code to visualize it, and practice reading it.
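The real dataset here is Iris. As a sketch of the idea, scikit-learn's built-in plot_tree (available since scikit-learn 0.21, a simpler alternative to a graphviz export) draws the fitted tree directly:

```python
from sklearn.datasets import load_iris
from sklearn import tree
import matplotlib.pyplot as plt

iris = load_iris()
clf = tree.DecisionTreeClassifier().fit(iris.data, iris.target)

# Each node shows the question asked, its impurity, and the class distribution
plt.figure(figsize=(12, 8))
tree.plot_tree(clf, feature_names=iris.feature_names,
               class_names=iris.target_names, filled=True)
plt.show()
```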
Good features are informative, independent, and simple. These concepts are introduced by using a histogram to visualize a feature from a toy dataset. There are other great examples of histogram code here: https://matplotlib.org/examples/statistics/histogram_demo_multihist.html
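A sketch of that kind of histogram, with hypothetical toy data (heights of two dog breeds whose distributions overlap, in the spirit of the episode):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical toy data: heights (inches) of 500 dogs of each breed
greyhound_height = 28 + 4 * np.random.randn(500)
labrador_height = 24 + 4 * np.random.randn(500)

# Where the two distributions overlap, height alone cannot separate the breeds
plt.hist([greyhound_height, labrador_height], stacked=True, color=['r', 'b'])
plt.xlabel('height (inches)')
plt.show()
```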
We use the Iris dataset again, but with another classifier: k-Nearest Neighbors (KNN). Using model_selection we can randomly split the Iris dataset into train and test data, train on one part, and try to predict the rest of the samples. At the end we measure the accuracy of both classifiers.
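A sketch of that comparison (train_test_split holds out half the data here; the exact split fraction is a choice):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.5)

# Fit each classifier on the same split and compare accuracy on the held-out half
for clf in (DecisionTreeClassifier(), KNeighborsClassifier()):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, accuracy_score(y_test, clf.predict(X_test)))
```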
In order to really understand what is behind a classifier, we will build our own, based on k-Nearest Neighbors (KNN). At the end, we compare the accuracy of a random prediction, our own KNN classifier, and the KNN from the library.
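A bare-bones version of such a classifier (a 1-nearest-neighbor sketch; the class and method names here are illustrative, not the library's):

```python
from scipy.spatial import distance

class ScrappyKNN:
    """A minimal 1-nearest-neighbor classifier: predict the label of the
    single closest training example, measured by Euclidean distance."""

    def fit(self, X_train, y_train):
        self.X_train = X_train
        self.y_train = y_train
        return self

    def predict(self, X_test):
        return [self._closest(row) for row in X_test]

    def _closest(self, row):
        best = min(range(len(self.X_train)),
                   key=lambda i: distance.euclidean(row, self.X_train[i]))
        return self.y_train[best]
```

It exposes the same fit/predict interface as scikit-learn's KNeighborsClassifier, so the accuracy comparison above works unchanged.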
A brief introduction to TensorFlow. This is a great way to get started learning about and working with image classification. It is worth noting that TensorFlow offers high-level APIs, which make it straightforward to build a powerful classifier.
This time we will create an image classifier using TF.Learn. The problem is to classify handwritten digits from the MNIST dataset: given an image of a digit, the task is to predict which one it is (0-9). At the end we'll visualize the weights the classifier learns and gain intuition about how it works. This lesson is conducted in a .ipynb file, a notebook that can run Python. You can run it in Jupyter or in Google Colab, an interesting cloud option.
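The TF.Learn API from the video has since been retired; a rough modern equivalent of its linear classifier, written with tf.keras (a single softmax layer plays the same role), might look like this, including the weight visualization:

```python
import tensorflow as tf
import matplotlib.pyplot as plt

# Load MNIST: 28x28 grayscale digits; flatten to 784 features, scale to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784) / 255.0
x_test = x_test.reshape(-1, 784) / 255.0

# A single dense softmax layer is a linear classifier over the pixels
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='softmax', input_shape=(784,))
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
print(model.evaluate(x_test, y_test))

# Reshape each digit's weight vector back to 28x28 to see what it responds to
weights = model.layers[0].get_weights()[0]  # shape (784, 10)
fig, axes = plt.subplots(1, 10, figsize=(15, 2))
for digit, ax in enumerate(axes):
    ax.imshow(weights[:, digit].reshape(28, 28), cmap='seismic')
    ax.set_title(digit)
    ax.axis('off')
plt.show()
```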
In a simple example of predicting the type of fruit, we'll write a decision tree classifier from scratch (using CART). Which questions are best for partitioning the dataset? How do we quantify uncertainty? We'll build the functions that answer these questions and show how impurity and information gain are calculated.
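Two of those functions, sketched roughly as in the episode (rows are lists whose last element is the label):

```python
def gini(rows):
    """Gini impurity: the chance of mislabeling a row if we labeled it
    randomly according to the class distribution in this set."""
    counts = {}
    for row in rows:
        label = row[-1]
        counts[label] = counts.get(label, 0) + 1
    impurity = 1.0
    for label in counts:
        impurity -= (counts[label] / len(rows)) ** 2
    return impurity

def info_gain(left, right, current_uncertainty):
    """Information gain: the parent's impurity minus the weighted
    average impurity of the two child partitions."""
    p = len(left) / (len(left) + len(right))
    return current_uncertainty - p * gini(left) - (1 - p) * gini(right)
```

The best question to ask at each node is simply the one with the highest information gain.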
In this lesson we will look at ways to represent features, turning them into a more useful form. To visualize what the transformations do to the features, we will use the tool FACETS. We will cover Bucketing, Crossing, Hashing, and Embedding. The goal in this example is to predict whether someone's income is greater than US$ 50k.
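A sketch of those four transformations using TensorFlow's feature_column API (the column names 'age' and 'occupation' are assumed stand-ins for fields in the census income dataset):

```python
import tensorflow as tf

# A raw continuous feature
age = tf.feature_column.numeric_column('age')

# Bucketing: turn continuous age into coarse ranges
age_buckets = tf.feature_column.bucketized_column(
    age, boundaries=[18, 25, 35, 45, 55, 65])

# Hashing: map a high-cardinality string category into a fixed number of bins
occupation = tf.feature_column.categorical_column_with_hash_bucket(
    'occupation', hash_bucket_size=1000)

# Crossing: combine features so the model can learn interactions
age_x_occupation = tf.feature_column.crossed_column(
    [age_buckets, occupation], hash_bucket_size=10000)

# Embedding: learn a dense vector representation for a sparse category
occupation_embedding = tf.feature_column.embedding_column(
    occupation, dimension=8)
```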
In this lesson we will work with Weka, an ML library with a Graphical User Interface (GUI). The examples are: predicting whether a person has diabetes based on their glucose levels, and predicting whether a member of Congress is a Democrat or a Republican based on how they voted on different bills. We'll also see how to evaluate the results of these experiments and how to do feature selection.