The table of contents is as follows:
- Overview
- Prerequisites
- scikit-learn
- Libraries
- requirements.txt
- TODO
- Credits
- Alternatives
- Data Cleaning
- Quick Hits
- End to End
- Tools
This is a nice free introduction to Machine Learning with Python.
Here is how the folks at NVIDIA see the relationship between Artificial Intelligence, Machine Learning and Deep Learning:
Towards the beginning of my career, I was interested in AI and joined a society founded by Donald Michie - who was then at the University of Edinburgh. I wonder how far things have progressed since then.
Machine Learning is hot right now, and of course the cloud providers have noticed.
Here is Google's Cloud offering:
http://cloud.google.com/products/machine-learning/
For a more sombre view of things, the following article is worth reading:
http://www.cio.com/article/3223191/artificial-intelligence/a-practical-guide-to-machine-learning-in-business.html
Chris Manning, Stanford, 3 Apr 2017:
"Essentially, Python has just become the lingua franca of nearly all the deep learning toolkits, so that seems the thing to use."
http://youtu.be/OQQ-W_63UgQ?list=PL3FW7Lu3i5Jsnh1rnUwq_TcylNr7EkRe6&t=2102
For an explanation of why Python (as contrasted with other languages) is a good choice for natural language processing, the following link is worth a look:
http://www.nltk.org/book_1ed/ch00-extras.html
- Python (Python 2 support has been dropped from a number of projects, so use Python 3)
- `pip` (or possibly `pip3`, if using both Python 2 and Python 3)

`pip` (or `pip3`) is the package manager for Python, much as `npm` is the package manager for the Node.js platform.
The course uses this library, which it refers to as `sklearn`.
The latest version may be found here:
http://scikit-learn.org/stable/
To install this library in multi-user mode (not recommended) with `pip` (replace with `pip3` if using Python 3):
pip install -U scikit-learn
To install this library in single-user mode (recommended) with `pip` (replace with `pip3` if using Python 3):
pip install --user scikit-learn
It's not really possible to do much of anything in Python without additional libraries.
Essential libraries include:
Useful optional libraries include:
Verify library presence and version with `pip`, as with `scikit-learn`:
pip list --format=freeze | grep numpy
[Replace `numpy` above as necessary.]
Or verify library presence and version with Python:
python -c "import numpy as im; print(im.__version__)"
[Likewise, replace `numpy` above as necessary.]
Or use `try_import.py` for multiple libraries as shown:
$ python try_import.py numpy scipy sklearn keras pytorch
"numpy" was imported
"scipy" was imported
"sklearn" was imported
Using TensorFlow backend.
"keras" was imported
"pytorch" could not be imported - try "pip install --user pytorch"
$
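The script itself isn't reproduced above, but a minimal sketch of what a `try_import.py` along these lines might look like (an assumption, not the repo's actual script), using the standard-library `importlib`:

```python
#!/usr/bin/env python3
"""Try to import each module named on the command line, reporting the result."""

import importlib
import sys


def try_import(name):
    """Return True if `name` imports cleanly, printing a status line either way."""
    try:
        importlib.import_module(name)
    except ImportError:
        print('"{}" could not be imported - try "pip install --user {}"'.format(name, name))
        return False
    print('"{}" was imported'.format(name))
    return True


if __name__ == "__main__":
    for module in sys.argv[1:]:
        try_import(module)
```

Note that `ModuleNotFoundError` is a subclass of `ImportError`, so catching the latter covers both a missing package and a package that fails partway through import.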
Install the library with `pip` (either multi-user or single-user) as with `scikit-learn` above.
NumPy allows for a nice performance optimization: vectorized operations, which run in compiled code and can take advantage of single instruction, multiple data (SIMD) hardware. Basically, this allows for efficient vector and matrix handling (compare `vectors pt1.py` to `vectors pt2.py`).
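As an illustration of the difference (a sketch, not the actual contents of `vectors pt1.py` or `vectors pt2.py`), here is element-wise addition done with an explicit Python loop versus the equivalent vectorized NumPy expression:

```python
import numpy as np

a = np.arange(100_000, dtype=np.float64)
b = np.arange(100_000, dtype=np.float64)

# Element-wise addition with an explicit Python loop (slow - one
# interpreted iteration per element).
loop_sum = np.empty_like(a)
for i in range(len(a)):
    loop_sum[i] = a[i] + b[i]

# The same operation vectorized (fast - the loop runs in compiled C,
# where it can use SIMD instructions).
vec_sum = a + b

assert np.array_equal(loop_sum, vec_sum)
```

Timing the two with `timeit` typically shows the vectorized version winning by one to two orders of magnitude on arrays of this size.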
Matplotlib is great for plotting variables, but can be very low-level.
To make these graphs look a little better, check out my No More Blue repo.
Or - for a higher-level library - check out Seaborn.
[Seaborn will greatly simplify a number of difficult `matplotlib` graphing exercises.]
Although not used in this course, StatsModels is also worth a look.
It provides classes and functions for estimating many different statistical models, as well as for conducting statistical tests and exploring statistical data.
Some Seaborn functions will optionally use StatsModels if it is installed.
Of course, it's also possible (as with `npm` or `composer`) to install all dependencies in one fell swoop (probably a best practice).
Simply list the dependencies in a file (for example `requirements` or `requirements.txt`) and install from it:
pip install --user -r requirements.txt
[Note the `--user` option, which may be omitted for a global install, and the `-r` option, which specifies an input file.]
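A hypothetical `requirements.txt` for the libraries discussed here might look like the following (the package names and version pins are illustrative, not the course's actual file):

```
numpy>=1.16
scipy>=1.2
scikit-learn>=0.20
matplotlib>=3.0
seaborn>=0.9
```

Pinning minimum versions like this keeps installs reproducible enough for coursework while still allowing bug-fix updates; fully pinned versions (`==`) are stricter but go stale faster.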
- Finish course
- Update Quick Hit links to make them easier to navigate
- Update everything for the most recent (and secure) version of TensorFlow
Based upon:
http://www.udacity.com/course/intro-to-machine-learning--ud120
You can find an interview with co-author Katie Malone here:
http://www.se-radio.net/2017/03/se-radio-episode-286-katie-malone-intro-to-machine-learning/
The following look like interesting options too:
http://web.stanford.edu/class/cs224n/
http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning
A lot (let's say three quarters) of a data scientist's time is spent massaging data, which is a pretty important (let's say critically important) part of the job and one that is not often discussed.
Garbage in, garbage out.
[Not to mention the (very expensive) computer time wasted.]
For a quick introduction to data cleaning with `numpy` and `pandas`, have a look at this great tutorial:
http://realpython.com/python-data-cleaning-numpy-pandas/
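As a tiny taste of the kind of cleaning involved (a standalone sketch, not taken from the tutorial), here is one common chore - filling in missing values with the mean of the observed ones - done with NumPy:

```python
import numpy as np

# A small set of readings with one missing value (np.nan marks missing data).
readings = np.array([1.0, 2.0, np.nan, 4.0, 5.0])

# Compute the mean while ignoring NaNs: (1 + 2 + 4 + 5) / 4 = 3.0
mean = np.nanmean(readings)

# Replace each NaN with that mean, leaving real readings untouched.
cleaned = np.where(np.isnan(readings), mean, readings)

print(cleaned)  # [1. 2. 3. 4. 5.]
```

Mean imputation is only one strategy (and not always the best one); the tutorial above covers more of the options that `pandas` provides.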
You can see my stab at it here.
For a more complicated example, check out my ML with Missing Data repo.
For an easy (and quick) introduction to the various Python tools and ML concepts:
http://www.youtube.com/playlist?list=PLOU2XLYxmsIIuiBfYad6rFYQU_jL2ryal
This series is from mid-2016, so there is a small amount of 'code rot', and it seems to use Python 2 rather than Python 3. Even so, it's a quick and fun way to get a brief overview of ML and the tools & techniques involved.
- Machine Learning Recipes #1 (Hello World)
- Machine Learning Recipes #2 (Visualizing a Decision Tree)
- Machine Learning Recipes #3 (What Makes a Good Feature?)
- Machine Learning Recipes #4 (Let's Write a Pipeline)
- Machine Learning Recipes #5 (Writing Our First Classifier)
- Machine Learning Recipes #6 (Train an Image Classifier with TensorFlow for Poets)
- Machine Learning Recipes #7 (Classifying Handwritten Digits with TF.Learn)
- Machine Learning Recipes #8 (Let's Write a Decision Tree Classifier from Scratch)
- Machine Learning Recipes #9 (Intro to Feature Engineering with TensorFlow)
- Machine Learning Recipes #10 (Getting Started with Weka)
For a deeper dive into the Iris dataset, check out my ML with SciPy repo.
This project shows a full end-to-end workflow.
There are a number of tools, such as Python, IPython, and Jupyter Notebooks.
One website that gets a lot of mentions is Anaconda:
http://www.anaconda.com/download/