**A collection of tutorials and examples for solving and understanding machine learning and pattern classification tasks.**
- Entry Point: Data - Using Python's sci-packages to prepare data for Machine Learning tasks and other data analyses [IPython nb]
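Here is a minimal sketch of one typical preparation step. It is not code from the notebook. The NumPy snippet rescales each feature column in one of two ways: to zero mean and unit variance (z-score standardization), or onto the [0, 1] range (min-max scaling). The helper names `standardize` and `min_max_scale` are hypothetical, and the toy matrix is made up.

```python
import numpy as np


def standardize(X):
    """Z-score standardization: every column ends up with mean 0 and standard deviation 1."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)


def min_max_scale(X):
    """Min-max scaling: every column is mapped linearly onto [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    return (X - col_min) / (X.max(axis=0) - col_min)


# Made-up data: 3 samples (rows) x 2 features (columns) on very different scales.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

print(standardize(X))
print(min_max_scale(X))
```

Both helpers assume that no column is constant. A constant column would cause a division by zero.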
- An Introduction to simple linear supervised classification using scikit-learn [IPython nb]
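This does not reproduce the notebook's model. As a hedged sketch, the snippet fits one common linear classifier, scikit-learn's `LogisticRegression`, to the Iris data that ships with the library. A `StandardScaler` standardizes the features first. The 70/30 split and `random_state=0` are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Iris is bundled with scikit-learn, so no download is needed.
X, y = load_iris(return_X_y=True)

# Hold out 30% of the samples for testing, keeping the class proportions (stratify=y).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Standardize the features, then fit a linear classifier on the scaled data.
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)  # fraction of correctly classified test samples
print(f"test accuracy: {accuracy:.2f}")
```

The pipeline learns the scaler's mean and variance from the training split only. The test samples therefore stay unseen until evaluation.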
- Projection
    - Component Analyses
        - Linear Transformation
            - Principal Component Analysis (PCA) [IPython nb]
            - Linear Discriminant Analysis (LDA) [IPython nb]
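Below is a compact NumPy sketch of both projections. It assumes that samples are the rows of `X` and that integer class labels are in `y`. PCA keeps the eigenvectors of the covariance matrix with the largest eigenvalues. LDA keeps the leading eigenvectors of `inv(S_W) @ S_B`, where `S_W` and `S_B` are the within-class and between-class scatter matrices. The helper names `pca` and `lda` and the two-class toy data are illustrative assumptions.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components; also return the sorted eigenvalues."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # cov is symmetric; eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # largest explained variance first
    W = eigvecs[:, order[:n_components]]
    return X_centered @ W, eigvals[order]


def lda(X, y, n_components):
    """Project X onto the leading eigenvectors of inv(S_W) @ S_B (Fisher's discriminants)."""
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_W = np.zeros((n_features, n_features))    # within-class scatter
    S_B = np.zeros((n_features, n_features))    # between-class scatter
    for label in np.unique(y):
        X_c = X[y == label]
        mean_c = X_c.mean(axis=0)
        S_W += (X_c - mean_c).T @ (X_c - mean_c)
        diff = (mean_c - overall_mean).reshape(-1, 1)
        S_B += X_c.shape[0] * (diff @ diff.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]      # eig may return complex values; rank by real part
    W = eigvecs[:, order[:n_components]].real
    return X @ W


# Made-up data: two Gaussian classes that differ only along the first feature.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], [1.0, 0.3], size=(50, 2)),
               rng.normal([5.0, 0.0], [1.0, 0.3], size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

X_pca, variances = pca(X, n_components=1)
X_lda = lda(X, y, n_components=1)
```

LDA uses the class labels and PCA does not. With only two classes `S_B` has rank 1, so LDA yields at most one useful discriminant direction.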
- Feature Selection
    - Sequential Feature Selection Algorithms [IPython nb]
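The notebook covers several sequential algorithms. Here is a minimal sketch of the simplest one, Sequential Forward Selection (SFS). The function greedily adds whichever feature most improves a criterion until `k` features are chosen. The `criterion` callback is an assumption. In practice it would usually be a cross-validated classifier score. The usage example swaps in a made-up toy score so the result is easy to check by hand.

```python
def sequential_forward_selection(n_features, k, criterion):
    """Greedy Sequential Forward Selection (SFS).

    Start from the empty set. Repeatedly add the single remaining feature
    whose addition maximizes criterion(subset), until k features are selected.
    """
    selected = []
    remaining = list(range(n_features))
    while len(selected) < k:
        best = max(remaining, key=lambda f: criterion(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical per-feature relevance values for a toy criterion.
relevance = [0.9, 0.85, 0.1, 0.6]


def toy_criterion(subset):
    # Summed relevance, minus a penalty when the redundant features 0 and 1 appear together.
    score = sum(relevance[f] for f in subset)
    if 0 in subset and 1 in subset:
        score -= 1.0
    return score


print(sequential_forward_selection(4, k=2, criterion=toy_criterion))  # [0, 3]
```

Feature 1 is nearly as relevant as feature 0. Once feature 0 is in the set, however, the toy penalty makes feature 1 redundant, so SFS picks feature 3 instead.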
- Parametric Techniques
    - Introduction to the Maximum Likelihood Estimate (MLE) [IPython nb]
    - How to calculate Maximum Likelihood Estimates (MLE) for different distributions [IPython nb]
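For several common distributions, setting the derivative of the log-likelihood to zero yields a closed-form MLE:

- Univariate Gaussian: `mu_hat = (1/n) * sum(x_i)` and `var_hat = (1/n) * sum((x_i - mu_hat)**2)`.
- Exponential density `p(x | lambda) = lambda * exp(-lambda * x)`: `lambda_hat = n / sum(x_i)`.

The NumPy sketch below implements only these two cases. The function names are hypothetical.

```python
import numpy as np


def mle_normal(x):
    """MLE of a univariate Gaussian: sample mean and biased variance (divides by n)."""
    x = np.asarray(x, dtype=float)
    mu_hat = x.mean()
    var_hat = np.mean((x - mu_hat) ** 2)
    return float(mu_hat), float(var_hat)


def mle_exponential(x):
    """MLE of the rate parameter of an exponential distribution: n / sum(x) = 1 / mean(x)."""
    return float(1.0 / np.mean(np.asarray(x, dtype=float)))


print(mle_normal([2, 4, 4, 4, 5, 5, 7, 9]))  # (5.0, 4.0)
print(mle_exponential([1.0, 2.0, 3.0]))      # 0.5
```

The Gaussian MLE of the variance divides by n rather than n - 1, so it slightly underestimates the variance on small samples.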
- Non-Parametric Techniques
    - Kernel density estimation via the Parzen-window technique [IPython nb]
    - The K-Nearest Neighbor (KNN) technique
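Below is a minimal NumPy sketch of both techniques. It assumes samples are stored as rows.

- The Parzen-window estimate uses a hypercube window of edge length `h`. This gives `p(x) = k / (n * h**d)`, where `k` counts the samples inside the window centered at `x`.
- The k-NN function uses k-NN as a majority-vote classifier. That is only one use of the technique; k-NN can also serve as a density estimator.

Function names and data are illustrative.

```python
from collections import Counter

import numpy as np


def parzen_window_density(x, samples, h):
    """Parzen-window estimate of p(x) with a hypercube window of edge length h."""
    samples = np.asarray(samples, dtype=float)
    n, d = samples.shape
    u = (np.asarray(x, dtype=float) - samples) / h
    # Window function phi(u) = 1 if every coordinate satisfies |u_j| <= 1/2, else 0.
    k = np.sum(np.all(np.abs(u) <= 0.5, axis=1))
    return float(k / (n * h ** d))


def knn_classify(x, X_train, y_train, k):
    """Assign x the majority label among its k nearest training samples (Euclidean distance)."""
    distances = np.linalg.norm(np.asarray(X_train, dtype=float) - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(np.asarray(y_train)[nearest].tolist())
    return votes.most_common(1)[0][0]


samples = np.array([[0.0, 0.0], [0.2, 0.2], [1.0, 1.0]])
print(parzen_window_density([0.0, 0.0], samples, h=1.0))  # 2 of 3 samples in the window -> 2/3

X_train = np.array([[0.0, 0.0], [0.1, 0.3], [3.0, 3.0], [3.2, 2.9]])
y_train = np.array([0, 0, 1, 1])
print(knn_classify([0.2, 0.1], X_train, y_train, k=3))  # 0
```

The window width `h` plays the same smoothing role as `k` does for k-NN. A small `h` gives a spiky estimate, while a large `h` blurs the structure of the data.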
- Regression Analysis
    - Linear Regression
        - Least-Squares fit [IPython nb]
    - Non-Linear Regression
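For a straight-line fit `y ≈ w * x + b`, minimizing the sum of squared residuals gives:

- `w = sum((x_i - x_mean) * (y_i - y_mean)) / sum((x_i - x_mean)**2)`
- `b = y_mean - w * x_mean`

The sketch below solves the same problem with NumPy's `np.linalg.lstsq`, using a design matrix that has a column of ones for the intercept. The data points are made up, and this is one way to do the fit, not necessarily the notebook's.

```python
import numpy as np

# Made-up data lying close to the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Design matrix: y ≈ A @ [slope, intercept].
A = np.column_stack([x, np.ones_like(x)])

# Minimize ||A @ w - y||^2; lstsq avoids explicitly inverting A.T @ A.
(slope, intercept), residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
print(f"y = {slope:.3f} x + {intercept:.3f}")
```

For these five points both routes give a slope of 1.99 and an intercept of 1.04.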
- Supervised Learning
    - Parametric Techniques
        - Univariate Normal Density
            - Ex1: 2-classes, equal variances, equal priors [IPython nb]
            - Ex2: 2-classes, different variances, equal priors [IPython nb]
            - Ex3: 2-classes, equal variances, different priors [IPython nb]
            - Ex4: 2-classes, different variances, different priors, loss function [IPython nb]
            - Ex5: 2-classes, different variances, equal priors, loss function, Cauchy distribution [IPython nb]
        - Multivariate Normal Density
            - Ex5: 2-classes, different variances, equal priors, loss function [IPython nb]
            - Ex7: 2-classes, equal variances, equal priors [IPython nb]
    - Non-Parametric Techniques
- Unsupervised Learning
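The Supervised Learning examples above apply Bayes decisions to normal class-conditional densities. Here is a hedged sketch; it is not the notebooks' code, and it does not cover the Cauchy example. The NumPy functions do three things:

- evaluate a multivariate normal density, which reduces to the univariate one for d = 1;
- combine the densities with priors to get posteriors;
- pick the action with minimum conditional risk `R(a_i | x) = sum_j loss[i][j] * P(w_j | x)`, using an optional loss matrix.

The function names and all numbers are illustrative assumptions.

```python
import numpy as np


def multivariate_normal_pdf(x, mu, cov):
    """Density of N(mu, cov) at x; with d = 1 this is the univariate normal density."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    mu = np.atleast_1d(np.asarray(mu, dtype=float))
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    d = x.shape[0]
    diff = x - mu
    mahalanobis_sq = diff @ np.linalg.solve(cov, diff)
    norm_const = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return float(np.exp(-0.5 * mahalanobis_sq) / norm_const)


def bayes_decide(x, means, covs, priors, loss=None):
    """Return the index of the action with minimum conditional risk.

    loss[i][j] is the cost of deciding class i when the true class is j.
    The default zero-one loss reduces this to picking the largest posterior.
    """
    likelihoods = np.array([multivariate_normal_pdf(x, m, c) for m, c in zip(means, covs)])
    joint = likelihoods * np.asarray(priors, dtype=float)
    posteriors = joint / joint.sum()
    if loss is None:
        loss = 1.0 - np.eye(len(priors))
    risks = np.asarray(loss, dtype=float) @ posteriors  # risks[i] = sum_j loss[i][j] * P(w_j | x)
    return int(np.argmin(risks))


# Univariate, equal variances, equal priors: the decision boundary is the midpoint x = 1.
means, covs, priors = [0.0, 2.0], [1.0, 1.0], [0.5, 0.5]
print(bayes_decide(0.9, means, covs, priors))  # 0
print(bayes_decide(1.1, means, covs, priors))  # 1

# Misclassifying a true class-1 sample now costs 10x more, which shifts the boundary toward class 0.
loss = [[0.0, 10.0],
        [1.0, 0.0]]
print(bayes_decide(0.9, means, covs, priors, loss=loss))  # 1

# Bivariate case with identity covariance matrices.
means_2d = [np.zeros(2), np.array([3.0, 3.0])]
covs_2d = [np.eye(2), np.eye(2)]
print(bayes_decide([0.5, 0.2], means_2d, covs_2d, [0.5, 0.5]))  # 0
```

Unequal priors move the boundary toward the less likely class, just as the loss matrix does. A large prior on class 0 would therefore also classify x = 1.1 as class 0.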

**Datasets**

- Kaggle - Kaggle, the leading platform for predictive modeling competitions
- UCI MLR - UC Irvine Machine Learning Repository
- google.com/publicdata - Public data maintained by Google
- Freebase - A community-curated database of well-known people, places, and things
- SMS Spam Collection - A collection of 425 SMS spam messages manually extracted from the Grumbletext Web site
- SNAP - Stanford Large Network Dataset Collection
- Amazon Google Books Ngrams - A data set containing Google Books n-gram corpora
- The Million Song Dataset - Audio features and metadata for a million contemporary popular music tracks
- Modeling Online Auctions - Datasets of bidding for different eBay auctions
- CAT Dataset - A dataset of 10,000 cat images
- Click Dataset - A large dataset of about 53.5 billion HTTP requests made by users at Indiana University
- Meteorites - Registered meteorites that have impacted on Earth
- Common Crawl 2012 web corpus - A hyperlink graph of 3.5 billion web pages and 128 billion hyperlinks between these pages
- PyPI/Maven Dependency Data - State of the Maven/Java dependency graph and of the PyPI/Python dependency graph
- NYPD Crash Data Band-Aid - NYPD traffic crash data as a geocoded CSV
- Pass rates, race & gender - Detailed data on pass rates, race, and gender for 2013
- Nominate/vote data - Datasets including all the D-NOMINATE and W-NOMINATE scores
- aiHit Datasets - Information on 10,000 UK companies randomly sampled from the aiHit DB
- Amsterdam Library of Object Images (ALOI) - A color image collection of one thousand small objects, recorded for scientific purposes