Machine Learning Foundations: A Case Study Approach. Course can be found on Coursera.
Partial notes can be found in my blog SSQ.
Description | Programming Assignments
---|---
**Models:** Linear regression<br>Regularization: Ridge (L2), Lasso (L1)<br>Nearest neighbor and kernel regression<br>**Algorithms:** Gradient descent<br>Coordinate descent<br>**Concepts:** Loss functions, bias-variance tradeoff, cross-validation, sparsity, overfitting, model selection, feature selection | [x] Fitting a simple linear regression model on housing data<br>[x] Exploring different multiple regression models for house price prediction<br>[x] Implementing gradient descent for multiple regression<br>[x] Exploring the bias-variance tradeoff<br>[x] Observing effects of L2 penalty in polynomial regression<br>[x] Implementing ridge regression via gradient descent<br>[x] Using LASSO to select features<br>[x] Implementing LASSO using coordinate descent<br>[x] Predicting house prices using k-nearest neighbors regression
Slides and more details about this course can be found in my GitHub SSQ
Week 1: Introduction
- Regression. Case study: Predicting house prices
- Classification. Case study: Analyzing sentiment
- Clustering & Retrieval. Case study: Finding documents
- Matrix Factorization & Dimensionality Reduction. Case study: Recommending Products
- Capstone. An intelligent application using deep learning
- Become familiar with IPython Notebook and SFrame
Week 2: Regression (Predicting House Prices)
- Linear Regression
- Adding higher order effects
- Evaluating overfitting via training/test split
- Adding other features
- Other regression examples
- Implement a linear regression model with several different features
Week 3: Classification (Analyzing Sentiment)
- Classifier applications
- Linear classifiers
- Decision boundaries
- Training and evaluating a classifier
- What’s a good accuracy?
- False positives, false negatives, and confusion matrices
- Learning curves: How much data do I need?
- Class probabilities
- Implement a logistic regression model with several different features
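
For reference, here is a minimal numpy sketch of the accuracy and confusion-matrix computations listed above (the labels and numbers are illustrative, not from the course data):

```python
import numpy as np

# Illustrative binary sentiment labels: +1 = positive, -1 = negative.
y_true = np.array([+1, +1, -1, +1, -1, -1, +1, -1])
y_pred = np.array([+1, -1, -1, +1, +1, -1, +1, -1])

accuracy = np.mean(y_true == y_pred)

# Confusion-matrix counts for the binary case.
tp = np.sum((y_pred == +1) & (y_true == +1))  # true positives
fp = np.sum((y_pred == +1) & (y_true == -1))  # false positives
fn = np.sum((y_pred == -1) & (y_true == +1))  # false negatives
tn = np.sum((y_pred == -1) & (y_true == -1))  # true negatives

print(f"accuracy = {accuracy:.2f}; TP={tp} FP={fp} FN={fn} TN={tn}")
```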
Machine Learning: Regression. Course can be found on Coursera.
Week 1: Simple Linear Regression
- Describe the input (features) and output (real-valued predictions) of a regression model
- Calculate a goodness-of-fit metric (e.g., RSS)
- Estimate model parameters to minimize RSS using gradient descent
- Interpret estimated model parameters
- Exploit the estimated model to form predictions
- Discuss the possible influence of high leverage points
- Describe intuitively how fitted line might change when assuming different goodness-of-fit metrics
- Fitting a simple linear regression model on housing data
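
A minimal numpy sketch of this week's closed-form fit and its RSS goodness-of-fit metric (the housing numbers below are made up for illustration):

```python
import numpy as np

def simple_linear_regression(x, y):
    """Closed-form least-squares fit of y ~ w0 + w1 * x, minimizing RSS."""
    x_mean, y_mean = x.mean(), y.mean()
    w1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    w0 = y_mean - w1 * x_mean
    return w0, w1

def rss(x, y, w0, w1):
    """Residual sum of squares: the goodness-of-fit metric used above."""
    residuals = y - (w0 + w1 * x)
    return np.sum(residuals ** 2)

# Made-up square footage vs. price data, just to exercise the functions.
sqft = np.array([1000.0, 1500.0, 2000.0, 2500.0])
price = np.array([300e3, 420e3, 550e3, 680e3])
w0, w1 = simple_linear_regression(sqft, price)
print(w0, w1, rss(sqft, price, w0, w1))
```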
Week 2: Multiple Regression (Linear regression with multiple features)
- Describe polynomial regression
- Detrend a time series using trend and seasonal components
- Write a regression model using multiple inputs or features thereof
- Cast both polynomial regression and regression with multiple inputs as regression with multiple features
- Calculate a goodness-of-fit metric (e.g., RSS)
- Estimate model parameters of a general multiple regression model to minimize RSS:
- In closed form
- Using an iterative gradient descent algorithm
- Interpret the coefficients of a non-featurized multiple regression fit
- Exploit the estimated model to form predictions
- Explain applications of multiple regression beyond house price modeling
- Exploring different multiple regression models for house price prediction
- Implementing gradient descent for multiple regression
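
A compact sketch of both estimators named above: the closed-form normal equation, and gradient descent on RSS. H is the N x D feature matrix (with a constant column); the step size and tolerance are placeholders to tune per problem:

```python
import numpy as np

def regression_closed_form(H, y):
    """Normal equation: solve (H^T H) w = H^T y for the RSS minimizer."""
    return np.linalg.solve(H.T @ H, H.T @ y)

def regression_gradient_descent(H, y, step_size=1e-12, tolerance=1e8):
    """Iteratively step opposite the RSS gradient, -2 H^T (y - Hw)."""
    w = np.zeros(H.shape[1])
    while True:
        gradient = -2 * H.T @ (y - H @ w)
        w = w - step_size * gradient
        if np.linalg.norm(gradient) < tolerance:  # stop when gradient is small
            return w
```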
Week 3: Assessing Performance
- Describe what a loss function is and give examples
- Contrast training, generalization, and test error
- Compute training and test error given a loss function
- Discuss issue of assessing performance on training set
- Describe tradeoffs in forming training/test splits
- List and interpret the three sources of average prediction error:
- Irreducible error, bias, and variance
- Discuss issue of selecting model complexity on test data and then using test error to assess generalization error
- Motivate use of a validation set for selecting tuning parameters (e.g., model complexity)
- Describe overall regression workflow
- Exploring the bias-variance tradeoff
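
A small sketch of the split-and-evaluate workflow described above (helper names are illustrative): fit on the training set, select complexity on a validation set, and report error on the untouched test set.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_test_split(X, y, test_fraction=0.2):
    """Random split; error on held-out data estimates generalization error."""
    idx = rng.permutation(len(y))
    n_test = int(test_fraction * len(y))
    return X[idx[n_test:]], y[idx[n_test:]], X[idx[:n_test]], y[idx[:n_test]]

def mse(y, y_hat):
    """Average squared loss, one common choice of loss function."""
    return np.mean((y - y_hat) ** 2)
```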
Week 4: Ridge Regression
- Describe what happens to magnitude of estimated coefficients when model is overfit
- Motivate form of ridge regression cost function
- Describe what happens to estimated coefficients of ridge regression as tuning parameter λ is varied
- Interpret coefficient path plot
- Estimate ridge regression parameters:
- In closed form
- Using an iterative gradient descent algorithm
- Implement K-fold cross validation to select the ridge regression tuning parameter λ
- Observing effects of L2 penalty in polynomial regression
- Implementing ridge regression via gradient descent
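
A sketch of the two ridge estimators, with one simplification flagged in the comments (the course leaves the intercept w0 unpenalized; the closed form below penalizes every coefficient for brevity):

```python
import numpy as np

def ridge_closed_form(H, y, l2_penalty):
    """w = (H^T H + lambda*I)^{-1} H^T y. Note: this penalizes the intercept too."""
    D = H.shape[1]
    return np.linalg.solve(H.T @ H + l2_penalty * np.eye(D), H.T @ y)

def ridge_gradient_step(H, y, w, l2_penalty, step_size):
    """One descent step on RSS(w) + lambda*||w||^2; the penalty adds 2*lambda*w."""
    gradient = -2 * H.T @ (y - H @ w) + 2 * l2_penalty * w
    return w - step_size * gradient
```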
Week 5: Lasso Regression (Regularization for feature selection)
- Perform feature selection using “all subsets” and “forward stepwise” algorithms
- Analyze computational costs of these algorithms
- Contrast greedy and optimal algorithms
- Formulate lasso objective
- Describe what happens to estimated lasso coefficients as tuning parameter λ is varied
- Interpret lasso coefficient path plot
- Contrast ridge and lasso regression
- Describe geometrically why L1 penalty leads to sparsity
- Estimate lasso regression parameters using an iterative coordinate descent algorithm
- Implement K-fold cross validation to select lasso tuning parameter λ
- Using LASSO to select features
- Implementing LASSO using coordinate descent
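
A sketch of the coordinate-descent update with soft thresholding, under the assignment's conventions: feature columns normalized to unit 2-norm, and an unpenalized intercept in column 0:

```python
import numpy as np

def lasso_step(j, H, y, w, l1_penalty):
    """Optimal update for coordinate j of RSS(w) + lambda*||w||_1 (unit-norm columns)."""
    # ro_j: correlation of feature j with the residual that excludes w_j.
    ro_j = H[:, j] @ (y - H @ w + w[j] * H[:, j])
    if j == 0:                       # intercept is not shrunk
        return ro_j
    if ro_j < -l1_penalty / 2:       # soft thresholding
        return ro_j + l1_penalty / 2
    if ro_j > l1_penalty / 2:
        return ro_j - l1_penalty / 2
    return 0.0                       # exact zero: feature deselected

def lasso_coordinate_descent(H, y, l1_penalty, tol=1e-6):
    """Cycle through coordinates until no update moves more than tol."""
    w = np.zeros(H.shape[1])
    while True:
        max_change = 0.0
        for j in range(len(w)):
            old = w[j]
            w[j] = lasso_step(j, H, y, w, l1_penalty)
            max_change = max(max_change, abs(w[j] - old))
        if max_change < tol:
            return w
```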
Week 6: Going Nonparametric (Nearest neighbor and kernel regression)
- Motivate the use of nearest neighbor (NN) regression
- Define distance metrics in 1D and multiple dimensions
- Perform NN and k-NN regression
- Analyze computational costs of these algorithms
- Discuss sensitivity of NN to lack of data, dimensionality, and noise
- Perform weighted k-NN and define weights using a kernel
- Define and implement kernel regression
- Describe the effect of varying the kernel bandwidth λ or # of nearest neighbors k
- Select λ or k using cross validation
- Compare and contrast kernel regression with a global average fit
- Define what makes an approach nonparametric and why NN and kernel regression are considered nonparametric methods
- Analyze the limiting behavior of NN regression
- Use NN for classification
- Predicting house prices using k-nearest neighbors regression
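
A minimal sketch of k-NN regression and kernel regression with a Gaussian kernel (one of several kernel choices); k and the bandwidth are placeholders you would pick by cross-validation:

```python
import numpy as np

def knn_regression(X_train, y_train, x_query, k=5):
    """Average the targets of the k nearest training points (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    return y_train[np.argsort(dists)[:k]].mean()

def kernel_regression(X_train, y_train, x_query, bandwidth=1.0):
    """Weighted average of all targets, with Gaussian weights that decay
    with distance; the bandwidth controls how local the fit is."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    weights = np.exp(-(dists ** 2) / (2 * bandwidth ** 2))
    return weights @ y_train / weights.sum()
```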
Machine Learning: Classification. Course can be found on Coursera.
Slides and more details about this course can be found in my GitHub
- Week 1:
- Linear Classifiers & Logistic Regression
- decision boundaries
- linear classifiers
- class probability
- logistic regression
- impact of coefficient values on logistic regression output
- one-hot encoding
- multiclass classification using the one-versus-all approach
- Predicting sentiment from product reviews
- Linear Classifiers & Logistic Regression
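
A minimal sketch of the class-probability and decision-boundary ideas above, assuming H is an N x D feature matrix with a constant column:

```python
import numpy as np

def predict_probability(H, w):
    """P(y = +1 | x, w) = 1 / (1 + exp(-w.h(x))), the logistic/sigmoid link."""
    return 1.0 / (1.0 + np.exp(-(H @ w)))

def predict_class(H, w):
    """The decision boundary is the hyperplane w.h(x) = 0 (probability 0.5)."""
    return np.where(H @ w > 0, +1, -1)
```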
- Week 2:
- Learning Linear Classifiers
- Maximum likelihood estimation
- Gradient ascent algorithm for learning logistic regression classifier
- Choosing step size for gradient ascent/descent
- (VERY OPTIONAL LESSON) Deriving gradient of logistic regression
- Implementing logistic regression from scratch
- Overfitting & Regularization in Logistic Regression
- Overfitting in classification
- Overconfident predictions due to overfitting
- L2 regularized logistic regression
- Sparse logistic regression
- Implementing Logistic Regression with L2 regularization
- Learning Linear Classifiers
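
A sketch covering both halves of this week: gradient ascent on the log-likelihood, with an optional L2 term that shrinks coefficients and tempers overconfident predictions (step size and iteration count are placeholders):

```python
import numpy as np

def logistic_regression(H, y, step_size=1e-5, l2_penalty=0.0, max_iter=500):
    """Maximize the (L2-regularized) log-likelihood by gradient ascent.
    y holds +/-1 labels; 'indicator' converts them to {0, 1}."""
    w = np.zeros(H.shape[1])
    indicator = (y == +1).astype(float)
    for _ in range(max_iter):
        prob = 1.0 / (1.0 + np.exp(-(H @ w)))
        gradient = H.T @ (indicator - prob) - 2 * l2_penalty * w
        w = w + step_size * gradient   # ascent: move *up* the gradient
    return w
```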
- Week 3:
- Decision Trees
- Predicting loan defaults with decision trees
- Learning decision trees
- Recursive greedy algorithm
- Learning a decision stump
- Selecting best feature to split on
- When to stop recursing
- Using the learned decision tree
- Traverse a decision tree to make predictions: Majority class predictions; Probability predictions; Multiclass classification
- Learning decision trees with continuous inputs
- Threshold splits for continuous inputs
- (OPTIONAL) Picking the best threshold to split on
- Identifying safe loans with decision trees
- Implementing binary decision trees from scratch
- Decision Trees
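
A sketch of the "select the best feature to split on" step, assuming binary 0/1 features and +/-1 labels as in the assignment:

```python
import numpy as np

def best_splitting_feature(X, y, features):
    """Return the feature whose split yields the lowest classification error.
    A node predicts its majority class, so its mistakes = minority-class count."""
    best_feature, best_error = None, float("inf")
    for j in features:
        left, right = y[X[:, j] == 0], y[X[:, j] == 1]
        mistakes = (min(np.sum(left == +1), np.sum(left == -1)) +
                    min(np.sum(right == +1), np.sum(right == -1)))
        error = mistakes / len(y)
        if error < best_error:
            best_feature, best_error = j, error
    return best_feature, best_error
```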
- Week 4
- Overfitting in decision trees
- Identify when overfitting in decision trees
- Prevent overfitting with early stopping
- Limit tree depth
- Do not consider splits that do not reduce classification error
- Do not split intermediate nodes with only few points
- Prevent overfitting by pruning complex trees
- Use a total cost formula that balances classification error and tree complexity
- Use total cost to merge potentially complex trees into simpler ones
- Decision Trees in Practice for preventing overfitting
- Handling missing data
- Describe common ways of handling missing data:
- Skip all rows with any missing values
- Skip features with many missing values
- Impute missing values using other data points
- Modify learning algorithm (decision trees) to handle missing data:
- Missing values get added to one branch of split
- Use classification error to determine where missing values go
- Describe common ways of handling missing data
- Overfitting in decision trees
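
Two small helpers sketching the ideas above (names and parameters are illustrative, not the assignment's exact API): the early-stopping checks, and the total cost balanced during pruning:

```python
def should_stop_early(depth, max_depth, num_points, min_node_size,
                      error_before, error_after, min_error_reduction):
    """Early stopping: cap tree depth, skip tiny nodes, and skip splits
    that barely reduce classification error."""
    return (depth >= max_depth
            or num_points <= min_node_size
            or error_before - error_after <= min_error_reduction)

def total_cost(classification_error, num_leaves, lam):
    """Pruning criterion: error plus lam times tree complexity (leaf count).
    A merge of subtrees is kept when it lowers this total cost."""
    return classification_error + lam * num_leaves
```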
- Week 5
- Boosting
- Identify the notion of ensemble classifiers
- Formalize ensembles as the weighted combination of simpler classifiers
- Outline the boosting framework – sequentially learn classifiers on weighted data
- Describe the AdaBoost algorithm
- Learn each classifier on weighted data
- Compute coefficient of classifier
- Recompute data weights
- Normalize weights
- Implement AdaBoost to create an ensemble of decision stumps
- Discuss convergence properties of AdaBoost & how to pick the maximum number of iterations T
- Exploring Ensemble Methods with pre-implemented gradient boosted trees
- Implement your own boosting module
- Boosting
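
A sketch of one AdaBoost round, given current data weights alpha and a stump's predictions: compute the weighted error, the coefficient w_hat = 0.5 * ln((1 - error) / error), then reweight and normalize:

```python
import numpy as np

def adaboost_round(alpha, y, y_pred):
    """Return the classifier coefficient and the updated, normalized weights."""
    is_wrong = (y != y_pred)
    weighted_error = np.sum(alpha[is_wrong]) / np.sum(alpha)
    w_hat = 0.5 * np.log((1 - weighted_error) / weighted_error)
    # Up-weight mistakes by e^{w_hat}, down-weight correct points by e^{-w_hat}.
    alpha = alpha * np.exp(np.where(is_wrong, w_hat, -w_hat))
    return w_hat, alpha / alpha.sum()
```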
- Week 6
- Evaluating classifiers: Precision & Recall
- Classification accuracy/error are not always the right metrics
- Precision captures fraction of positive predictions that are correct
- Recall captures fraction of positive data correctly identified by the model
- Trade-off precision & recall by setting probability thresholds
- Plot precision-recall curves
- Compare models by computing precision at k
- Exploring precision and recall
- Evaluating classifiers: Precision & Recall
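
A minimal sketch of precision and recall at one probability threshold; sweeping the threshold trades the two off and traces out the precision-recall curve:

```python
import numpy as np

def precision_recall(y_true, prob, threshold=0.5):
    """Precision: fraction of positive predictions that are correct.
    Recall: fraction of actual positives the model identifies."""
    y_pred = np.where(prob >= threshold, +1, -1)
    tp = np.sum((y_pred == +1) & (y_true == +1))
    fp = np.sum((y_pred == +1) & (y_true == -1))
    fn = np.sum((y_pred == -1) & (y_true == +1))
    precision = tp / (tp + fp) if tp + fp > 0 else 1.0  # no positives predicted
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall
```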
- Week 7
- Scaling to Huge Datasets & Online Learning
- Significantly speedup learning algorithm using stochastic gradient
- Describe intuition behind why stochastic gradient works
- Apply stochastic gradient in practice
- Describe online learning problems
- Relate stochastic gradient to online learning
- Training Logistic Regression via Stochastic Gradient Ascent
- Scaling to Huge Datasets & Online Learning
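
A sketch of stochastic gradient ascent for logistic regression: each update uses one example (or a mini-batch) rather than the full dataset, so steps are cheap but noisy. Hyperparameters are placeholders:

```python
import numpy as np

def logistic_regression_sga(H, y, step_size=1e-4, batch_size=1, n_passes=5):
    """Shuffle each pass, then update on successive mini-batches."""
    rng = np.random.default_rng(0)
    N, D = H.shape
    w = np.zeros(D)
    indicator = (y == +1).astype(float)
    for _ in range(n_passes):
        order = rng.permutation(N)
        for start in range(0, N, batch_size):
            i = order[start:start + batch_size]
            prob = 1.0 / (1.0 + np.exp(-(H[i] @ w)))
            w = w + step_size * (H[i].T @ (indicator[i] - prob))
    return w
```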
Machine Learning: Clustering & Retrieval. Course can be found on Coursera.
Slides and more details about this course can be found in my GitHub SSQ
Week 1: Introduction
Week 2: Nearest Neighbor Search (Retrieving Documents)
- Implement nearest neighbor search for retrieval tasks
- Contrast document representations (e.g., raw word counts, tf-idf,…)
- Emphasize important words using tf-idf
- Contrast methods for measuring similarity between two documents
- Euclidean vs. weighted Euclidean
- Cosine similarity vs. similarity via unnormalized inner product
- Describe complexity of brute force search
- Implement KD-trees for nearest neighbor search
- Implement LSH for approximate nearest neighbor search
- Compare pros and cons of KD-trees and LSH, and decide which is more appropriate for given dataset
- Choosing features and metrics for nearest neighbor search
- Implementing Locality Sensitive Hashing from scratch
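
A sketch of two pieces from this week: cosine similarity, and random-hyperplane LSH (a standard LSH family for cosine distance; the bit count and names are illustrative):

```python
import numpy as np

def cosine_similarity(a, b):
    """Unlike the raw inner product, cosine similarity ignores document length."""
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def lsh_bins(X, n_bits=16, seed=0):
    """Each point gets an n_bits signature: the sign pattern of projections
    onto random hyperplanes. Similar vectors collide in the same bin with
    high probability, so search can be restricted to nearby bins."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_bits))
    bits = (X @ planes >= 0).astype(np.int64)        # N x n_bits of 0/1
    powers = 1 << np.arange(n_bits, dtype=np.int64)  # read each row as an integer
    return bits @ powers                             # bin index per point
```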
Week 3: Clustering with k-means
- Describe potential applications of clustering
- Describe the input (unlabeled observations) and output (labels) of a clustering algorithm
- Determine whether a task is supervised or unsupervised
- Cluster documents using k-means
- Interpret k-means as a coordinate descent algorithm
- Define data parallel problems
- Explain Map and Reduce steps of MapReduce framework
- Use existing MapReduce implementations to parallelize k-means, understanding what’s being done under the hood
- Clustering text data with k-means
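
A plain numpy k-means sketch that makes the coordinate-descent view explicit: the assignment step and the centroid-update step each lower the k-means objective.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Alternate assigning points to the nearest centroid and recomputing means."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids
```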
Week 4: Mixture Models (Model-Based Clustering)
- Interpret a probabilistic model-based approach to clustering using mixture models
- Describe model parameters
- Motivate the utility of soft assignments and describe what they represent
- Discuss issues related to how the number of parameters grows with the number of dimensions
- Interpret diagonal covariance versions of mixtures of Gaussians
- Compare and contrast mixtures of Gaussians and k-means
- Implement an EM algorithm for inferring soft assignments and cluster parameters
- Determine an initialization strategy
- Implement a variant that helps avoid overfitting issues
- Implementing EM for Gaussian mixtures
- Clustering text data with Gaussian mixtures
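
A compact EM sketch for a mixture of diagonal-covariance Gaussians; the log-space shift keeps responsibilities numerically stable, and the small variance floor is one simple guard against degenerate solutions (initialization and smoothing choices here are illustrative):

```python
import numpy as np

def log_gaussian_diag(X, mean, var):
    """Row-wise log density of a diagonal-covariance Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mean) ** 2 / var, axis=1)

def em_gmm(X, k, n_iter=100, seed=0, min_var=1e-6):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    means = X[rng.choice(N, size=k, replace=False)].astype(float)
    variances = np.tile(X.var(axis=0), (k, 1)) + min_var
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, j] proportional to weight_j * N(x_i | mean_j, var_j).
        log_r = np.stack([np.log(weights[j]) + log_gaussian_diag(X, means[j], variances[j])
                          for j in range(k)], axis=1)
        log_r -= log_r.max(axis=1, keepdims=True)   # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft counts.
        counts = r.sum(axis=0)
        weights = counts / N
        means = (r.T @ X) / counts[:, None]
        for j in range(k):
            variances[j] = (r[:, j] @ (X - means[j]) ** 2) / counts[j] + min_var
    return weights, means, variances, r
```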
Week 5: Latent Dirichlet Allocation (Mixed Membership Modeling)
- Compare and contrast clustering and mixed membership models
- Describe a document clustering model for the bag-of-words document representation
- Interpret the components of the LDA mixed membership model
- Analyze a learned LDA model
- Topics in the corpus
- Topics per document
- Describe Gibbs sampling steps at a high level
- Utilize Gibbs sampling output to form predictions or estimate model parameters
- Implement collapsed Gibbs sampling for LDA
- Modeling text topics with Latent Dirichlet Allocation
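
A minimal collapsed Gibbs sampler for LDA: each token's topic is resampled from its full conditional, with the topic proportions and topic-word distributions integrated out. The hyperparameters and the list-of-word-ids data layout are illustrative:

```python
import numpy as np

def lda_gibbs(docs, n_topics, vocab_size, n_iter=200, alpha=0.1, beta=0.01):
    """docs: list of documents, each a list/array of integer word ids."""
    rng = np.random.default_rng(0)
    doc_topic = np.zeros((len(docs), n_topics))    # n_{d,k}
    topic_word = np.zeros((n_topics, vocab_size))  # n_{k,w}
    topic_total = np.zeros(n_topics)               # n_k
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):                 # initialize counts
        for i, word in enumerate(doc):
            doc_topic[d, z[d][i]] += 1
            topic_word[z[d][i], word] += 1
            topic_total[z[d][i]] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                k = z[d][i]                        # remove current assignment
                doc_topic[d, k] -= 1
                topic_word[k, word] -= 1
                topic_total[k] -= 1
                # Full conditional over topics for this token.
                p = ((doc_topic[d] + alpha) * (topic_word[:, word] + beta)
                     / (topic_total + beta * vocab_size))
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k                        # add the new assignment back
                doc_topic[d, k] += 1
                topic_word[k, word] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, z
```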
Week 6: Hierarchical Clustering & Closing Remarks
- Bonus content: Hierarchical clustering
- Divisive clustering
- Agglomerative clustering
- The dendrogram for agglomerative clustering
- Agglomerative clustering details
- Hidden Markov models (HMMs): Another notion of “clustering”
- Modeling text data with a hierarchy of clusters
- Bonus content: Hierarchical clustering
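
A short agglomerative-clustering sketch using SciPy's hierarchy module; Ward linkage is one of several linkage choices, and the random data stands in for tf-idf document vectors:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))   # stand-in for document feature vectors

# Agglomerative clustering: start with singleton clusters and repeatedly merge
# the closest pair; Ward linkage merges to minimize within-cluster variance.
Z = linkage(X, method="ward")

# Cut the merge tree to get a flat clustering with (at most) 4 clusters.
labels = fcluster(Z, t=4, criterion="maxclust")
print(labels)

# dendrogram(Z) draws the merge tree (requires matplotlib).
```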