coursera-hse-machine-learning

My source code and solutions for the machine learning course on Coursera

by Higher School of Economics & Yandex School of Data Analysis

I completed the course in March 2016 with a 100% score.

Week 1

Lesson 3: Introduction to Tools

NumPy for operations on vectors & matrices:
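A minimal sketch of the kind of vector and matrix operations this lesson covers. The array values and the helper name `standardize_columns` are illustrative assumptions, not taken from the course files.

```python
import numpy as np


def standardize_columns(matrix):
    """Shift each column to zero mean and scale it to unit standard deviation."""
    mean = matrix.mean(axis=0)
    std = matrix.std(axis=0)
    return (matrix - mean) / std


# Hypothetical 3x2 matrix used only for illustration.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])

Z = standardize_columns(X)

# Indices of the rows whose element sum exceeds 20.
rows_over_20 = np.nonzero(X.sum(axis=1) > 20)[0]

# Stack two matrices vertically, then multiply the result by a vector.
stacked = np.vstack((np.eye(2), np.ones((1, 2))))
product = stacked @ np.array([2.0, 3.0])
```

Broadcasting handles the column-wise subtraction and division, so no explicit loops are needed.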

Pandas for data preprocessing:

Lesson Insights: There were 577 males & 314 females aboard the Titanic. The most frequent female first name was Anna, followed by Mary. The average age was 29.7, while the median was 28. 24% of all passengers had 1st class tickets. Only 38% survived.
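Statistics like these can be computed with a few Pandas calls. The sketch below uses a tiny hypothetical table rather than the real data, and it assumes the column names follow the Kaggle Titanic layout (`Sex`, `Age`, `Pclass`, `Survived`).

```python
import pandas as pd

# Hypothetical toy rows; these are NOT the real passenger records.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male"],
    "Age": [22.0, 38.0, 26.0, None, 35.0],
    "Pclass": [3, 1, 3, 1, 2],
    "Survived": [0, 1, 1, 0, 0],
})

# Number of passengers per sex.
sex_counts = df["Sex"].value_counts()

# Share of survivors and of 1st class passengers, in percent.
survived_pct = round(100 * df["Survived"].mean(), 2)
first_class_pct = round(100 * (df["Pclass"] == 1).mean(), 2)

# Missing ages (NaN) are skipped by default.
mean_age = df["Age"].mean()
median_age = df["Age"].median()
```

Taking the mean of a 0/1 column or of a boolean mask gives the share of rows that match, which avoids counting by hand.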

Lesson 4: Decision Trees

Decision tree feature importances: 01-sklearn-decision-tree-feature-importances.py

Lesson Insights: Females & passengers with the most expensive tickets had the best chance of surviving.
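As a rough illustration of how feature importances are read from a fitted tree, here is a sketch built on scikit-learn's `DecisionTreeClassifier` and its `feature_importances_` attribute. The toy data, the feature names, and the `random_state` value are assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: column 0 fully determines the label,
# column 1 carries no information about it.
feature_names = ["is_female", "ticket_group"]
X = np.array([[0, 1], [0, 2], [0, 3], [1, 1], [1, 2], [1, 3]])
y = X[:, 0]

clf = DecisionTreeClassifier(random_state=241)
clf.fit(X, y)

# Importances are normalized to sum to 1 across features.
importances = dict(zip(feature_names, clf.feature_importances_))
```

Here the tree needs a single split on the first column, so that feature receives all of the importance.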

Week 2

Lesson 1: Metric Methods for Classification

kNN method for classification, k parameter determination: 01-neighbours-number-determination.py
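One common way to determine k is to compare cross-validated accuracy over a range of candidates. The sketch below assumes 5-fold cross-validation, a candidate range of 1 to 20, and synthetic data from `make_classification`; none of these choices are taken from the repository file.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): 200 samples, 5 features.
X, y = make_classification(n_samples=200, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)

# The same shuffled folds are reused for every k so the scores are comparable.
kf = KFold(n_splits=5, shuffle=True, random_state=42)

mean_scores = {}
for k in range(1, 21):
    model = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(model, X, y, cv=kf, scoring="accuracy")
    mean_scores[k] = scores.mean()

# k with the highest mean cross-validated accuracy.
best_k = max(mean_scores, key=mean_scores.get)
```

Because kNN is distance based, the result usually changes if the features are rescaled first, so it is worth repeating the search on scaled data.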

kNN method for regression, metric determination: 02-metric-determination.py

Lesson 2: Linear Methods for Classification

Feature normalization for classification with Perceptron: 01-feature-normalization.py
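A sketch of the normalization step, assuming scikit-learn's `StandardScaler` and `Perceptron`. The synthetic data, the train/test split, and the `random_state` values are illustrative assumptions. The key point is that the scaler is fitted on the training data only and then reused for the test data.

```python
import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): two features on very different scales.
rng = np.random.RandomState(241)
n = 300
X = np.column_stack((rng.normal(0, 1, n), rng.normal(0, 1000, n)))
y = (X[:, 0] + X[:, 1] / 1000 > 0).astype(int)
X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]


def perceptron_accuracy(train, test):
    """Fit a Perceptron on `train` and return its accuracy on `test`."""
    clf = Perceptron(random_state=241)
    clf.fit(train, y_train)
    return accuracy_score(y_test, clf.predict(test))


acc_raw = perceptron_accuracy(X_train, X_test)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # statistics from training data only
X_test_scaled = scaler.transform(X_test)        # reuse the same statistics
acc_scaled = perceptron_accuracy(X_train_scaled, X_test_scaled)
```

Comparing `acc_raw` with `acc_scaled` shows how much the scaling matters for a given data set.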

Week 3

Lesson 1: Support Vector Machine

Support vector selection: 01-svm.py
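A small sketch of how support vectors can be inspected with scikit-learn's `SVC`, whose `support_` attribute holds the zero-based indices of the support vectors. The toy points and the parameter values (`C`, `random_state`) are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly separable points; rows 2 and 5 lie far from the boundary.
X = np.array([[0.0, 0.0], [0.0, 1.0], [-1.0, 0.5],
              [3.0, 0.0], [3.0, 1.0], [4.0, 0.5]])
y = np.array([0, 0, 0, 1, 1, 1])

# A large C approximates a hard-margin linear SVM.
clf = SVC(kernel="linear", C=100000, random_state=241)
clf.fit(X, y)

# Zero-based row indices of the training points that define the margin.
support_indices = sorted(clf.support_.tolist())
```

Only points on the margin can become support vectors here, so the two outer points never appear in `support_indices`.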

Text analysis: 02-text-analysis.py

Lesson 2: Logistic Regression

Logistic regression & AUC-ROC score calculation: 01-logistic-regression.py
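To make the two pieces concrete, here is a sketch of logistic regression trained by batch gradient descent with an optional L2 penalty, plus a pairwise AUC-ROC computation. The function names, the learning rate, the penalty strength, and the toy data are assumptions, and the model has no intercept term. This is not the code from the repository file.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, l2=0.0, n_iter=1000):
    """Batch gradient descent on the mean log-loss with an L2 penalty.

    Labels are expected to be 0 or 1; no intercept is fitted.
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y) + l2 * w
        w -= lr * grad
    return w


def roc_auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: the first feature separates the classes.
X = np.array([[-2.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])

w = fit_logistic(X, y, lr=0.1, l2=0.1)
auc = roc_auc(y, sigmoid(X @ w))
```

The pairwise AUC is quadratic in the number of samples, which is fine for a sketch; `sklearn.metrics.roc_auc_score` is the practical choice for real data.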

Lesson 3: Quality Metrics

Basic & complex metrics calculations: 01-score-metrics.py
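The basic metrics all follow from the four confusion-matrix counts. The sketch below uses plain Python and hypothetical labels; it assumes binary labels coded as 0 and 1, with at least one actual positive and one predicted positive, so no division by zero occurs.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def basic_metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical labels used only for illustration.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = basic_metrics(y_true, y_pred)
```

The same numbers can be cross-checked against `sklearn.metrics` (`accuracy_score`, `precision_score`, `recall_score`, `f1_score`).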

Week 4

Lesson 1: Linear Regression

Ridge regression of sparse features: 01-ridge-regression.py

Lesson 2: Principal Component Analysis

Dow Jones index analysis: 01-principal-components.py
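A typical question in this kind of analysis is how many principal components are needed to explain a given share of the variance. The sketch below computes the explained-variance ratios with NumPy's SVD. The synthetic "price" matrix, the function names, and the 90% threshold are assumptions for illustration.

```python
import numpy as np


def explained_variance_ratio(X):
    """Share of total variance explained by each principal component."""
    centered = X - X.mean(axis=0)
    _, singular_values, _ = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2 / (len(X) - 1)
    return variances / variances.sum()


def components_needed(ratio, threshold=0.9):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    cumulative = np.cumsum(ratio)
    return int(np.searchsorted(cumulative, threshold) + 1)


# Synthetic data (assumption): five series driven by one common factor plus noise.
rng = np.random.RandomState(0)
factor = rng.normal(0, 1, 250)
X = factor[:, None] * np.ones(5) + rng.normal(0, 0.1, (250, 5))

ratio = explained_variance_ratio(X)
n_components = components_needed(ratio, threshold=0.9)
```

Since one common factor drives all five series here, a single component is enough. `sklearn.decomposition.PCA` exposes the same ratios through its `explained_variance_ratio_` attribute.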

Week 5

Lesson 1: Composition of Algorithms

Random forest size calculation: 01-random-forest-size.py

Gradient boosting vs. random forest comparison: 02-gradient-boosting.py

Week 6

Lesson 1: Clustering & Visualization

Image color count reduction: 01-image-color-count-reduction.py
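One way to reduce the color count is to cluster the pixels with k-means and replace each pixel by its cluster center, then measure the loss with PSNR. The sketch below assumes scikit-learn's `KMeans`, pixel values in the range 0 to 1, and a random stand-in image; the function names and parameter values are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans


def quantize(pixels, n_colors, random_state=241):
    """Replace each RGB pixel by the center of its k-means cluster."""
    km = KMeans(n_clusters=n_colors, n_init=10, random_state=random_state)
    labels = km.fit_predict(pixels)
    return km.cluster_centers_[labels]


def psnr(original, reduced, max_value=1.0):
    """Peak signal-to-noise ratio in decibels (higher means closer to the original)."""
    mse = np.mean((original - reduced) ** 2)
    return 10 * np.log10(max_value ** 2 / mse)


# Random stand-in image (assumption): 16x16 pixels, 3 channels, values in [0, 1).
rng = np.random.RandomState(0)
image = rng.rand(16, 16, 3)

pixels = image.reshape(-1, 3)            # one row per pixel
reduced = quantize(pixels, n_colors=8)
score = psnr(pixels, reduced)
reduced_image = reduced.reshape(image.shape)
```

Repeating the call for increasing `n_colors` and stopping once `score` passes a chosen PSNR target gives the smallest palette that meets that quality level.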

Week 7

Dota 2 Win Probability Prediction

Gradient boosting & logistic regression: 01-solution.py

See my results on Kaggle.