legolas140/competitive-data-science

How to Win a Data Science Competition: Learn from Top Kagglers

Recap of main ML algorithms

Additional Tools

Feature preprocessing and generation with respect to models

Feature preprocessing

Feature generation

Feature extraction from text and images

Bag of words

Word2vec

NLP Libraries

NLTK, spaCy, TextBlob

Pretrained models

Finetuning

Exploratory data analysis

Biclustering algorithms for sorting corrplots
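
As a rough illustration of the idea, the sketch below reorders a correlation matrix so that strongly correlated features end up next to each other. It assumes scikit-learn's `SpectralCoclustering` as the biclustering algorithm and uses synthetic data; the helper name `sort_corr_matrix` and the choice of two clusters are illustrative assumptions, not taken from the linked material.

```python
import numpy as np
from sklearn.cluster import SpectralCoclustering


def sort_corr_matrix(corr, n_clusters=2, random_state=0):
    """Reorder a correlation matrix so features in the same bicluster are adjacent.

    Hypothetical helper: the absolute correlation is used because
    spectral co-clustering expects non-negative input.
    """
    model = SpectralCoclustering(n_clusters=n_clusters, random_state=random_state)
    model.fit(np.abs(corr))
    # Sort features by their cluster label; a stable sort keeps the original
    # order inside each cluster.
    order = np.argsort(model.row_labels_, kind="stable")
    return corr[np.ix_(order, order)], order, model.row_labels_


# Synthetic data: six features, alternately driven by two independent factors.
rng = np.random.default_rng(0)
factor_a = rng.normal(size=500)
factor_b = rng.normal(size=500)
columns = [
    (factor_a if i % 2 == 0 else factor_b) + 0.1 * rng.normal(size=500)
    for i in range(6)
]
X = np.column_stack(columns)

corr = np.corrcoef(X, rowvar=False)
sorted_corr, order, labels = sort_corr_matrix(corr)
print(order)
print(np.round(sorted_corr, 2))
```

Plotting `sorted_corr` as a heatmap instead of `corr` should show the correlated groups as contiguous blocks along the diagonal.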

Validation

Data leakage

Perfect score script by Oleg Trott - used to probe the leaderboard
Page about data leakage on Kaggle

Metrics optimization

Classification

Ranking

Learning to Rank using Gradient Descent - the original paper on the pairwise method for AUC optimization
Overview of further developments of RankNet
RankLib (implementations of the two papers above)
Learning to Rank Overview
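
To make the link between pairwise ranking and AUC concrete, here is a minimal NumPy sketch. AUC is the fraction of (positive, negative) pairs that the scores order correctly, and a RankNet-style logistic loss on the score differences is a smooth surrogate for it. The function names are illustrative, and this is a simplified sketch of the idea, not the implementation from the papers or from RankLib.

```python
import numpy as np


def pairwise_auc(y_true, scores):
    """AUC computed directly as the share of correctly ordered (pos, neg) pairs."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # diff[i, j] = score of positive i minus score of negative j
    diff = pos[:, None] - neg[None, :]
    # Ties count as half a correctly ordered pair.
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()


def pairwise_logistic_loss(y_true, scores):
    """Smooth pairwise surrogate: mean of log(1 + exp(-(s_pos - s_neg)))."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return np.log1p(np.exp(-diff)).mean()


y = np.array([0, 0, 1, 1])
s = np.array([0.1, 0.4, 0.35, 0.8])
print(pairwise_auc(y, s))            # 3 of 4 pairs are ordered correctly -> 0.75
print(pairwise_logistic_loss(y, s))
```

The loss is differentiable in the scores, so it can be minimized by gradient descent, whereas the pair count in `pairwise_auc` cannot.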

Clustering

Evaluation metrics for clustering

Hyperparameter tuning

Tips and tricks

Advanced features

Matrix Factorization:

Overview of Matrix Decomposition methods (sklearn)

t-SNE:

Interactions:

Ensembling