Data Science Masters

A self-study plan for going from zero to mastery in data science.

Goals

  • Score in the top 20% in Kaggle competitions
  • Expert with different data types (text, image, audio, video)
  • Expert with different techniques (regression, SVM, deep learning, genetic algorithms, etc.)
  • Familiar with modern tooling (Python, pandas, scikit-learn, R, TensorFlow, Apache Spark, etc.)
  • Expert with various problems (classification, search, clustering, prediction, recommendation, etc.)
  • Strong fundamentals (able to read and implement technical papers)
  • Able to build pipelines and architectures at scale

Study Plan Overview

  • Module 0 - High School Math
  • Module 1 - College Math I (Calculus)
  • Module 2 - College Math II (Linear Algebra)
  • Module 3 - College Math III (Discrete Math)
  • Module 4 - College Math IV (Probability and Statistics)
  • Module 5 - Computation and Algorithms
  • Module 6 - Artificial Intelligence and Machine Learning
  • Module 7 - Deep Learning
  • Module 8 - Data Mining and Recommenders
  • Module 9 - NLP and Computer Vision
  • Module 10 - Cloud Computing Architectures / Data Center Engineering

Looking ahead is encouraged, as long as you generally finish earlier modules before later ones.

Module 0 - High School Math

Not everyone was lucky enough to have a good start with math growing up. The goal is to level the playing field: by the end of Module 0 you should feel like you went to a high school with world-class teachers and finished top of your math class.

Algebra

Geometry

Pre Calculus

Statistics and Probability

Required Reading

Module 1 - College Math I (Calculus)

Differential Equations

Supplementary Material

Module 2 - College Math II (Linear Algebra)

Required Reading

Supplementary Material

Module 3 - College Math III (Discrete Math)

Proofs

Proofs, set theory, propositional logic, induction, invariants, state machines

Graph Theory

Combinatorics

Counting

Series, Sequences, Recurrences

Number Theory

Divisibility, Bézout's identity, modular arithmetic, Euler's totient theorem, Fermat's little theorem, factorization, Diophantine equations, fundamental theorem of arithmetic, Chinese remainder theorem
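
As a small illustration of how several of these topics connect, the sketch below uses the extended Euclidean algorithm to find the coefficients in Bézout's identity and then derives a modular inverse from them. The function names `extended_gcd` and `mod_inverse` are made up for this example, and the code assumes non-negative integer inputs.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g, where g is gcd(a, b)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        # Each step keeps the invariant a*old_x + b*old_y == old_r.
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def mod_inverse(a, m):
    """Return the inverse of a modulo m; it exists only when gcd(a, m) == 1."""
    g, x, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError("a has no inverse modulo m")
    return x % m


g, x, y = extended_gcd(240, 46)
print(g, 240 * x + 46 * y)   # both values equal gcd(240, 46)
print(mod_inverse(3, 11))    # some k with (3 * k) % 11 == 1
```

Fermat's little theorem gives an independent check for a prime modulus p: `pow(a, p - 2, p)` should agree with `mod_inverse(a, p)` whenever a is not a multiple of p.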

Discrete Probability

Abstract Algebra

Supplementary Material

Module 4 - College Math IV (Probability and Statistics)

Module 5 - Computation and Algorithms

R Programming

Information Theory

Python and Computation and Data

Algorithms

  1. Introduction
    • Algorithmic thinking, peak finding
    • Models of computation, Python cost model, document distance
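
Peak finding is a common first example of algorithmic thinking. The sketch below is one possible one-dimensional version, assuming a "peak" means an element that is at least as large as each of its neighbours. The name `find_peak` is hypothetical and not taken from any course material.

```python
def find_peak(values):
    """Return an index i such that values[i] >= each existing neighbour.

    Uses binary search, so it inspects O(log n) elements instead of all n.
    """
    if not values:
        raise ValueError("values must be non-empty")
    lo, hi = 0, len(values) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if values[mid] < values[mid + 1]:
            # The sequence rises to the right, so a peak lies in mid+1..hi.
            lo = mid + 1
        else:
            # values[mid] >= values[mid + 1], so a peak lies in lo..mid.
            hi = mid
    return lo


data = [1, 3, 4, 3, 5, 1, 3]
i = find_peak(data)
print(i, data[i])
```

The function returns some peak, not necessarily the largest element. That relaxation is what makes the logarithmic running time possible.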

Resources 📚 Introduction to Algorithms (CLRS)

Module 6 - Artificial Intelligence and Machine Learning

https://www.coursera.org/specializations/aml

Artificial Intelligence

Machine Learning

Machine Learning Specialization by University of Washington on Coursera

Module 7 - Deep Learning

Deep Learning by deeplearning.ai on Coursera

Goals:

  • different activation functions (sigmoid/tanh/relu)
  • different cost functions
  • with and without bias units
  • classification and regression problems
  • text / binary / image / recommenders
  • batch vs stochastic
  • JS, Python, PHP, MATLAB, TensorFlow, scikit-learn
  • create visualizations and blog explanations
  • Audit best courses / books
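
The activation functions named in the goals above can be written in a few lines. This is a minimal sketch using only the Python standard library and scalar inputs. The function names are chosen for this example, and a real implementation would more likely operate on arrays.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x):
    """Derivative of sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def tanh(x):
    return math.tanh(x)


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU; the value at x == 0 is set to 0 by convention here."""
    return 1.0 if x > 0 else 0.0


for name, f, df in [("sigmoid", sigmoid, sigmoid_grad),
                    ("tanh", tanh, tanh_grad),
                    ("relu", relu, relu_grad)]:
    print(name, [round(f(v), 3) for v in (-2.0, 0.0, 2.0)], df(0.0))
```

Comparing the gradients shows why the choice matters: the sigmoid and tanh gradients shrink toward zero for large |x|, while the ReLU gradient stays at 1 for any positive input.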

Module 8 - Data Mining and Recommenders

Module 9 - NLP and Computer Vision

NLP

Image and Computer Vision

Module 10 - Cloud Computing Architectures / Data Center Engineering

Electives

Resources

Reading List