Self-study plan to achieve mastery in data science

Data Science Masters

Zero to mastery in data science.

Goals

  • Score in the top 20% of Kaggle competitions
  • Expert with different data types (text, image, audio, video)
  • Expert with different techniques (regression, SVM, deep learning, genetic algorithms, etc.)
  • Familiar with modern tooling (Python, pandas, scikit-learn, R, TensorFlow, Apache Spark, etc.)
  • Expert with various problems (classification, search, clustering, prediction, recommendation, etc.)
  • Strong fundamentals (able to read and implement technical papers)
  • Able to build pipelines and architectures at scale

Study Plan Overview

  • Module 0 - High School Math
  • Module 1 - College Math I (Calculus)
  • Module 2 - College Math II (Linear Algebra)
  • Module 3 - College Math III (Discrete Math)
  • Module 4 - College Math IV (Probability and Statistics)
  • Module 5 - Computation and Algorithms
  • Module 6 - Artificial Intelligence and Machine Learning
  • Module 7 - Deep Learning
  • Module 8 - Data Mining and Recommenders
  • Module 9 - NLP and Computer Vision
  • Module 10 - Cloud Computing Architectures / Data Center Engineering

Looking ahead is encouraged, as long as you generally finish earlier modules before later ones.

Module 0 - High School Math

Not everyone was lucky enough to have a good start with math growing up. The goal is to level the playing field: by the end of Module 0 you should feel like you went to a high school with world-class teachers and finished at the top of your math class.

Algebra

Geometry

Pre Calculus

Statistics and Probability

Required Reading

Module 1 - College Math I (Calculus)

Supplementary Material

Module 2 - College Math II (Linear Algebra)

Required Reading

Supplementary Material

Module 3 - College Math III (Discrete Math)

3.1 Proofs and Logic

Proofs, set theory, propositional logic, induction, invariants, state machines

3.2 Number Theory

Number theory is fundamental to reasoning about the integers as discrete mathematical structures, with applications in cryptography and efficient numerical computation.

By the end of this sub-module you should be very confident proving and reasoning about concepts including: divisibility, Bézout's identity, modular arithmetic, Euler's totient theorem, Fermat's little theorem, integer factorization, Diophantine equations, the fundamental theorem of arithmetic, the Chinese remainder theorem, RSA and the discrete logarithm problem.
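
To make a few of these concepts concrete, here is a minimal Python sketch (not part of the original course material) that ties Bézout's identity, modular inverses, Fermat's little theorem and RSA together. The function names `extended_gcd` and `mod_inverse` are illustrative, and the primes are deliberately tiny, so this is a toy for study and not a secure implementation.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y == g (Bezout's identity)."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def mod_inverse(a, m):
    """Return the inverse of a modulo m, which exists only when gcd(a, m) == 1."""
    g, x, _ = extended_gcd(a, m)
    if g != 1:
        raise ValueError("a has no inverse modulo m")
    return x % m


# Fermat's little theorem: a^(p-1) is congruent to 1 mod p
# for a prime p and any a not divisible by p.
fermat_check = pow(2, 61 - 1, 61)

# Toy RSA with tiny primes (assumed values, for illustration only).
p, q = 61, 53
n = p * q                  # public modulus
phi = (p - 1) * (q - 1)    # Euler's totient of n
e = 17                     # public exponent, coprime to phi
d = mod_inverse(e, phi)    # private exponent, e*d == 1 (mod phi)

message = 65
ciphertext = pow(message, e, n)    # encrypt: m^e mod n
recovered = pow(ciphertext, d, n)  # decrypt: c^d mod n
```

Decryption recovers the message because e*d is congruent to 1 modulo phi, which is exactly what Euler's totient theorem needs. Working through why `recovered == message` holds is a good exercise for this sub-module.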

Problem Sets

Worked solutions to problem sets here

Optional Supplementary Material

3.3 Combinatorics

Combinatorics is the mathematics of counting, and a vital skill for reasoning about the size of finite sets.
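
As a small illustration (a sketch using only the Python standard library, with example values chosen here and not taken from the course), counting formulas can be checked against brute-force enumeration. This is a useful habit when working through the problem sets.

```python
from itertools import combinations, permutations
from math import comb, perm

items = ["a", "b", "c", "d", "e"]
k = 3

# Enumerate every k-element subset and every ordered k-arrangement.
subsets = list(combinations(items, k))
arrangements = list(permutations(items, k))

# Closed-form counts: n choose k, and n! / (n - k)!.
subset_count = comb(len(items), k)
arrangement_count = perm(len(items), k)

# Inclusion-exclusion: how many integers in 1..100 are divisible by 2 or 3?
brute_force = sum(1 for n in range(1, 101) if n % 2 == 0 or n % 3 == 0)
by_formula = 100 // 2 + 100 // 3 - 100 // 6
```

Here the enumerated lists and the closed-form counts agree, and the inclusion-exclusion formula matches the brute-force count.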

Problem Sets

3.4 Graph Theory

3.5 Series, Sequences, Recurrences

todo

3.6 Discrete Probability

todo

Discrete Math Supplementary Material

Module 4 - College Math IV (Probability and Statistics)

Module 5 - Computation and Algorithms

Algorithms

Resources

Information Theory

Python and Computation and Data

Module 5.5 - Databases and Computer Architecture

Supplementary

Module 6 - Artificial Intelligence and Machine Learning

https://www.coursera.org/specializations/aml

Artificial Intelligence

Machine Learning

Machine Learning Specialization by University of Washington on Coursera

Module 7 - Deep Learning

Deep Learning by deeplearning.ai on Coursera

Goals:

  • different activation functions (sigmoid/tanh/relu)
  • different cost functions
  • with and without bias units
  • classification and regression problems
  • text / binary / image / recommenders
  • batch vs stochastic
  • JS, Python, PHP, MATLAB, TensorFlow, scikit-learn
  • create visualizations and blog explanations
  • Audit best courses / books
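
As one possible starting point for the activation-function goal above, the sketch below defines sigmoid, tanh and ReLU with their derivatives in plain Python and checks each derivative against a central finite difference. The helper name `numerical_grad` and the step size are assumptions made for this example, not part of any course listed here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def tanh_grad(z):
    return 1.0 - math.tanh(z) ** 2


def relu(z):
    return max(0.0, z)


def relu_grad(z):
    # The derivative is undefined at z == 0; 0.0 is used there by convention.
    return 1.0 if z > 0 else 0.0


def numerical_grad(f, z, eps=1e-6):
    """Central finite-difference approximation of f'(z)."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)


# Compare analytic and numerical derivatives at a few points away from zero.
for z in (-2.0, -0.5, 0.3, 1.7):
    assert abs(sigmoid_grad(z) - numerical_grad(sigmoid, z)) < 1e-6
    assert abs(tanh_grad(z) - numerical_grad(math.tanh, z)) < 1e-6
    assert abs(relu_grad(z) - numerical_grad(relu, z)) < 1e-6
```

The same finite-difference check extends to cost functions and full networks, which makes it a handy test when implementing backpropagation by hand.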

Module 8 - Data Mining and Recommenders

Module 9 - NLP and Computer Vision

NLP

Image and Computer Vision

Module 10 - Architectures / Data Centers

Electives

Resources

Reading List