Data Science Masters

Self-study plan to go from zero to mastery in data science.

Goals

  • Score in the top 20% in Kaggle competitions
  • Expert with different data types (text, image, audio, video)
  • Expert with different techniques (regression, SVM, deep learning, genetic algorithms, etc.)
  • Familiar with modern tooling (Python, pandas, scikit-learn, R, TensorFlow, Apache Spark, etc.)
  • Expert with various problems (classification, search, clustering, prediction, recommendation, etc.)
  • Strong in the fundamentals (able to read and implement technical papers)
  • Able to build pipelines and architectures at scale

Study Plan Overview

  • Module 0 - High School Math
  • Module 1 - College Math I (Calculus)
  • Module 2 - College Math II (Linear Algebra)
  • Module 3 - College Math III (Discrete Math)
  • Module 4 - College Math IV (Probability and Statistics)
  • Module 5 - Computation and Algorithms
  • Module 6 - Artificial Intelligence and Machine Learning
  • Module 7 - Deep Learning
  • Module 8 - Data Mining and Recommenders
  • Module 9 - NLP and Computer Vision
  • Module 10 - Cloud Computing Architectures / Data Center Engineering

Looking ahead is encouraged, as long as you generally finish earlier modules before later ones.

Module 0 - High School Math

Not everyone was lucky enough to get a good start in math growing up. The goal is to level the playing field: by the end of Module 0 you should feel like you went to a high school with world-class teachers and finished at the top of your math class.

Algebra

Geometry

Precalculus

Statistics and Probability

Required Reading

Module 1 - College Math I (Calculus)

Supplementary Material

Module 2 - College Math II (Linear Algebra)

Required Reading

Supplementary Material

Module 3 - College Math III (Discrete Math)

Proofs

Proofs, set theory, propositional logic, induction, invariants, state machines

Graph Theory

Combinatorics

Series, Sequences, Recurrences

Number Theory

Discrete Probability

Abstract Algebra

Supplementary Material

Module 4 - College Math IV (Probability and Statistics)

Module 5 - Computation and Algorithms

R Programming

Information Theory

Python and Computation and Data

Algorithms

  1. Introduction
    • Algorithmic thinking, peak finding
    • Models of computation, Python cost model, document distance
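To make the peak-finding topic above concrete, here is a minimal sketch of the one-dimensional version using a binary-search-style divide and conquer. It is an illustration only, not taken from any course material listed in this plan; the function name `find_peak` and the definition of a peak (an element not smaller than its existing neighbours) are assumptions made for this example.

```python
from typing import Sequence


def find_peak(values: Sequence[float]) -> int:
    """Return the index of one peak in a non-empty sequence.

    Assumption for this sketch: a "peak" is an element that is not
    smaller than any neighbour it has. At least one always exists.
    """
    if not values:
        raise ValueError("values must be non-empty")

    lo, hi = 0, len(values) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if values[mid] < values[mid + 1]:
            # The sequence rises to the right of mid, so some peak
            # lies in the right half.
            lo = mid + 1
        else:
            # values[mid] >= values[mid + 1], so mid or something to
            # its left is a peak.
            hi = mid
    return lo


if __name__ == "__main__":
    print(find_peak([1, 3, 2]))
    print(find_peak([1, 2, 3, 4]))
```

Each iteration halves the search range, so the sketch does O(log n) comparisons, compared with O(n) for a left-to-right scan.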

Resources 📚 Introduction to Algorithms (CLRS)

Module 6 - Artificial Intelligence and Machine Learning

https://www.coursera.org/specializations/aml

Artificial Intelligence

Machine Learning

Machine Learning Specialization by University of Washington on Coursera

Module 7 - Deep Learning

Deep Learning by deeplearning.ai on Coursera

Goals:

  • different activation functions (sigmoid / tanh / ReLU)
  • different cost functions
  • with and without bias units
  • classification and regression problems
  • text / binary / image / recommenders
  • batch vs stochastic
  • JS, Python, PHP, MATLAB, TensorFlow, scikit-learn
  • create visualizations and blog explanations
  • Audit best courses / books
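As a possible starting point for the first goal in the list above, the three named activation functions can be written in plain Python with only the standard library. This is a minimal sketch for scalar inputs, not code from any of the listed courses; the function names are chosen for this example, and the two-branch form of `sigmoid` is one common way to avoid overflow for large negative inputs.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, 1 / (1 + e^-x), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, e^x is small, so this form cannot overflow.
    z = math.exp(x)
    return z / (1.0 + z)


def tanh(x: float) -> float:
    """Hyperbolic tangent, output in (-1, 1)."""
    return math.tanh(x)


def relu(x: float) -> float:
    """Rectified linear unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)


if __name__ == "__main__":
    for x in (-2.0, 0.0, 2.0):
        print(x, sigmoid(x), tanh(x), relu(x))
```

A vectorized version (for example with NumPy arrays) would follow the same formulas applied elementwise.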

Module 8 - Data Mining and Recommenders

Module 9 - NLP and Computer Vision

NLP

Image and Computer Vision

Module 10 - Cloud Computing Architectures / Data Center Engineering

Electives

Projects

Recommender, chatbot, graphics simulation with AI (e.g. ball and paddle), ...

Resources

Reading List