Data Science Online Resources

This is a list of data science learning materials organized by topic.


Table of Contents

1. Foundations

2. Data Science

3. Topics


Data Structures & Algorithms:

  • Problem Solving with Algorithms and Data Structures using Python (Brad Miller and David Ranum): Book
  • Grokking Algorithms (Aditya Y. Bhargava): Github
  • Introduction to Algorithms by MIT: Course page with videos
  • Python Tutor: Website
  • MSDS689 Problem Solving with Python: Github

Probability & Statistics:

  • Stanford CS109 Introduction to Probability for Computer Scientists: Course page
  • NYU 1002 Statistical and Mathematical Methods: Course page 2015
  • Harvard Statistics110: Course page
  • Think Bayes (Allen B. Downey): Book
  • Think Stats (Allen B. Downey): Book

Python

  • Python for Data Analysis (Wes McKinney):
  • Learn Python the Hard Way: Free book

R

  • R for Data Science (Hadley Wickham): Free book

Database

  • Database by Stanford: Course

Others

  • USF MSDS692 Data acquisition: Course page
  • USF MSDS501 Computational Data Science Bootcamp: Course page
  • COMPSCI 109A: Data Science 1: Introduction to Data Science: Course page

Machine Learning:

- Videos:

- Textbooks:

  • Introduction to Statistical Learning: pdf
  • Computer Age Statistical Inference: Algorithms, Evidence, and Data Science: pdf
  • The Elements of Statistical Learning: pdf
  • Introduction to Machine Learning with Python: github

- Comments:

Stanford Statistical Learning is a good introductory course that explains concepts simply and is not math-heavy. More advanced material can be found in Stanford CS229, CMU 10-701, or the ESL book (The Elements of Statistical Learning).


Deep Learning

- Videos:

  • Ng’s deep learning courses: Coursera
  • Stanford CS230: Deep Learning: Course page, Github
  • Stanford 231n: Convolutional Neural Networks for Visual Recognition (Spring 2017): Youtube, Course page
  • Deep Learning Specialization: Youtube
  • Stanford 224n: Natural Language Processing with Deep Learning (Winter 2017): Youtube, Course page
  • MIT 6.S094: Deep Learning for Self-Driving Cars: Youtube, Course page
  • MIT 6.S191 Introduction to Deep Learning: Youtube
  • Neural Networks for Machine Learning by Hinton: Coursera. I found this course very hard, but it covers almost everything about neural networks. Prof. Hinton is a hero.
  • Keras in 30 sec: Link
  • TensorFlow (Stanford CS20SI): Youtube

- Books:

- Other:

- Comments:

Ng's courses are good enough on their own. Reading Part 2 of Goodfellow's book can also help. It is important to learn at least one DL package, such as Keras, TF, or PyTorch. You may want to choose a focus, either CV or NLP. For a deeper understanding of DL, take Hinton's course and read Part 3 of Goodfellow's book. Fast.ai has very practical courses.


Reinforcement Learning:

- Videos:

- Books:

  • Reinforcement Learning: An Introduction (2nd): pdf

Artificial Intelligence

I separated AI from DL/RL here because there are some specialized AI courses, but the content of AI and DL overlaps a lot.

- Courses:

  • FAST.ai: Course. A practical deep learning course.
  • UC Berkeley CS188 Intro to AI: Course page
  • CMU 681 Artificial Intelligence by Tuomas Sandholm: Course page

- Comments:


Bayesian:

- Courses:

  • Bayesian Statistics: From Concept to Data Analysis: Coursera
  • Bayesian Methods for Machine Learning: Coursera
  • Statistical Rethinking: Course Page (Recorded Lectures: Winter 2015, Fall 2017)

- Books:

  • Bayesian Reasoning and Machine Learning: Online book
  • Bayesian Data Analysis, Third Edition
  • Applied Predictive Modeling

Time series:

- Courses:

- Books:

  • Time Series Analysis and Its Applications: Springer

- With LSTM:


Natural Language Processing:

- Videos:

- Books:

  • Natural Language Processing with Python: Book
  • Speech and Language Processing (3rd ed. draft): Book
  • An Introduction to Information Retrieval: pdf
  • Deep Learning (Some chapters or sections): Book
  • A Primer on Neural Network Models for Natural Language Processing: Paper. Goldberg also published a new book this year

- Packages:

- Comments:

The basic NLP course by Stanford is the fundamental one, and SLP 3rd ed. follows it. After that, feel free to take any one of the three NLP+DL courses; they cover basically the same topics. The Stanford one has homework available online, the CMU one follows Goldberg's book, and the DeepMind one is much shorter.

- More:

Some other people's collections: NLP DL-NLP Speech and NLP Speech RNN


Distributed Systems:

- Courses:

  • Stanford CS246: Mining Massive Datasets: Course
  • Stanford CS345A: Data Mining: Course
  • CU W4121 Computer Systems for Data Science: Course
  • NYU Big Data: Course, Wiki
  • The Ultimate Hands-On Hadoop: Udemy
  • Spark and Python for Big Data with PySpark: Udemy

- Books:

  • Mining of Massive Datasets: book
  • Data-Intensive Text Processing with MapReduce: book

Cryptography

- Courses:

  • Stanford CS255: Introduction to Cryptography: Course
  • Stanford CS 355: Topics in Cryptography: Course

- Books:

  • A Graduate Course in Applied Cryptography: book


Analytics:


Communications:

- Lists with Solutions:

  • 111 Data Science Interview Questions & Detailed Answers: Link
  • 40 Interview Questions asked at Startups in Machine Learning / Data Science: Link
  • 100 Data Science Interview Questions and Answers (General) for 2017: Link
  • 21 Must-Know Data Science Interview Questions and Answers: Link
  • 45 Questions to test a data scientist on basics of Deep Learning (along with solution): Link
  • 30 Questions to test a data scientist on Natural Language Processing: Link
  • Questions on Stack Overflow: Link
  • Compare two models: My collection

- Without Solutions:

  • Over 100 Data Science Interview Questions: Link
  • 20 questions to detect fake data scientists: Link
  • Questions on Glassdoor: Link

Quant:

- Courses:

- Other:

  • A Collection of Dice Problems: pdf

More:

Github search CS video lectures Explore Course