Phase 1
Phase 1 Topic 01 - Getting Started with Data Science
Phase 1 Topic 02 - Bash and Git
Phase 1 Topic 03 - Control Flow, Functions, and Statistics
- Python
- Coding Conventions
Phase 1 Topic 04 - Python Libraries: NumPy and Pandas
Phase 1 Topic 05 - Data Cleaning in Pandas
- Data Cleaning
- Aggregation
Phase 1 Topic 06 - Data Visualization
- Warmup
- Data Visualization
Phase 1 Topic 07 - SQL and Relational Databases
Phase 1 Topic 08 - Other Database Structures
Module 1
Module 1 Section 01 - Getting Started with Data Science
- Python
- Coding Conventions
Module 1 Section 02 - Bash and Git
- Bash Shell (Command Line Interface)
- Git & GitHub
- Activities
- Extras for Using Git
Module 1 Section 03 - Control Flow, Functions, and Statistics
- Control Flow
- Functions
- Statistics
Module 1 Section 04 - Python Libraries: NumPy and Pandas
Module 1 Section 05 - Data Cleaning in Pandas
- Pandas & Data
- Data Exploration & Cleaning
Module 1 Section 06 - Data Visualization
- Data Visualization Intro
- Good & Bad Visualizations
Module 1 Section 07 - SQL and Relational Databases
- Introduction to SQL
- More SQL
Module 1 Section 08 - Other Database Structures
Module 1 Section 09 - JSON and APIs
Module 1 Section 10 - HTML, CSS, and Web Scraping
Module 1 Project: Movie Analysis
Module 2
Module 2 Section 11 - Combinatorics and Probability
- Conditional Probability
- Combinatorics
Module 2 Section 12 - Statistical Distributions
- Statistical Distributions
Module 2 Section 13 - Central Limit Theorem and Confidence Intervals
- Central Limit Theorem
- Confidence Intervals
Module 2 Section 14 - Hypothesis Testing
- Experiment Design
- Considerations
- Statistical Tests
- t-Tests
Module 2 Section 15 - Statistical Power & ANOVA
- Parts of Hypothesis Tests
- Welch's t-test & ANOVA
Module 2 Section 16 - A/B Testing
Module 2 Section 17 - Bayesian Statistics
Module 2 Section 18 - Introduction to Linear Regression
Module 2 Section 19 - Multiple Linear Regression
- Multiple Linear Regression
Module 2 Section 20 - Extensions to Linear Regression
- Polynomial & Interacting Terms
Module 3
Module 3 Section 17 - Combinatorics
Module 3 Section 18 - Statistical Distributions
Module 3 Section 19 - Central Limit Theorem
- Central Limit Theorem
- Sampling Statistics
- Confidence Intervals
Module 3 Section 20 - Hypothesis Testing
- Intro to Experimental Design
- P-Values & Null Hypothesis
- Effect Sizes
- t-Tests
- Type 1 & Type 2 Errors
Module 3 Section 21 - Statistical Power & ANOVA
- Statistical Power
- Welch's t-Test
- Multiple Comparisons & Goodhart's Law
- ANOVA
Module 3 Section 22 - A/B Testing
Module 3 Section 23 - Bayesian Statistics
- Bayes Theorem
- Naive Bayes
Module 3 Section 24 - Resampling and Monte Carlo Simulation
- Data Generation
- Resampling
- Monte Carlo
Module 4
Module 4 Section 25 - A Complete Data Science Project Using Multiple Regression
Module 4 Section 26 - Linear Algebra
- Linear Algebra Intro
- Math with Tensors
- Solving With Linear Algebra
Module 4 Section 27 - Calculus, Cost Function, and Gradient Descent
- Derivatives
- derivatives.ipynb
- Gradient Descent
- Gradient Descent Walkthrough
Module 4 Section 28 - Extensions to Linear Models
- Improving Linear Regression (Interactions & Polynomial)
- Regularization
- Bias & Variance
Module 4 Section 29 - Introduction to Logistic Regression
- Logistic Regression Intro
- Logistic Regression
- Evaluation Metrics (Confusion Matrices)
- Evaluation Curves (ROC & AUC)
Module 4 Section 30 - In-depth Logistic Regression
Module 4 Section 31 - Working with Time Series Data
- Time Series Intro
- Time Series Visualization
- Time Series Trends
Module 4 Section 32 - Time Series Modeling
- Time Series Models Intro
- ARMA Model
Module 5
Module 5 Section 33 - K-Nearest Neighbors
- Distance Metrics
- K-Nearest Neighbors
Module 5 Section 34 - Decision Trees
Module 5 Section 35 - Ensemble Methods
- Ensemble Methods (Bagging, Random Forest, AdaBoost, Gradient Boosting)
Module 5 Section 36 - Support Vector Machines
- Support Vector Machine Intro
- Kernel Trick
Module 5 Section 37 - Principal Component Analysis
- Dimensionality
- Principal Component Analysis
Module 5 Section 38 - Clustering
- K-means
- Hierarchical Clustering
- DBSCAN
Module 5 Section 39 - Building a Machine Learning Pipeline
Module 5 Section 40 - Big Data in PySpark
- Big Data Introduction
- Distributed Computing
- MapReduce
Module 5 Section 41 - Recommendation Systems
- Recommendation Systems
- Memory-Based (Neighborhood) Collaborative Filtering
- Matrix Factorization
Module 6
Module 6 Section 42 - Graph Theory
Module 6 Section 43 - Foundations of Natural Language Processing
Module 6 Section 44 - Introduction to Deep Learning
Module 6 Section 45 - Multi-Layer Perceptrons
Module 6 Section 46 - Tuning Neural Networks
Module Section 49 - Deep NLP - Word Embeddings