HarvardX Data Science Professional Certificate in R
Early assessments (courses 1-4) were mostly completed using DataCamp. Once productivity tools such as RStudio and GitHub were introduced in course 5, the work was completed in .R scripts.
1. R basics
- Basic R syntax, data types, vector arithmetic, indexing, sorting (including with dplyr), and plotting using basic packages.
2. Visualization
- Data visualization principles, creating custom plots with ggplot2, and studying the advantages and pitfalls of widely used plots.
3. Probability
- Probability theory concepts including the central limit theorem, random variables and independence, performing Monte Carlo simulations, and computing expected values and standard errors.
4. Inference & Modeling
- Defining population parameters, estimates, standard errors, and margins of error in order to make predictions about data. Modeling aggregate data from different sources, Bayesian statistics, and predictive modeling.
5. Productivity Tools
- Introduction to navigating the file system from the command line, version control with git, and the tools built into RStudio.
6. Wrangling
- Importing data from different file formats, web scraping, tidying data with tidyverse, processing strings with regex, wrangling data with dplyr, handling date and time formats in data, and text mining.
7. Linear Regression
- Developing linear regression mathematically, explaining and detecting confounding, and implementing linear regression to understand the relationship between variables.
8. Machine Learning
- Machine learning basics, cross-validation to avoid overtraining, using popular machine learning algorithms from the caret package, and employing regularization when appropriate.
9. Capstone
- Applying the skills learned throughout the series to a real-world problem through an independent data analysis project. See the README file in the 9-Capstone folder for project descriptions.
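
As a small illustration of the kind of exercise covered in course 3 (Probability), the following is a minimal sketch of a Monte Carlo simulation in base R. It is not taken from the course scripts: the die-rolling setup and the names n, B, sample_means, mc_mean, and mc_se are assumptions chosen for illustration. It estimates the expected value and standard error of the average of n fair die rolls and compares them with the theoretical values.

```r
# Minimal sketch (illustrative, not from the course materials):
# Monte Carlo estimate of the expected value and standard error
# of the average of n fair six-sided die rolls.
set.seed(1)   # make the simulation reproducible
n <- 50       # rolls per experiment (assumed value)
B <- 10000    # number of Monte Carlo replications (assumed value)

# Repeat the experiment B times and record each sample average
sample_means <- replicate(B, mean(sample(1:6, n, replace = TRUE)))

# Monte Carlo estimates
mc_mean <- mean(sample_means)  # estimated expected value of the average
mc_se   <- sd(sample_means)    # estimated standard error of the average

# Theoretical values for a fair die: mean 3.5, variance 35/12
theory_mean <- 3.5
theory_se   <- sqrt(35 / 12) / sqrt(n)

cat("Monte Carlo mean:", mc_mean, " theory:", theory_mean, "\n")
cat("Monte Carlo SE:  ", mc_se,   " theory:", theory_se, "\n")
```

With a large number of replications the simulated values should land close to the theoretical ones, which is the behavior the central limit theorem material in that course describes.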