This repository stores code and output for the assignments from the Johns Hopkins University
Data Science Specialization, taught via Coursera, reimplemented in Python rather than R. It's been over 10 years since the
Specialization was originally released. The programming language used in the Specialization is R,
but as of January 2025, Python is overtaking R as the data science language of choice for
organizations that are not bound to a legacy of commercial statistics packages like SAS or SPSS.
When I participated in the JHU Specialization back in 2015, I liked the format of the classes because of their practicality. That is, the structure of the courses, starting with The Data Scientist's Toolbox, enables students to build an increasingly diverse set of core data science skills. Therefore, I thought it would be useful as a framework for organizing my learning of Python.
In contrast, I have found training and documentation on Python to be more focused on the underlying theory of the language, which has its advantages but also creates frustration if you're trying to learn "just enough" Python to get a job done.
The Hopkins Data Science Specialization includes a total of ten courses, ranging from the introductory The Data Scientist's Toolbox to the Data Science Specialization Capstone, which includes a challenging natural language processing project. Using the curriculum to build a library of Python code that solves various data science problems will be useful as a reference for data scientists who use both R and Python.
To be continued...