An introduction to the data science pipeline, i.e., the end-to-end process of going from unstructured, messy data to knowledge and actionable insights. Provides a broad overview of several topics, including statistical data analysis, basic data mining and machine learning algorithms, large-scale data management, cloud computing, and information visualization.
All projects in this repo are my own completed submissions.
Each project has corresponding html, ipynb, and pdf files. For the best formatting, view the html file or open and run the ipynb notebook; the pdf is supplied for your convenience.
Project 1 - NASA Flares: 82.5/85 (partial credit on part 1 for specifying the N-th index in the web request)
Project 2 - Moneyball: 100/100
Project 3 - Gapminder: 96/125 (questions 2 and 11 incorrect, partial credit on questions 6 and 12, and the part 2 cross-validation and write-up not completed)
Project 4 - Maps: 75/100 (100% on completed sections)
Final Project - Electricity Cost: 75/100 (100% on completed sections)