DS 1.1: Data Analysis & Visualization
Course Description
In this course, students learn the foundational skills of data science, including data collection, scrubbing, analysis, and visualization with modern tools and libraries. Students gain a strong grounding in statistical concepts, apply statistical techniques, and practice the science and art of data exploration and visualization to tell stories and persuade decision makers with data-driven insights.
Prerequisites:
Learning Outcomes
By the end of this course, students will be able to...
- Conduct data manipulation and visualization
- Understand when to reject or fail to reject a null hypothesis
- Apply descriptive statistics, probability, and other data analysis techniques
- Describe and implement a plan for finding and dealing with problems in a dataset such as null values and outliers
- Perform statistical analysis on data collections using a variety of methods
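As a rough illustration of the null-value and outlier outcome above, here is a minimal sketch using Pandas. The data is made up, and median imputation plus the 1.5 × IQR rule are only one common choice, not necessarily the approach taught in class.

```python
import pandas as pd

# Hypothetical measurements with one missing value and one suspicious value
values = pd.Series([10, 12, None, 11, 13, 12, 95, 11], dtype="float64")

# Step 1: find the problems. Count the missing entries.
n_missing = int(values.isna().sum())

# Step 2: deal with nulls. Here we fill them with the median (an assumption).
filled = values.fillna(values.median())

# Step 3: flag outliers with the 1.5 * IQR rule (also an assumption).
q1 = filled.quantile(0.25)
q3 = filled.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = filled[(filled < lower) | (filled > upper)]

print("missing values:", n_missing)
print("outliers:", outliers.tolist())
```

Whether to drop, cap, or keep a flagged value depends on the dataset; the sketch only identifies candidates.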
Schedule
Course Dates: Tuesday, January 21 – Thursday, March 5, 2020 (7 weeks)
Class Times: Tuesday and Thursday at 3:30–5:20pm (14 class sessions)
Class | Date | Topics |
---|---|---|
1 | Tue, January 21 | Introduction to Data Science |
2 | Thu, January 23 | Simple Data Manipulation |
3 | Tue, January 28 | Data Manipulation & Visualization |
4 | Thu, January 30 | How to Combine DataFrames |
5 | Tue, February 4 | Applied Descriptive Statistics |
6 | Thu, February 6 | Applying Probability to DataFrames |
7 | Tue, February 11 | [NPS Project Data Wrangling Check-in] |
8 | Thu, February 13 | PDFs, CDFs, and Normal Distributions |
9 | Tue, February 18 | Hypothesis Testing & Acceptable Error |
10 | Thu, February 20 | Confidence Intervals, Outliers, and Statistical Analysis |
11 | Tue, February 25 | Time Series Data & Applications |
12 | Thu, February 27 | [Lesson 12] |
13 | Tue, March 3 | Final Exam |
14 | Thu, March 5 | Presentations |
Assignment Schedule
Assignment | Date Assigned | Due Date | Submission Form |
---|---|---|---|
Midterm Project - NPS Data Analysis | Thu, January 30 | Tue, February 11 | Submit Assignment |
Homework 1 - Histogram | Thu, February 13 | Thu, February 20 | Submit Assignment |
Class Assignments
- Implement a dataset-processing task using NumPy only, then again using Pandas
- Write a function that calculates the conditional probability for two arbitrary attributes and an arbitrary condition
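One possible shape for the conditional probability assignment is sketched below. The function name `conditional_probability`, its parameters, and the toy data are all hypothetical; the assignment itself may expect a different interface.

```python
import pandas as pd

def conditional_probability(df, target, target_value, given, condition):
    """Estimate P(target == target_value | condition(given)) from row counts.

    `condition` is any function that takes the `given` column and returns
    a boolean Series (hypothetical interface, chosen for illustration).
    """
    mask = condition(df[given])
    subset = df[mask]
    if len(subset) == 0:
        raise ValueError("no rows satisfy the condition")
    # The mean of a boolean column is the fraction of True values
    return float((subset[target] == target_value).mean())

# Made-up example data
df = pd.DataFrame({
    "age": [22, 35, 41, 19, 52, 30],
    "purchased": ["yes", "no", "yes", "no", "yes", "yes"],
})

# P(purchased == "yes" | age >= 30)
p = conditional_probability(df, "purchased", "yes", "age", lambda s: s >= 30)
print(p)
```

Passing the condition as a function keeps the helper usable for any pair of columns and any filter that a boolean mask can express.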
Tutorials
Students will complete the following guided tutorials in this course:
- Exploratory Data Analysis & Visualization with App Store Dataset
- Advanced Data Analysis & Visualization with Pokémon Dataset
Projects
Students will complete the following self-guided projects in this course:
- Make School Summer Academy NPS Data Wrangling & Analysis
- Custom Project: Students will select a problem, identify data sources, analyze the data, and present their findings
Evaluation
To pass this course you must meet the following requirements:
- Complete all in-class activities and one homework assignment
- Finish all required tutorials and two projects
- Pass the final exam (summative assessment). The final exam covers:
  - The null hypothesis and the steps to reject or fail to reject it
  - Statistical terms and their meanings, such as Z-distribution, CDF, SF, ...
  - Histograms and density estimation
  - Outlier detection
  - Correlation
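
To show how several of these exam topics fit together (null hypothesis, Z-distribution, CDF, SF), here is a minimal sketch of a two-sided one-sample z-test using only the Python standard library. All numbers are hypothetical, and the population standard deviation is assumed to be known.

```python
import math
from statistics import NormalDist

# Hypothetical numbers, chosen only for illustration
mu0 = 100.0          # population mean under the null hypothesis
sigma = 15.0         # population standard deviation (assumed known)
n = 36               # sample size
sample_mean = 106.0  # observed sample mean
alpha = 0.05         # acceptable Type I error rate

# Standardize the sample mean into a z-score
z = (sample_mean - mu0) / (sigma / math.sqrt(n))

std_normal = NormalDist()      # Z-distribution: mean 0, standard deviation 1
cdf = std_normal.cdf(abs(z))   # CDF: P(Z <= |z|)
sf = 1.0 - cdf                 # SF (survival function): P(Z > |z|)
p_value = 2.0 * sf             # two-sided p-value

# Reject the null hypothesis only if the p-value falls below alpha
reject_null = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject null: {reject_null}")
```

If the p-value were at or above `alpha`, the conclusion would be "fail to reject" the null hypothesis rather than "accept" it.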