In this course, students learn the foundational skills of data science, including data collection, scrubbing, analysis, and visualization with modern tools and libraries. Students gain a strong grounding in statistical concepts, apply statistical techniques, and practice the science and art of data exploration and visualization to tell stories and persuade decision makers with data-driven insights.
By the end of this course, students will be able to...
- Use Pandas to perform data-frame processing
- Report findings in a dataset through data visualization
- Understand when to reject or fail to reject a null hypothesis
- Process and analyze time series data
- Describe and implement a plan for finding and dealing with null values, outliers, and other problems in a dataset
- Explain the central limit theorem and its importance in statistical analysis
- Use statistical methods to calculate a z-score and explain what the z-score means
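
As an illustration of the z-score outcome above, here is a minimal sketch, assuming NumPy is available. The function name `z_scores` and the sample data are made up for this example and are not part of the course materials. A z-score expresses how many standard deviations a value lies from the mean.

```python
import numpy as np


def z_scores(values):
    """Return the z-score of each value: (x - mean) / standard deviation."""
    arr = np.asarray(values, dtype=float)
    # arr.std() defaults to the population standard deviation (ddof=0).
    # Pass ddof=1 if a sample standard deviation is wanted instead.
    return (arr - arr.mean()) / arr.std()


# Hypothetical sample: mean is 5.0 and population standard deviation is 2.0.
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = z_scores(scores)
print(z)
```

In this sample the value 9.0 gets a z-score of 2.0, meaning it sits two standard deviations above the mean.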
NOTE: Due to the shorter summer sessions, for some class sessions you will see multiple topics covered. This is to ensure that we cover the same material that we normally would in non-summer terms.
Course Dates: Wednesday, May 29 – Wednesday, July 3, 2019 (6 weeks)
Class Times: Monday and Wednesday at 1:30–3:20pm (11 class sessions)
Class | Date | Topics |
---|---|---|
- | Mon, May 27 | Memorial Day |
1 | Wed, May 29 | Introduction to Data Science |
2 | Mon, June 3 | Simple Data Manipulation |
3 | Wed, June 5 | Data Manipulation & Visualization / How to Combine DataFrames |
4 | Mon, June 10 | Applied Descriptive Statistics |
5 | Wed, June 12 | Applied Probability with DataFrames |
6 | Mon, June 17 | NPS Project Data Wrangling Check-in |
7 | Wed, June 19 | Hypothesis Testing & Acceptable Error |
8 | Mon, June 24 | Confidence Intervals & Outliers / Statistical Analysis |
9 | Wed, June 26 | Time Series Data & Applications |
10 | Mon, July 1 | Final Exam |
11 | Wed, July 3 | Final Presentations |
- Implement a dataset-processing task with NumPy only, and then with Pandas
- Write a function that calculates the conditional probability for two arbitrary attributes and an arbitrary condition
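
One possible shape for the conditional probability function is sketched below, assuming Pandas. The function name, its parameters, and the small DataFrame are hypothetical and only illustrate the idea; the actual assignment may specify a different interface. The estimate is the number of rows where both the event and the condition hold, divided by the number of rows where the condition holds.

```python
import pandas as pd


def conditional_probability(df, target_col, target_value, given_col, condition):
    """Estimate P(target_col == target_value | condition(given_col)) from row counts.

    `condition` is any function that maps a single value of `given_col`
    to True or False.
    """
    given_mask = df[given_col].map(condition).astype(bool)
    n_given = given_mask.sum()
    if n_given == 0:
        # The condition never holds, so the probability is undefined.
        return float("nan")
    both_mask = (df[target_col] == target_value) & given_mask
    return float(both_mask.sum() / n_given)


# Made-up data, used only to exercise the function.
df = pd.DataFrame({
    "age": [22, 35, 58, 41, 19, 30],
    "survived": [1, 1, 0, 1, 0, 0],
})

# P(survived == 1 | age > 30): three rows have age > 30, two of them survived.
p = conditional_probability(df, "survived", 1, "age", lambda age: age > 30)
print(p)
```

Passing the condition as a function keeps it arbitrary: the same call works for equality checks, ranges, or membership tests.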
Students will complete the following guided tutorials in this course:
- Exploratory Data Analysis & Visualization with App Store Dataset
- Advanced Data Analysis & Visualization with Pokémon Dataset
Students will complete the following self-guided projects in this course:
- Make School Summer Academy NPS Data Wrangling & Analysis
- Custom Project: Students will select a problem, identify data sources, analyze and present findings
To pass this course you must meet the following requirements:
- Do all in-class activities
- Finish all required tutorials and projects
- Pass the final exam (summative assessment). The final exam will cover the following topics:
  - The null hypothesis and the steps to reject or fail to reject it
  - Statistical terms and their meanings, such as Z-distribution, CDF, SF, ...
  - Histograms and density estimation
  - Outlier detection
  - Correlation
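
For review, the Z-distribution, CDF, and SF terms in the exam topics can be related with a short sketch. It uses only Python's standard library (`statistics.NormalDist`, Python 3.8+) as an assumption for portability; the course may use a different library for these functions. The helper names `cdf` and `sf` are chosen for this example.

```python
from statistics import NormalDist

# The Z-distribution is the standard normal: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def cdf(z):
    """Cumulative distribution function: P(Z <= z)."""
    return std_normal.cdf(z)


def sf(z):
    """Survival function: P(Z > z), which equals 1 - CDF(z)."""
    return 1.0 - std_normal.cdf(z)


# Half of the distribution lies below the mean.
print(cdf(0.0))
# Lower and upper tail probabilities for a z-score of 1.96.
print(cdf(1.96), sf(1.96))
```

For any z-score, the CDF and SF values sum to 1, so either one can be recovered from the other.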