DS 1.1: Data Analysis & Visualization

Course Description

In this course, students learn the foundational skills of data science, including data collection, scrubbing, analysis, and visualization with modern tools and libraries. Students gain a strong grounding in statistical concepts, apply statistical techniques, and master the science and art of data exploration and visualization to tell stories and persuade decision makers with data-driven insights.

Prerequisites:

CS 1.2

Learning Outcomes

By the end of this course, students will be able to...

  1. Use Pandas to perform data-frame processing
  2. Report findings in a dataset through data visualization
  3. Understand when to reject or fail to reject a null hypothesis
  4. Process and analyze time series data
  5. Describe and implement a plan for finding and dealing with null values, outliers, and other problems in a dataset
  6. Explain the central limit theorem and its importance in statistical analysis
  7. Use statistical methods to calculate a z-score and explain what the z-score means
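
As a small illustration of outcome 7, the sketch below computes z-scores with Python's standard library only. The helper name `z_scores` and the sample values are hypothetical, chosen for illustration; they are not taken from the course materials.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the z-score of each value: (x - mean) / population std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - mu) / sigma for x in values]


# Hypothetical sample, for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
result = z_scores(scores)
print(result)
```

A z-score says how many standard deviations a value lies from the mean: here the value 9 has a z-score of 2, so it sits two standard deviations above the sample mean of 5.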

Schedule

NOTE: Due to the shorter summer sessions, for some class sessions you will see multiple topics covered. This is to ensure that we cover the same material that we normally would in non-summer terms.

Course Dates: Wednesday, May 29 – Wednesday, July 3, 2019 (6 weeks)

Class Times: Monday and Wednesday at 1:30–3:20pm (11 class sessions)

| Class | Date         | Topics                                                       |
|-------|--------------|--------------------------------------------------------------|
| -     | Mon, May 27  | Memorial Day                                                 |
| 1     | Wed, May 29  | Introduction to Data Science                                 |
| 2     | Mon, June 3  | Simple Data Manipulation                                     |
| 3     | Wed, June 5  | Data Manipulation & Visualization / How to Combine DataFrames |
| 4     | Mon, June 10 | Applied Descriptive Statistics                               |
| 5     | Wed, June 12 | Applied Probability to data frame                            |
| 6     | Mon, June 17 | NPS Project Data Wrangling Check-in                          |
| 7     | Wed, June 19 | Hypothesis Testing & Acceptable Error                        |
| 8     | Mon, June 24 | Confidence Intervals & Outliers / Statistical Analysis       |
| 9     | Wed, June 26 | Time Series Data & Applications                              |
| 10    | Mon, July 1  | Final Exam                                                   |
| 11    | Wed, July 3  | Final Presentations                                          |

Class Assignments

  • Implement a dataset-processing task using NumPy only, then again using Pandas
  • Write a function that calculates the conditional probability for two arbitrary attributes and an arbitrary condition
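
One possible shape for the conditional-probability assignment is sketched below. It is not the course's reference solution: the function name `conditional_probability`, its callable-based interface, and the sample DataFrame are all assumptions made for illustration. The event and the condition are passed as functions that return a boolean Series, so each can involve any attribute of the DataFrame.

```python
import pandas as pd


def conditional_probability(df, event, condition):
    """Estimate P(event | condition) from the rows of a DataFrame.

    `event` and `condition` are functions that take the DataFrame and
    return a boolean Series (one True/False per row).
    """
    given = condition(df)
    if not given.any():
        raise ValueError("condition matches no rows; probability is undefined")
    # Among rows where the condition holds, the fraction where the event holds.
    return (event(df) & given).sum() / given.sum()


# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "age": [22, 35, 47, 51, 29, 63],
    "purchased": [False, True, True, False, False, True],
})

p = conditional_probability(
    df,
    event=lambda d: d["purchased"],
    condition=lambda d: d["age"] > 30,
)
print(p)
```

In this made-up sample, four rows satisfy `age > 30` and three of those have `purchased` set to True, so the estimate is 3/4.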

Tutorials

Students will complete the following guided tutorials in this course:

Projects

Students will complete the following self-guided projects in this course:

Evaluation

To pass this course you must meet the following requirements:

  • Do all in-class activities
  • Finish all required tutorials and projects
  • Pass the final exam (summative assessment). The final exam covers the following topics:
    • The null hypothesis and the steps to reject or fail to reject it
    • Statistical terms and their meanings, such as Z-distribution, CDF, SF, ...
    • Histogram, density estimations
    • Outlier detection
    • Correlation
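
For review, the sketch below shows how two of the terms above relate for the standard normal (Z) distribution, using only Python's standard library. The helper names `cdf` and `sf` are illustrative choices, not names from the course materials.

```python
from statistics import NormalDist

# Standard normal (Z) distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def cdf(z):
    """Cumulative distribution function: P(Z <= z)."""
    return standard_normal.cdf(z)


def sf(z):
    """Survival function: P(Z > z) = 1 - CDF(z)."""
    return 1.0 - standard_normal.cdf(z)


z = 1.96
print(round(cdf(z), 4))     # about 0.975
print(round(sf(z), 4))      # about 0.025
print(round(2 * sf(z), 4))  # two-sided tail probability, about 0.05
```

The last line is why 1.96 appears so often in hypothesis testing: for a two-sided test, a z-score of about 1.96 corresponds to a p-value of about 0.05.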

Make School Course Policies