STAT5206 - Spring 2023
This course is designed for beginners with no prior experience in computer programming.
- Understand basic Python programming concepts
  - Data types, functions, control flow, for loops
- Data acquisition
  - Download data using APIs and SQL queries
- Data wrangling for statistical analysis
  - From hierarchical datasets to tabular datasets
- Data visualization
  - Scatterplot, histogram, boxplot
- Basic statistical analysis
  - Linear model, simple optimization
- An introductory statistics class
- Basic probability distributions (e.g. Gaussian, binomial distributions and their likelihoods)
- Basic hypothesis testing (e.g. t-test)
- Summary statistics
- Histograms, boxplots, etc.
- Multivariate calculus
- Derivatives and functions
- Matrix operations and inverses of matrices
- You should be at least co-enrolled in a modeling class such as regression
- Google!
- Basics only - Programming with Python by Software Carpentry (PPSC)
- Python concept notes (PCN)
- Python Data Science Handbook (PDSH)
- LearningPython.org (LP)
| Date | Topic | Reference | Due |
|---|---|---|---|
| Week 1 | Introduction; Python 101-1 (Variable) | PCN Chapter 1-10; PPSC Chapter 1, 2; PDSH Chapter 2 | |
| Week 2 | Python 101-2 (Function, Package, Loop, if/else, File I/O); N-gram | PCN Chapter 1-10 | |
| Week 3 | Numpy; OLS | PCN Chapter 11 | HW1 (February 5, 11:59 PM) |
| Week 4 | AB testing; Pandas 1 (DataFrame) | PCN Chapter 12, 14 | HW2 (February 15, 11:59 PM) |
| Week 5 | Pandas 2 (Grouping, Merge, Timestamp); COVID-19 | PCN Chapter 12, 15 | |
| Week 6 | Visualization (matplotlib, seaborn) | PCN Chapter 16 | HW3 (February 26, 11:59 PM) |
| Week 7 | NYTimes; Review | | |
| Week 8 | Midterm (in class, March 6) | | |
| Week 9 | Spring Recess, NO CLASS | | |
| Week 10 | Midterm review; Data formats; Regular expression 1 | | |
| Week 11 | Regular expression 2 | PCN Chapter 13 | |
| Week 12 | Interacting with APIs; NYC311 | PCN Chapter 17 | HW4 (April 7, 11:59 PM) |
| Week 13 | SQL; Internet speed | PCN Chapter 20 | |
| Week 14 | Linear Model (Feature Engineering, Data Splitting, Cross-validation); Medical Insurance | PCN Chapter 18; PDSH Chapter 5 | HW5 (April 23, 11:59 PM) |
| Week 15 | Regularization | PCN Chapter 19 | |
| Week 16 | No class | | |
| Week 17 | Final (in class, May 8) | | |
Class time: MW 6:10PM - 7:25PM, Location: 301 Pupin Laboratories
Yongchan Kwon (yk3012 (at) columbia (dot) edu)
- Office hours: Every Friday, 10:00 AM - 12:00 PM, at 901-C, School of Social Work, or by appointment.
Maria-Cristiana Girjau (mg4345 (at) columbia (dot) edu)
- Office hours: Every Thursday, 9:00 AM - 11:00 AM, in the 10th-floor lounge, School of Social Work.
- Late homework will receive no credit.
- No make-up homework will be granted, even if you registered late for the class.
- Please read the important notes on submitting homework on Ed.
- Midterms (30%)
- Final (45%)
To receive disability-related academic accommodations for this course, students must first be registered with their school's Disability Services (DS) office. Detailed information is available online for both the Columbia and Barnard registration processes.
Refer to the appropriate website for information regarding deadlines, disability documentation requirements, and drop-in hours (Columbia)/intake session (Barnard).
For this course, students are not required to have testing forms or accommodation letters signed by faculty. However, students must do the following:
- The Instructor section of the form has already been completed and does not need to be signed by the professor.
- The student must complete the Student section of the form and submit the form to Disability Services.
- Master forms are available in the Disability Services office or online: https://health.columbia.edu/services/testing-accommodations
- Take chances!
- Break the code in lecture
- Give feedback in office hours or by e-mail; don't waste your time if you think a topic is not helpful.
- Participate and ask questions; this is not easy!
- In class: forecast what should happen, compare it with what actually happens, then summarize the difference.
- Online: describe what you observe, describe what you expect, communicate clearly.
- To each other: summarize the conversation to ensure you're listening and think constructively before criticizing.
- THE MOST IMPORTANT: academic honesty. See https://www.cs.columbia.edu/education/honesty/
Many of these materials are based on those from Prof. Wayne Tai Lee's STAT5206 homepage.