Materials for CRP/DESIGN 4680/5680 Introduction to Urban Data Science at Cornell AAP

CRP and DESIGN 4680/5680: Introduction to Urban Data Science

Spring 2023
Professor Wenfei Xu (wenfeixu@cornell.edu)
Class details in Cornell Classes page
Office Hours: Mondays 1 – 2:30 pm, Wednesdays 11 am – 12:30 pm in Sibley Hall 221

TA and GTRS

Zoe Wang (TA) zw553@cornell.edu
Office hours: Monday 5:30-6:30 pm in Sibley Hall 305
Yuetong Wang (GTRS) yw795@cornell.edu
Office hours: Wednesday 5-6 pm in Sibley Hall 305

Class Slack

Please sign up for our class Slack Group. This is where you can troubleshoot with us and each other. Note that you can add code blocks in your messages.

Companion courses and events:

  1. Required: Practitioners Talk Series (CRP 5000)
    Details in Cornell Classes page
    Pizza will be served
    Please contact Rachel Elmkies re238@cornell.edu for any logistical concerns.

  2. Highly recommended: Introduction to Python Workshop (Jan 24)
    Details in Cornell Classes page
    Taught by Jacob Grippin (jrg363@cornell.edu) at the Cornell Center for Social Sciences
    This workshop will help students get set up with the Jupyter notebook computing environment we will be using throughout the course as well as offer an introduction to Python.

Course Description

Urban data science is an emergent practice in geography and urban planning that combines 1) the data analysis tools and methods used to understand a wide array of big data and big spatial data sources and 2) questions of urban development, structure, complexity, theory, policy, dynamics, and outcomes. These approaches enable more spatiotemporally dynamic and granular analyses of cities and give researchers new insight into urban dynamics.

This course will provide a toolkit to speak through data, code, statistics, and visualization. Using open-source data and computational tools in Python and the Jupyter Notebook environment, we will learn how to design testable research questions, collect and prepare data, apply relevant analytical techniques, present our process and results in an engaging and informative way, and identify the limitations of quantitative analysis. A personal laptop will be required.

Learning Objectives and Outcomes

The goal of this course is to provide an introduction to a wide range of tools and concepts that will enable future, deeper exploration of urban dynamics. In other words, this course provides a “sampler” of the foundations of urban data science through coding, statistical analysis, visualization/narrative-building, and critique.

The core learning objectives are:

  1. Use code to clean, analyze, and visualize spatial data
  2. Implement a descriptive or predictive analysis using appropriate data and statistical and/or computational methods
  3. Clearly communicate your process and results as a data narrative through visualizations, context, textual description, and presentation
  4. Identify the limitations and potential biases in the data, data-generating processes, and tools and methods in addressing your research topic

Class Structure

Weeks 1-13: Every class will include a lecture with embedded code snippets to follow along with and a brief exercise at the end of class to practice the concept you just learned. We will also have six Practitioners Talk Series talks from 4:30 – 5:30 pm on some Mondays. There are four coding homework assignments during this period.

Weeks 14-16: The last three weeks of the class will be devoted to a final project of your choosing that addresses an urban development question. The aim is to synthesize and further develop the skills you have learned over the course of the semester. The proposals for the project will be due Week 10. In class for Weeks 14 and 15, each project group or individual will meet with me to discuss your progress, technical or conceptual roadblocks, and next steps. Our last two classes of the semester will be devoted to final presentations. Your final projects will be due on May 17 at 11:59 pm.

Attendance

If you need to miss class, please let me know beforehand. Arriving 15 or more minutes late counts as an absence. Three unexcused absences from either day of class will lower your grade by one letter (for example, A to B, or B- to C-). Five or more unexcused absences will result in failure in the course. To avoid failing the course, I suggest you withdraw rather than receive an F. If you have to miss class for any reason, please get the class notes from a classmate. If you miss class, I still expect you to complete the in-class exercise.

Course Prerequisites

This course is designed for master's students and upper-level undergraduate students. Course 5080 (Introduction to GIS) is highly recommended for the course. We will not be covering any of the basics of GIS in this course. Additionally, I assume some basic statistics knowledge (mean, median, mode) and some familiarity with spreadsheet software (Excel, Google Sheets). Prior or concurrent coursework in quantitative methods, statistics, visualization, and object-oriented programming is recommended.

A note about learning to code

Coding is an iterative process involving (a lot of) trial and error, patience, self-direction, and clever Googling. It can seem daunting and intractable at first. Here are some guidelines and resources to help you through this process:

  1. Look for typos in the code.

  2. Search for the issue on Google. This will often lead you to sites such as Stack Overflow or Medium, which provide code snippets and sometimes step-by-step instructions on how to resolve your question. Try to be specific in your search. Do not be afraid to sound silly. My search generally involves the following keywords:

    • [language or tool] ex: “Python”, “Pandas”, “Matplotlib”
    • [function or action] ex: “plt.subplots”, “plotting multiple plots in one figure”
    • [error or issue] ex: “plots are tiny”, “not showing all plots”, etc.

  3. If trying to implement a fairly standard process, look through our class notebooks or the readings. There are often code snippets for reference there.

  4. Ask classmates in our Slack group.

  5. If none of the above is fruitful, message me or the TA with the specific task you are trying to implement and the relevant code snippet, either as a screenshot or a GitHub Gist. Do not send code in the body of an email, as rich text editors often add hidden formatting that can introduce errors into the code.

Technology

A personal laptop with permissions to install software is required for this course. Your laptop can be Mac, Windows, or Linux. We will be using entirely Free and Open Source Software (FOSS). If you attend the Intro to Python Workshop on Jan 24, you should already be set up with the software that you need in this course.

Assignments and Grading

The assignments in this course will consist of short in-class exercises at the end of each lecture, meant to give you practice implementing the concepts covered in lecture; four homework assignments; a project proposal; and a final project, including a presentation on your project. Each homework submission should be a zip file of all the relevant notebooks, datasets, and outputs for the assignment. Please check that the notebooks run without error. The final projects can be either individual submissions or submissions by groups of two. Additionally, active participation in the classroom and the class Slack will be a part of the final grade.
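One possible way to sanity-check a homework folder before zipping it is sketched below. It is only an illustration, not a required workflow: the function names are made up for this example, and the check only looks at outputs already saved in the notebook file (a notebook is stored as JSON, and a failed code cell is saved with an output of type "error"). Restarting the kernel and running all cells remains the more reliable test.

```python
import json
import zipfile
from pathlib import Path


def cells_with_errors(notebook_path):
    """Return the indices of code cells whose saved outputs include an error."""
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    bad = []
    for i, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells have no outputs
        outputs = cell.get("outputs", [])
        if any(out.get("output_type") == "error" for out in outputs):
            bad.append(i)
    return bad


def zip_folder(folder, zip_path):
    """Zip every file under `folder`, storing paths relative to that folder.

    Write the zip file outside `folder` so the archive does not include itself.
    """
    folder = Path(folder)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(folder).as_posix())
```

For instance, with a hypothetical folder named hw1, you could call cells_with_errors("hw1/analysis.ipynb") and, if it returns an empty list, zip_folder("hw1", "hw1.zip").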

Grading breakdown as follows:

  • In-class exercises: 20%
  • Homework: 30%
  • Final Project Proposal: 10%
  • Final Project: 35%
  • Class and Slack Participation: 5%
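
As a rough illustration of how these weights combine, the sketch below computes a weighted course score. The component scores are hypothetical, and the sketch assumes every component is scored on a 0-100 scale; the actual grading procedure may differ.

```python
# Weights taken from the grading breakdown above (they sum to 100%).
WEIGHTS = {
    "in_class_exercises": 0.20,
    "homework": 0.30,
    "project_proposal": 0.10,
    "final_project": 0.35,
    "participation": 0.05,
}


def weighted_score(scores):
    """Combine per-component scores (assumed 0-100) into one course score."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example_scores = {
    "in_class_exercises": 95,
    "homework": 88,
    "project_proposal": 90,
    "final_project": 85,
    "participation": 100,
}
course_score = weighted_score(example_scores)
print(round(course_score, 2))
```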

All in-class exercises are meant to be completed in class and are due at 11:59pm on the day of class. Homework assignments are due at 11:59pm on the Sundays noted below. All submissions will be through Canvas. Each late day will result in a letter grade deduction. Students are responsible for ensuring that their submissions go through in time. Submit early to avoid tech issues. Late submission of the final projects will not be accepted. Submissions will be graded not only on whether the code runs, but also on the clarity of your documentation and explanation.

The final research project will address an urban question using the tools and concepts from the course. It will consist of presenting the issue, its context and background, relevant data analyses and visualizations, conclusions, and limitations of the analysis. In addition to the project proposal due on March 31 at 11:59pm, the final deliverables will consist of an in-class presentation and a well-documented Jupyter Notebook with its associated datasets. The specific project prompt and grading criteria will be announced at a later date. The final project deliverables are due May 17 at 11:59pm.

Academic Integrity

By its nature, coding involves sharing and replication, especially given the availability of online resources. However, when using significant chunks (around five lines is a good rule of thumb) of repurposed code, make sure to indicate the source in a comment, tailor the code for your specific needs, and be prepared in class to explain how the code works.
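
A source comment on repurposed code might look something like the sketch below. The function is only a stand-in (a standard haversine distance calculation), and the source line is a placeholder to be replaced with the real link or citation you used.

```python
import math

# Source: adapted from <link or citation for the original snippet goes here>
# (placeholder, not a real reference).
# Changes made for this project: returns kilometers, takes decimal degrees,
# and clamps the intermediate value to guard against rounding error.
EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

The comment records where the code came from and what was changed, which also makes it easier to explain the code in class.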

Students are expected to follow Cornell University’s Code of Academic Integrity. Violations of the Code such as plagiarism (from any source, including fellow classmates) can result in failure or even expulsion from Cornell. Group work should summarize each student’s contribution.

Disabilities and Health

If you need a disability-related adjustment in the course, please meet with Student Disability Services (SDS) and provide me with an accommodation letter. We can also meet in private to discuss adjustments to the course you may need. Also, know that Cornell has mental health resources for anyone who may need them.

If you test positive for COVID, please submit your status to Daily Check. This will trigger an accommodation period. If you are still struggling with the impacts of long COVID or other health issues, you are again encouraged to get an accommodation letter from SDS, send it to me, and discuss accommodations with me.

Class schedule

Week 1
  Monday, Jan 23: Introductions and Course Overview
    Read over the syllabus together
    Introduce ourselves
    Open science and the modern urban data science software stack
    Practitioner Talk Series: Siqi Zhu
  Wednesday, Jan 25: Data and code management
    Pandas dataframes and good coding practices

Week 2
  Monday, Jan 30: Geospatial data in Python 1
    Analyzing data with shapely
    Practitioner Talk Series: Chris Whong
  Wednesday, Feb 1: Geospatial data in Python 2
    GeoPandas and GIS basics in Python

Week 3
  Monday, Feb 6: Data prep 1
    Data wrangling, cleaning, and error management with pandas
    Practitioner Talk Series: Nathan Storey
  Wednesday, Feb 8: Data prep 2
    Linking datasets, overlaying and aggregating data, re-classifying data with pandas

Week 4
  Monday, Feb 13: Data exploration 1
    Mapping with geopandas, contextily, and Kepler
    Practitioner Talk Series: Shan He
  Wednesday, Feb 15: Data exploration 2
    Descriptive statistics and visualization with matplotlib and seaborn
  HW 1 Due Feb 19 at 11:59pm

Week 5
  Monday, Feb 20: Data acquisition 1
    Open data portals and APIs with Google Maps, cenpy, and Socrata
  Wednesday, Feb 22: Data acquisition 2
    Web-scraping with beautifulsoup

Week 6
  Monday, Feb 27: NO CLASS – FEBRUARY BREAK
  Wednesday, Mar 1: Spatial data analysis 1
    Spatial weights with pysal

Week 7
  Monday, Mar 6: Spatial data analysis 2
    Spatial autocorrelation with pysal
    Practitioner Talk Series: Mario Giampieri
  Wednesday, Mar 8: Spatial data analysis 3
    Point pattern analysis
  HW 2 Due Mar 12 at 11:59pm

Week 8
  Monday, Mar 13: Unsupervised learning 1
    Dimensionality reduction and K-means clustering with scikit-learn
    Practitioner Talk Series: Michelle Ho
  Wednesday, Mar 15: Unsupervised learning 2
    Spatial clustering and regionalization

Week 9
  Monday, Mar 20: Regression 1
    Linear regression with statsmodels and scikit-learn
  Wednesday, Mar 22: No lecture (Yuetong review day?) – AAG
  HW 3 Due Mar 26 at 11:59pm

Week 10
  Monday, Mar 27: Regression 2
    Spatial regression with pysal
  Wednesday, Mar 29: Supervised learning 1
    Classification with scikit-learn
  Project Proposal Due Mar 31 at 11:59pm

Week 11
  Monday, Apr 3: NO CLASS – SPRING BREAK
  Wednesday, Apr 5: NO CLASS – SPRING BREAK

Week 12
  Monday, Apr 10: Supervised learning 2
    Ensemble learning with decision trees and random forest models with scikit-learn
  Wednesday, Apr 12: Supervised learning 3
    Regression vs classification, model selection, bias-variance tradeoff, and cross-validation with scikit-learn
  HW 4 Due Apr 16 at 11:59pm

Week 13
  Monday, Apr 17: Special topics 1
    OpenStreetMap with OSMnx
  Wednesday, Apr 19: Special topics 2
    Spatial Inequality

Week 14
  Monday, Apr 24: In-class work and one-on-one final project meetings with Wenfei
  Wednesday, Apr 26: No lecture (Yuetong review day?) – UAA

Week 15
  Monday, May 1: In-class work and one-on-one final project meetings with Wenfei
  Wednesday, May 3: Final Project Presentations 1

Week 16
  May 9: Final Project Presentations 2
  Final project materials due May 17 at 11:59pm