Spring 2023
Professor Wenfei Xu (wenfeixu@cornell.edu)
Class details on the Cornell Classes page
Office hours: Mondays 1 – 2:30 pm, Wednesdays 11 am – 12:30 pm in Sibley Hall 221
Zoe Wang (TA) zw553@cornell.edu
Office hours: Mondays 5:30 – 6:30 pm in Sibley Hall 305
Yuetong Wang (GTRS) yw795@cornell.edu
Office hours: Wednesdays 5 – 6 pm in Sibley Hall 305
Please join our class Slack group. This is where you can troubleshoot problems with us and with each other. Note that you can add code blocks to your messages.
Required: Practitioners Talk Series (CRP 5000)
Details on the Cornell Classes page
Pizza will be served
Please contact Rachel Elmkies (re238@cornell.edu) with any logistical concerns.
Highly recommended: Introduction to Python Workshop (Jan 24)
Details on the Cornell Classes page
Taught by Jacob Grippin (jrg363@cornell.edu) at the Cornell Center for Social Sciences
This workshop will help students set up the Jupyter Notebook computing environment that we will use throughout the course and will offer an introduction to Python.
Urban data science is an emerging practice in geography and urban planning that combines 1) the set of data analysis tools and methods used to understand a wide array of big data and big spatial data sources and 2) questions of urban development, structure, complexity, theory, policy, dynamics, and outcomes. These approaches enable more spatiotemporally dynamic and granular analyses of cities and give researchers new insight into urban dynamics.
This course will provide a toolkit to speak through data, code, statistics, and visualization. Using open-source data and computational tools in Python and the Jupyter Notebook environment, we will learn how to design testable research questions, collect and prepare data, apply relevant analytical techniques, present our process and results in an engaging and informative way, and identify the limitations of quantitative analysis. A personal laptop will be required.
The goal of this course is to provide an introduction to a wide range of tools and concepts that will enable deeper exploration of urban dynamics in the future. In other words, this course provides a “sampler” of the foundations of urban data science through coding, statistical analysis, visualization/narrative-building, and critique.
The core learning objectives are:
- Use code to clean, analyze, and visualize spatial data
- Implement a descriptive or predictive analysis using appropriate data and statistical and/or computational methods
- Clearly communicate your process and results as a data narrative through visualizations, context, textual description, and presentation
- Identify the limitations and potential biases in the data, data-generating processes, and tools and methods in addressing your research topic
Weeks 1-13: Every class will include a lecture with embedded code snippets to follow along with and a brief exercise at the end of class to practice the concept you just learned. We will also have six Practitioners Talk Series talks from 4:30 – 5:30 pm on some Mondays. There are four coding homework assignments during this period.
Weeks 14-16: The last three weeks of the class will be devoted to a final project of your choosing that addresses an urban development question. The aim is to synthesize and further develop the skills you have learned over the semester. Project proposals are due in Week 10. During class in Weeks 14 and 15, each project group or individual will meet with me to discuss progress, technical or conceptual roadblocks, and next steps. Our last two classes of the semester will be devoted to final presentations. Your final projects will be due on May 17 at 11:59 pm.
If you need to miss class, please let me know beforehand. Arriving 15 or more minutes late counts as an absence. Three unexcused absences (on either day of class) will lower your final grade by one letter (e.g., A to B, B- to C-). Five or more unexcused absences will result in failing the course; if you reach that point, I suggest withdrawing rather than receiving an F. If you have to miss class for any reason, please get the class notes from a classmate. Even if you miss class, I still expect you to complete the in-class exercise.
This course is designed for master's students and upper-level undergraduate students. Course 5080 (Introduction to GIS) is highly recommended, since we will not cover the basics of GIS in this course. Additionally, I assume some basic knowledge of statistics (mean, median, mode) and some familiarity with spreadsheet software (Excel, Google Sheets). Prior or concurrent coursework in quantitative methods, statistics, visualization, and object-oriented programming is recommended.
Coding is an iterative process involving (a lot of) trial and error, patience, self-direction, and clever Googling. It can seem daunting and intractable at first. Here are some guidelines and resources to help you through this process:
- Look for typos in the code.
- Search for the issue on Google. This will often lead you to sites such as Stack Overflow or Medium, which provide code snippets and sometimes step-by-step instructions on how to resolve your question. Try to be specific in your search, and do not be afraid to sound silly. My searches generally combine the following keywords:
  - [language or tool], e.g., “Python”, “Pandas”, “Matplotlib”
  - [function or action], e.g., “plt.subplots”, “plotting multiple plots in one figure”
  - [error or issue], e.g., “plots are tiny”, “not showing all plots”
- If you are trying to implement a fairly standard process, look through our class notebooks or the readings; they often contain code snippets for reference.
- Ask classmates in our Slack group.
- If none of the above is fruitful, message me or the TA with the specific task you are trying to implement and the relevant code snippet, either as a screenshot or as a GitHub Gist. Do not send code in the body of an email, as rich text editors often add hidden formatting that can introduce errors into the code.
A personal laptop with permissions to install software is required for this course. Your laptop can be Mac, Windows, or Linux. We will be using entirely Free and Open Source Software (FOSS). If you attend the Intro to Python Workshop on Jan 24, you should already be set up with the software that you need in this course.
The assignments in this course consist of short in-class exercises at the end of each lecture, meant to give you practice implementing the concepts covered in lecture; four homework assignments; a project proposal; and a final project, including a presentation of your project. Each homework submission should be a zip file of all the relevant notebooks, datasets, and outputs for the assignment; please check that the notebooks run without errors. Final projects can be completed individually or in groups of two. Additionally, active participation in class and on the class Slack will be part of the final grade.
Grading breakdown as follows:
- In-class exercises: 20%
- Homework: 30%
- Final Project Proposal: 10%
- Final Project: 35%
- Class and Slack Participation: 5%
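As a quick illustration of how these weights combine, here is a minimal Python sketch that computes a weighted course score. It assumes every component is scored on a 0-100 scale; the component names and the example scores are hypothetical, and since the syllabus does not specify letter-grade cutoffs, the sketch stops at the numeric total.

```python
# Weights from the grading breakdown, as fractions of the final grade.
# Component names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "in_class_exercises": 0.20,
    "homework": 0.30,
    "project_proposal": 0.10,
    "final_project": 0.35,
    "participation": 0.05,
}


def weighted_score(scores: dict) -> float:
    """Combine component scores (each assumed to be on a 0-100 scale) into a 0-100 total."""
    # Sanity check: the weights should account for the whole grade.
    if abs(sum(WEIGHTS.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Hypothetical component scores for one student (not real data).
example_scores = {
    "in_class_exercises": 90,
    "homework": 85,
    "project_proposal": 80,
    "final_project": 88,
    "participation": 100,
}
print(round(weighted_score(example_scores), 1))  # 87.3
```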
All in-class exercises are meant to be completed in class and are due at 11:59 pm on the day of class. Homework assignments are due at 11:59 pm on the Sundays noted in the schedule below. All submissions will be through Canvas. Each day late results in a deduction of one letter grade. Students are responsible for ensuring that their submissions go through on time; submit early to avoid technical issues. Late final projects will not be accepted. Submissions will be graded not only on whether the code runs, but also on the clarity of your documentation and explanation.
The final research project will address an urban question using the tools and concepts from the course. It will present the issue, its context and background, relevant data analyses and visualizations, conclusions, and the limitations of the analysis. In addition to the project proposal due on March 31 at 11:59 pm, the final deliverables will consist of an in-class presentation and a well-documented Jupyter Notebook with its associated datasets. The specific project prompt and grading criteria will be announced at a later date. The final project deliverables are due May 17 at 11:59 pm.
By its nature, coding involves sharing and replication, especially given the availability of online resources. However, when reusing a significant chunk of code (around five lines is a good rule of thumb), indicate the source in a comment, tailor the code to your specific needs, and be prepared to explain in class how the code works.
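For example, reused code might be annotated as in the minimal Python sketch below. The URL and date are placeholders to be replaced with the actual source, and the helper `clean_column_names` and the column names are hypothetical; the point is the comment format (where the code came from and what you changed), not the function itself.

```python
# Adapted from: <URL of the original Stack Overflow answer or blog post>
# Accessed: <date>
# Changes: renamed variables for our dataset and added the .strip() step
#          so that stray spaces in the raw headers are removed.
def clean_column_names(names: list) -> list:
    """Return lowercase, underscore-separated versions of raw column names."""
    cleaned = []
    for name in names:
        # Trim surrounding whitespace, lowercase, then replace inner spaces.
        cleaned.append(name.strip().lower().replace(" ", "_"))
    return cleaned


# Hypothetical raw headers, for illustration only.
raw_headers = [" Median Income ", "Pop Total", "GEOID"]
print(clean_column_names(raw_headers))  # ['median_income', 'pop_total', 'geoid']
```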
Students are expected to follow Cornell University’s Code of Academic Integrity. Violations of the Code, such as plagiarism (from any source, including fellow classmates), can result in failure or even expulsion from Cornell. Group projects should include a summary of each student’s contribution.
If you need a disability-related adjustment in the course, please meet with Student Disability Services (SDS) and provide me with an accommodation letter. We can also meet in private to discuss any adjustments to the course you may need. Cornell also offers mental health resources for anyone who may need them.
If you test positive for COVID, please submit your status to Daily Check. This will trigger an accommodation period. If you are struggling with the effects of long COVID or other health issues, you are again encouraged to get an accommodation letter from SDS, send it to me, and discuss accommodations with me.
Week | Monday | Wednesday |
---|---|---|
1 | Jan 23: Introductions and Course Overview. Read over the syllabus together; introduce ourselves; open science and the modern urban data science software stack. Practitioners Talk Series: Siqi Zhu | Jan 25: Data and code management. Pandas dataframes and good coding practices |
2 | Jan 30: Geospatial data in Python 1. Analyzing data with shapely. Practitioners Talk Series: Chris Whong | Feb 1: Geospatial data in Python 2. GeoPandas and GIS basics in Python |
3 | Feb 6: Data prep 1. Data wrangling, cleaning, and error management with pandas. Practitioners Talk Series: Nathan Storey | Feb 8: Data prep 2. Linking datasets, overlaying and aggregating data, re-classifying data with pandas |
4 | Feb 13: Data exploration 1. Mapping with geopandas, contextily, and Kepler. Practitioners Talk Series: Shan He | Feb 15: Data exploration 2. Descriptive statistics and visualization with matplotlib and seaborn. HW 1 due Feb 19 at 11:59 pm |
5 | Feb 20: Data acquisition 1. Open data portals and APIs with Google Maps, cenpy, and Socrata | Feb 22: Data acquisition 2. Web scraping with beautifulsoup |
6 | Feb 27: NO CLASS – FEBRUARY BREAK | Mar 1: Spatial data analysis 1. Spatial weights with pysal |
7 | Mar 6: Spatial data analysis 2. Spatial autocorrelation with pysal. Practitioners Talk Series: Mario Giampieri | Mar 8: Spatial data analysis 3. Point pattern analysis. HW 2 due Mar 12 at 11:59 pm |
8 | Mar 13: Unsupervised learning 1. Dimensionality reduction and K-means clustering with scikit-learn. Practitioners Talk Series: Michelle Ho | Mar 15: Unsupervised learning 2. Spatial clustering and regionalization |
9 | Mar 20: Regression 1. Linear regression with statsmodels and scikit-learn | Mar 22: No lecture (Yuetong review day?) – AAG. HW 3 due Mar 26 at 11:59 pm |
10 | Mar 27: Regression 2. Spatial regression with pysal | Mar 29: Supervised learning 1. Classification with scikit-learn. Project proposal due Mar 31 at 11:59 pm |
11 | Apr 3: NO CLASS – SPRING BREAK | Apr 5: NO CLASS – SPRING BREAK |
12 | Apr 10: Supervised learning 2. Ensemble learning with decision trees and random forest models with scikit-learn | Apr 12: Supervised learning 3. Regression vs. classification, model selection, bias-variance tradeoff, and cross-validation with scikit-learn. HW 4 due Apr 16 at 11:59 pm |
13 | Apr 17: Special topics 1. OpenStreetMap with OSMnx | Apr 19: Special topics 2. Spatial inequality |
14 | Apr 24: In-class work and one-on-one final project meetings with Wenfei | Apr 26: No lecture (Yuetong review day?) – UAA |
15 | May 1: In-class work and one-on-one final project meetings with Wenfei | May 3: Final Project Presentations 1 |
16 | May 9: Final Project Presentations 2 | Final project materials due May 17 at 11:59 pm |