Assignment 02: PostGIS Analytics

Due: Oct 18, 2021 by 11:59pm ET

Submission Instructions

  1. Fork this repository to your GitHub account.

  2. Write a query to answer each of the questions below. Your queries should produce results in the format specified. Write your query in a SQL file corresponding to the question number (e.g. a file named query06.sql for the answer to question #6). Each SQL file should contain a single SELECT query (though it may include other queries before the select if you need to do things like create indexes or update columns). Some questions include a request for you to discuss your methods. Update this README file with your answers in the appropriate place.

  3. There are several datasets that are prescribed for you to use in this assignment. Your datasets should be named:

  1. Submit a pull request with your answers. You can continue to push changes to your repository up until the due date, and those changes will be visible in your pull request.

Note, I take logic for solving problems into account when grading. When in doubt, write your thinking for solving the problem even if you aren't able to get a full response.

Questions

  1. Which bus stop has the largest population within 800 meters? As a rough estimation, consider any block group that intersects the buffer as being part of the 800 meter buffer.

Answer:01

  1. Which bus stop has the smallest population within 800 meters?

The queries to #1 & #2 should generate relations with a single row, with the following structure:

(
    stop_name text, -- The name of the station
    estimated_pop_800m integer, -- The population within 800 meters
    the_geom geometry(Point, 4326) -- The geometry of the bus stop
)

Answer:02

  1. Using the Philadelphia Water Department Stormwater Billing Parcels dataset, pair each parcel with its closest bus stop. The final result should give the parcel address, bus stop name, and distance apart in meters. Order by distance (largest on top).

Structure:

(
    address text,  -- The address of the parcel
    stop_name text,  -- The name of the bus stop
    distance_m double precision  -- The distance apart in meters
)

Answer:03

  1. Using the shapes.txt file from GTFS bus feed, find the two routes with the longest trips. In the final query, give the trip_headsign that corresponds to the shape_id of this route and the length of the trip.

Structure:

(
    trip_headsign text,  -- Headsign of the trip
    trip_length double precision  -- Length of the trip in meters
)

Answer:04

  1. Rate neighborhoods by their bus stop accessibility for wheelchairs. Use Azavea's neighborhood dataset from OpenDataPhilly along with an appropriate dataset from the Septa GTFS bus feed. Use the GTFS documentation for help. Use some creativity in the metric you devise in rating neighborhoods. Describe your accessibility metric:

Description: Answer: From my opinion, I create a variable calculating the density of wheelchair boarding in each neighborhood. There are three steps. step1. group the stops by the neighborhood block,select all the bus stops contain in the neighborhood.. step2. divide the bus stop into two type, the one with zero wheelchair boarding is classified inaccessible, while the one with more than one wheelchair boarding, in this case one or two wheelchair bording is classified accessible. step3. divided the sum of wheelchair boarding by the area of each neighborhood, I can get metric as the density of wheelchair boarding. step4. the higher the metric is, the better accessibility the neighborhood is.

  1. What are the top five neighborhoods according to your accessibility metric? Answer:06

  2. What are the bottom five neighborhoods according to your accessibility metric? Answer:07

Both #6 and #7 should have the structure:

(
  neighborhood_name text,  -- The name of the neighborhood
  accessibility_metric ...,  -- Your accessibility metric value
  num_bus_stops_accessible integer,
  num_bus_stops_inaccessible integer
)
  1. With a query, find out how many census block groups Penn's main campus fully contains. Discuss which dataset you chose for defining Penn's campus.

Structure (should be a single value):

(
    count_block_groups integer
)

Answer: 58

  1. With a query involving PWD parcels and census block groups, find the geo_id of the block group that contains Meyerson Hall. ST_MakePoint() and functions like that are not allowed.

Structure (should be a single value):

(
    geo_id text
)

Answer:421010369001

  1. You're tasked with giving more contextual information to rail stops to fill the stop_desc field in a GTFS feed. Using any of the data sets above, PostGIS functions (e.g., ST_Distance, ST_Azimuth, etc.), and PostgreSQL string functions, build a description (alias as stop_desc) for each stop. Feel free to supplement with other datasets (must provide link to data used so it's reproducible), and other methods of describing the relationships. PostgreSQL's CASE statements may be helpful for some operations.

Structure:

(
    stop_id integer,
    stop_name text,
    stop_desc text,
    stop_lon double precision,
    stop_lat double precision
)

Answer:10

As an example, your stop_desc for a station stop may be something like "37 meters NE of 1234 Market St" (that's only an example, feel free to be creative, silly, descriptive, etc.)

Tip when experimenting: Use subqueries to limit your query to just a few rows to keep query times faster. Once your query is giving you answers you want, scale it up. E.g., instead of FROM tablename, use FROM (SELECT * FROM tablename limit 10) as t.