This is a Docker container designed to calculate large-scale distance matrices for groups of Census tracts or blocks. It uses counties as a unit of work, taking a county FIPS code and the files generated by otp-resources as inputs and saving outputs to /resources/outputs/$GEOID/ inside the container (where $GEOID is the FIPS code of the county).
The container takes the following inputs as Docker environment variables. All of them are required:
- GEOID (the five-digit, 2010 FIPS code for a U.S. county)
- TRAVEL_MODE (the travel mode within OTP: 'CAR', 'TRANSIT,WALK', or 'WALK'; this also determines which .pbf file is used)
- TYPE (the type of matrix to create, can be 'TRACT' or 'BLOCK')
- OVERWRITE_GRAPH (boolean for whether to overwrite the OTP-created Graph.obj; set to 'TRUE' when switching between travel modes)
- MAX_TRAVEL_TIME (the maximum travel time before cutoff, in seconds)
- MAX_WALK_DIST (the maximum walking distance before cutoff, in meters)
- CHUNKS (the number of chunks to divide the input file into, keep high for blocks and low for tracts)
- MAX_THREADS (the maximum number of threads to process jobs with)
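As a rough illustration, the variables above could be passed to the container with `-e` flags. The sketch below only validates the three categorical inputs and prints the resulting command (a dry run). The image name `otp-routing`, the host path under `$HOME/resources`, the two volume mounts, and every default value are placeholders chosen for the example, not names or settings confirmed by this project.

```shell
# Dry-run sketch: print a `docker run` command built from the documented variables.
# ASSUMPTIONS: the image name, host path, mounts, and default values are placeholders.
build_run_cmd() {
    geoid="$1"; travel_mode="$2"; matrix_type="$3"

    # GEOID is documented as a five-digit county FIPS code.
    case "$geoid" in
        [0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "GEOID must be five digits, got: $geoid" >&2; return 1 ;;
    esac
    # TRAVEL_MODE and TYPE each accept a fixed set of values.
    case "$travel_mode" in
        CAR|WALK|TRANSIT,WALK) ;;
        *) echo "unsupported TRAVEL_MODE: $travel_mode" >&2; return 1 ;;
    esac
    case "$matrix_type" in
        TRACT|BLOCK) ;;
        *) echo "unsupported TYPE: $matrix_type" >&2; return 1 ;;
    esac

    # Print the command instead of running it.
    echo docker run --rm \
        -v "${HOST_DIR:-$HOME/resources}/graphs:/resources/graphs" \
        -v "${HOST_DIR:-$HOME/resources}/outputs:/resources/outputs" \
        -e "GEOID=$geoid" \
        -e "TRAVEL_MODE=$travel_mode" \
        -e "TYPE=$matrix_type" \
        -e "OVERWRITE_GRAPH=${OVERWRITE_GRAPH:-FALSE}" \
        -e "MAX_TRAVEL_TIME=${MAX_TRAVEL_TIME:-7200}" \
        -e "MAX_WALK_DIST=${MAX_WALK_DIST:-5000}" \
        -e "CHUNKS=${CHUNKS:-10}" \
        -e "MAX_THREADS=${MAX_THREADS:-4}" \
        "${IMAGE:-otp-routing}"
}
```

Calling `build_run_cmd 17031 CAR TRACT` prints the full command, which can be reviewed before it is pasted into a job script.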
Each OTP matrix calculation requires 4 input files: a PBF of the relevant area, a CSV of origin locations, a CSV of destination locations, and a zip file of any GTFS feeds in the buffered county.
The container looks for these input files in /resources/graphs/$GEOID. It ingests the files exactly as otp-resources outputs them, so simply mount the same /resources/graphs/ directory in both containers. An example of this setup is provided in submit_jobs_simple.sh.
This container outputs a CSV distance matrix with n * m rows and 3 columns, where n is the number of origins and m is the number of destinations. Sample output:
origin,destination,minutes
17031010100,17031010201,19.37
17031010100,17031010202,10.35
17031010100,17031010300,10.77
17031010100,17031010400,13.33
17031010100,17031010501,11.03
17031010100,17031010502,13.95
17031010100,17031010503,16.35
17031010100,17031010600,18.35
Each CSV is compressed with bzip2 to save space (some block matrices can be very large). Output files are saved to /resources/outputs/$GEOID/, and each output file is named according to its $GEOID, $TYPE, and $TRAVEL_MODE.
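To make the naming scheme concrete, here is a small sketch that builds the expected path of an output file from the county, matrix type, and travel mode. The `<GEOID>-output-<TYPE>-<MODE>.csv.bz2` pattern is an assumption inferred from the one example filename in this document (36061-output-TRACT-CAR.csv.bz2), so the name produced for 'TRANSIT,WALK' in particular may differ. The helper name `output_file` is hypothetical.

```shell
# Sketch: build the expected in-container path of an output file.
# ASSUMPTION: the filename pattern is inferred from a single example
# (36061-output-TRACT-CAR.csv.bz2), not taken from the container's code.
output_file() {
    # $1 = GEOID, $2 = TYPE, $3 = travel mode
    printf '/resources/outputs/%s/%s-output-%s-%s.csv.bz2\n' "$1" "$1" "$2" "$3"
}

# Usage: preview the first rows of a compressed matrix without extracting it.
# (On the host, replace the /resources prefix with wherever outputs are mounted.)
# bzcat "$(output_file 36061 TRACT CAR)" | head -n 5
```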
If you need to run this container but don't have the root privileges necessary to install Docker, try using udocker. There are only two changes required when using udocker:
- Alias docker to the udocker executable
- Manually create the directories that you plan to store resources in (udocker seems to have trouble creating new directories on the host)
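The two adjustments might look like the sketch below. It assumes `udocker` is already on your PATH and uses `$HOME/resources` as a hypothetical host location. A shell function stands in for the alias, because bash does not expand aliases in non-interactive scripts by default; in an interactive shell, `alias docker=udocker` has the same effect.

```shell
# Sketch of the two udocker adjustments described above.
# ASSUMPTIONS: `udocker` is on PATH; $HOME/resources is a hypothetical host path.

# 1. Route `docker` calls to udocker (a function is used in place of an alias
#    so that it also works inside scripts).
docker() { udocker "$@"; }

# 2. Create the host directories up front, since udocker may not create them.
make_resource_dirs() {
    base="${1:-$HOME/resources}"
    mkdir -p "$base/graphs" "$base/outputs"
}
```

Run `make_resource_dirs` once before submitting jobs, passing a different base directory as the first argument if needed.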
blacklist.csv is a list of GTFS feeds that are improperly formatted and thus break OTP. You can delete them from every folder in the graphs directory by using find, e.g.
xargs -a blacklist.csv -I filename find ~/resources/graphs/ -name filename -delete
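Because `-delete` is irreversible, it can help to preview the matches first. The sketch below lists the files that would be removed without deleting anything. It assumes blacklist.csv holds one feed filename per line, as the command above implies. The function name `list_blacklisted` is hypothetical, and a plain `while read` loop replaces `xargs -a`, which is a GNU extension.

```shell
# Sketch: list the blacklisted feeds that would be removed, without deleting them.
# ASSUMPTION: the blacklist holds one feed filename per line.
list_blacklisted() {
    blacklist="$1"; graphs_dir="$2"
    # The `|| [ -n "$feed" ]` clause handles a last line with no trailing newline.
    while IFS= read -r feed || [ -n "$feed" ]; do
        [ -n "$feed" ] || continue          # skip blank lines
        find "$graphs_dir" -name "$feed" -print
    done < "$blacklist"
}

# Usage:
# list_blacklisted blacklist.csv ~/resources/graphs/
```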
You can use the example extract_from_bzip.sh script to extract a smaller segment of a larger file. The script is a quick one-liner that uses awk to match the GEOIDs listed in a file of your choice. For example, the following code would extract the tracts specified in list_of_geoids.csv from the file 36061-output-TRACT-CAR.csv.bz2.
bzcat 36061-output-TRACT-CAR.csv.bz2 | awk -F',' "$(for x in $(cat list_of_geoids.csv); do printf "\$1 == \"$x\" || "; done | rev | cut -c 4- | rev)"
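For long GEOID lists, the generated awk condition can become unwieldy. One alternative, sketched below, loads the list into an awk array and tests membership for each row. This is a different technique from the one in extract_from_bzip.sh, not a copy of it, and unlike the one-liner it also keeps the header row. The function name `filter_origins` is hypothetical, and the list is assumed to hold one GEOID per line.

```shell
# Sketch: keep only rows whose origin (column 1) appears in a list of GEOIDs.
# Reads the uncompressed matrix on stdin.
# ASSUMPTIONS: one GEOID per line in the list; the matrix starts with a header row.
filter_origins() {
    awk -F',' -v list="$1" '
        BEGIN {
            # Load the wanted GEOIDs into an associative array.
            while ((getline id < list) > 0) {
                sub(/\r$/, "", id)       # tolerate Windows line endings
                if (id != "") wanted[id] = 1
            }
            close(list)
        }
        NR == 1 || ($1 in wanted)        # keep the header row and matching origins
    '
}

# Usage:
# bzcat 36061-output-TRACT-CAR.csv.bz2 | filter_origins list_of_geoids.csv
```

The array lookup takes roughly the same time per row regardless of how many GEOIDs are listed, whereas the chained `||` condition grows with the list.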