Use `csv2sqlite.py` to save the raw csv.gz data file into an sqlite database.
- `acs`: table from ACS microdata
  - `python csv2sqlite.py --gzip acs_08-16.csv.gz acs_08-16.db acs`
- `mig2met`: table to convert migration state/puma to puma
  - run `data-prep.r` to build the csv from the two csv files mapping puma and msa, then load it into the sqlite database as another table:
  - `python csv2sqlite.py mig2met.csv acs_08-16.db mig2met`
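For reference, a loader of this kind reduces to a few lines. This is a minimal sketch of a csv2sqlite-style loader, not the repo's actual script (the function name and demo file are illustrative): stream an optionally gzipped CSV into a sqlite table, taking column names from the header row.

```python
import csv
import gzip
import os
import sqlite3
import tempfile

def csv_to_sqlite(csv_path, db_path, table, gzipped=False):
    """Stream a (possibly gzipped) CSV into a sqlite table, header row as columns."""
    opener = gzip.open if gzipped else open
    with opener(csv_path, "rt", newline="") as f, sqlite3.connect(db_path) as con:
        reader = csv.reader(f)
        header = next(reader)                               # column names
        cols = ", ".join('"%s"' % c for c in header)
        ph = ", ".join("?" for _ in header)
        con.execute('CREATE TABLE IF NOT EXISTS "%s" (%s)' % (table, cols))
        con.executemany('INSERT INTO "%s" VALUES (%s)' % (table, ph), reader)
    return db_path

# demo on a throwaway two-row file
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "mini.csv")
with open(src, "w") as f:
    f.write("serial,age\n1,30\n2,41\n")
db = csv_to_sqlite(src, os.path.join(tmp, "mini.db"), "acs")
n = sqlite3.connect(db).execute("SELECT COUNT(*) FROM acs").fetchone()[0]
```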
Then use SQL queries to pull aggregated values, which avoids loading the entire dataset into memory. The queries apply the categorizations (race, edu) on the fly, so there is no need to pre-clean the data.
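The on-the-fly categorization amounts to `CASE` expressions inside an aggregating query. A hypothetical example (the column names `AGEP`/`RAC1P`/`SCHL`/`PWGTP` are ACS-style guesses, not verified against this repo's schema, and the category cutoffs are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE acs (AGEP INT, RAC1P INT, SCHL INT, PWGTP INT)")
con.executemany("INSERT INTO acs VALUES (?,?,?,?)",
                [(25, 1, 21, 100), (40, 2, 16, 120), (31, 1, 13, 80)])

# Collapse the microdata to weighted counts by age and coarse race/edu
# categories; only the small aggregate ever reaches Python.
rows = con.execute("""
    SELECT AGEP,
           CASE WHEN RAC1P = 1 THEN 'white'   ELSE 'minority'   END AS race,
           CASE WHEN SCHL >= 21 THEN 'college' ELSE 'no college' END AS edu,
           SUM(PWGTP) AS pop
    FROM acs
    GROUP BY AGEP, race, edu
""").fetchall()
```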
- set up data: check these files for correct filenames per model (different specifications by age and type), then run:
- `smooth-pops.r`: query and smooth aggregated population counts in each desired metro
  - Total/single/married populations, marriage/divorce flows, migration flows
  - Smoothing by non-parametric regression (local-polynomial), using a hand-rolled "diagonal" smoothing kernel (manual bandwidth)
  - Saves smoothed data to csv for loading into `julia`
- `mort-rates.r`: interpolates and saves death rates
- `main-estim.jl`: runs the show, but need to set options first
  - loads populations from saved JLD files, or calls `prepare-pops.jl` to generate them anew
- `prepare-pops.jl`: loads csv files generated by `R` scripts above, then converts DataFrames to multidimensional arrays (per metro) and saves as JLD files
- estimate arrival rates and then non-parametric objects using `estim-functions.jl` and `compute-npobj.jl`
  - can also do a parameter grid search or monte carlo estimation
- `plot-results.r`: plot model-data fit and estimated objects
- `tikz-conversion.R`: produce tikz figures from saved plot objects
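The smoothing in `smooth-pops.r` is done in R; as a generic illustration of the local-polynomial technique, here is a degree-1 (local-linear) smoother in numpy. The Gaussian kernel and the bandwidth here are illustrative, not the repo's hand-rolled "diagonal" kernel or its manual bandwidth choice.

```python
import numpy as np

def local_linear(x, y, grid, bw):
    """Local-polynomial (degree 1) regression: at each grid point, fit a
    kernel-weighted line to (x, y); the intercept is the smoothed value."""
    out = np.empty(len(grid))
    for i, g in enumerate(grid):
        w = np.exp(-0.5 * ((x - g) / bw) ** 2)      # Gaussian kernel weights
        sw = np.sqrt(w)                              # WLS via sqrt-weight trick
        X = np.column_stack([np.ones_like(x), x - g])
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        out[i] = beta[0]                             # fitted value at g
    return out

# sanity check: a local-linear smoother reproduces a straight line exactly
x = np.linspace(0.0, 10.0, 50)
fit = local_linear(x, 2.0 * x + 1.0, np.array([2.0, 5.0]), bw=1.0)
```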
Run the scripts in order: set up the resampled datasets, run smoothing, then estimation. GNU Parallel is used for efficient batch processing.
- `Rscript bootstrap-resampler.r`: creates directories `data/bootstrap-samples/resamp_00` with resampled csv data
- `bash bootstrap-create-db.sh`: creates sqlite db from csv files
- `bash bootstrap-smooth.sh`: runs `smooth-data.r` for both ageonly and racedu specifications
  - Took 40 hours for 100 resamples on 8 cores, low memory usage (<2GB)
- `bash bootstrap-cp-psi.sh`: copies the death rate data into the smoothed populations directories for each resample
- `bash bootstrap-estim.sh`: runs `main-estim.jl` for both ageonly and racedu specifications
  - Took 100 minutes for 100 resamples on 8 cores, low memory usage (<4GB)
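The resampling step itself is simple: each replicate draws the same number of rows with replacement from the original data. `bootstrap-resampler.r` is R; this is an illustrative Python equivalent of that core draw (function name and seeding scheme are hypothetical, and writing each replicate out to its `resamp_*` directory is omitted):

```python
import random

def bootstrap_resample(rows, seed):
    """One bootstrap replicate: sample len(rows) rows with replacement,
    seeded so each replicate is reproducible."""
    rng = random.Random(seed)
    return [rng.choice(rows) for _ in rows]

# each replicate has the original size and draws only from the original rows
sample = bootstrap_resample(list(range(100)), seed=0)
```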
- 35620: 14.5m - New York-Newark-Jersey City, NY-NJ-PA
- 31080: 9.4m - Los Angeles-Long Beach-Anaheim, CA
- 16980: 6.8m - Chicago-Naperville-Elgin, IL-IN-WI
- 19100: 4.6m - Dallas-Fort Worth-Arlington, TX
- 37980: 4.4m - Philadelphia-Camden-Wilmington, PA-NJ-DE-MD
- 26420: 4.2m - Houston-The Woodlands-Sugar Land, TX
- 47900: 4.1m - Washington-Arlington-Alexandria, DC-VA-MD-WV
- 33100: 4.1m - Miami-Fort Lauderdale-West Palm Beach, FL
- 12060: 3.8m - Atlanta-Sandy Springs-Roswell, GA
- 14460: 3.5m - Boston-Cambridge-Newton, MA-NH
- 41860: 3.3m - San Francisco-Oakland-Hayward, CA
- 19820: 3.1m - Detroit-Warren-Dearborn, MI
- 38060: 3.1m - Phoenix-Mesa-Scottsdale, AZ
- 40140: 3.0m - Riverside-San Bernardino-Ontario, CA
- 42660: 2.6m - Seattle-Tacoma-Bellevue, WA
- 33460: 2.4m - Minneapolis-St. Paul-Bloomington, MN-WI
- 41740: 2.3m - San Diego-Carlsbad, CA
- 45300: 2.1m - Tampa-St. Petersburg-Clearwater, FL
- 41180: 2.0m - St. Louis, MO-IL
- 12580: 2.0m - Baltimore-Columbia-Towson, MD
30th metro is 1.4m, 40th is 0.9m.
- `estimate-rates-full.r` and `estimate-rates.r`: very poor accuracy due to noisy inference on divorce flows
- Need marriage and divorce rates for each couple-type (globally)
  - Marriage rate (directly observable): SQL queries for flows and stocks to compute rates
  - Divorce rate (infer from non-divorce rate and death rate)
  - Weighted OLS (by stocks of couples)
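One way to make the "infer from non-divorce rate and death rate" step concrete is the accounting identity below. This is a hedged illustration, not necessarily the repo's exact formula: if a couple survives a period only when it neither divorces nor is dissolved by death, then stocks evolve as $N_{t+1} = N_t(1-\text{divorce})(1-\text{death})$, which inverts to give the divorce rate.

```python
def implied_divorce_rate(stock_t, stock_t1, death_rate):
    """Back out the divorce rate from couple stocks and a death-dissolution
    rate, assuming N_{t+1} = N_t * (1 - divorce) * (1 - death)."""
    return 1.0 - (stock_t1 / stock_t) / (1.0 - death_rate)

# round trip: 5% divorce and 1% dissolution-by-death should recover 0.05
d = implied_divorce_rate(1000.0, 1000.0 * 0.95 * 0.99, 0.01)
```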