Estimation

Prepare ACS data

Use csv2sqlite.py to save the raw csv.gz data file into an sqlite database.

acs: table from ACS microdata
- python csv2sqlite.py --gzip acs_08-16.csv.gz acs_08-16.db acs
mig2met: table to convert migration state/puma to puma
- run data-prep.r to build the csv from the two csv files mapping puma and msa
- then load it into the sqlite database as another table
- python csv2sqlite.py mig2met.csv acs_08-16.db mig2met

Then use SQL queries to get aggregated values, which avoids loading the entire dataset into memory. The queries apply categorizations (race, edu) on the fly, so there is no need to pre-clean the data.
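
A minimal sketch of this kind of aggregating query, using Python's sqlite3 module. The column names (year, educ, perwt), the education cutoff, and the small in-memory table are assumptions for illustration only; they are not the project's actual schema or queries.

```python
import sqlite3

# Hypothetical columns (year, educ, perwt) and cutoff; the real ACS extract may differ.
QUERY = """
SELECT year,
       CASE WHEN educ >= 10 THEN 'college' ELSE 'no_college' END AS edu,
       SUM(perwt) AS pop
FROM acs
GROUP BY year, edu
ORDER BY year, edu
"""


def aggregate_population(conn):
    """Return (year, edu, pop) rows; the aggregation runs inside SQLite."""
    return conn.execute(QUERY).fetchall()


# Tiny in-memory stand-in for acs_08-16.db so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acs (year INTEGER, educ INTEGER, perwt REAL)")
conn.executemany(
    "INSERT INTO acs VALUES (?, ?, ?)",
    [(2008, 6, 100.0), (2008, 10, 50.0), (2008, 11, 25.0), (2009, 6, 80.0)],
)
rows = aggregate_population(conn)
for row in rows:
    print(row)
```

Only the grouped totals cross into Python, so memory use scales with the number of cells rather than the number of microdata records.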

Set up data: check that each file below uses the correct filenames for the model being run (specifications differ by age and type).
smooth-pops.r: query and smooth aggregated population counts in each desired metro
- Total/single/married populations, marriage/divorce flows, migration flows
- Smoothing by non-parametric (local-polynomial) regression, using a hand-rolled "diagonal" smoothing kernel with a manually chosen bandwidth
- Saves smoothed data to csv for loading into julia
mort-rates.r: interpolates and saves death rates
main-estim.jl: main estimation driver; set the options in it before running
- loads populations from saved JLD files, or calls prepare-pops.jl to generate them anew
- prepare-pops.jl: loads csv files generated by R scripts above, then converts DataFrames to multidimensional arrays (per metro) and saves as JLD files
- estimate arrival rates and then non-parametric objects using estim-functions.jl and compute-npobj.jl
- can also do a parameter grid search or monte carlo estimation
plot-results.r: plot model-data fit and estimated objects
- tikz-conversion.R: produce tikz figures from saved plot objects
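
To illustrate the local-polynomial smoothing step listed under smooth-pops.r, here is a minimal one-dimensional local-linear smoother in plain Python. It is a generic sketch with a Gaussian kernel and a hand-picked bandwidth; it does not reproduce the "diagonal" kernel used in the R script, and the example ages and counts are made up.

```python
import math


def gaussian_kernel(u):
    """Unnormalised Gaussian kernel weight."""
    return math.exp(-0.5 * u * u)


def local_linear_smooth(xs, ys, bandwidth):
    """Local-linear (degree-1 local-polynomial) fit evaluated at each x in xs."""
    fitted = []
    for x0 in xs:
        w = [gaussian_kernel((x - x0) / bandwidth) for x in xs]
        s0 = sum(w)
        s1 = sum(wi * (x - x0) for wi, x in zip(w, xs))
        s2 = sum(wi * (x - x0) ** 2 for wi, x in zip(w, xs))
        t0 = sum(wi * y for wi, y in zip(w, ys))
        t1 = sum(wi * (x - x0) * y for wi, x, y in zip(w, xs, ys))
        # Intercept of the kernel-weighted least-squares line centred at x0.
        fitted.append((s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1))
    return fitted


# Made-up population counts by age, for illustration only.
ages = list(range(20, 31))
counts = [120.0, 135.0, 128.0, 150.0, 149.0, 160.0, 171.0, 165.0, 180.0, 178.0, 190.0]
smoothed = local_linear_smooth(ages, counts, bandwidth=2.0)
```

A larger bandwidth averages over more neighbouring ages and gives a flatter curve; a local-linear fit also reproduces any exactly linear input, which is a convenient sanity check.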

Run the scripts below in order: they set up the resampled datasets, run the smoothing, and then run the estimation. Batch processing uses GNU Parallel.

Rscript bootstrap-resampler.r: creates directories data/bootstrap-samples/resamp_00 with resampled csv data
bash bootstrap-create-db.sh: creates sqlite db from csv files
bash bootstrap-smooth.sh: runs smooth-data.r for both ageonly and racedu specifications
- Took 40 hours for 100 resamples on 8 cores, low memory usage (<2GB)
bash bootstrap-cp-psi.sh: copies the death rate data into the smoothed populations directories for each resample
bash bootstrap-estim.sh: runs main-estim.jl for both ageonly and racedu specifications
- Took 100 minutes for 100 resamples on 8 cores, low memory usage (<4GB)
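
The resampling step at the start of this pipeline can be sketched as follows. This is a plain resample-with-replacement illustration in Python, not the logic of bootstrap-resampler.r (which may, for instance, respect survey weights or strata); the file name acs.csv, the column names, and the function name write_resamples are hypothetical.

```python
import csv
import random
import tempfile
from pathlib import Path


def write_resamples(rows, header, out_root, n_resamples, seed=0):
    """Write n_resamples bootstrap copies of rows under out_root/resamp_XX/."""
    rng = random.Random(seed)
    out_dirs = []
    for b in range(n_resamples):
        out_dir = Path(out_root) / f"resamp_{b:02d}"
        out_dir.mkdir(parents=True, exist_ok=True)
        # Draw len(rows) records with replacement.
        sample = rng.choices(rows, k=len(rows))
        with open(out_dir / "acs.csv", "w", newline="") as f:  # hypothetical file name
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(sample)
        out_dirs.append(out_dir)
    return out_dirs


# Made-up records; a temporary directory stands in for data/bootstrap-samples.
header = ["year", "perwt"]
rows = [[2008, 100], [2008, 50], [2009, 80], [2010, 60]]
with tempfile.TemporaryDirectory() as tmp:
    dirs = write_resamples(rows, header, Path(tmp) / "bootstrap-samples", 3)
    names = [d.name for d in dirs]
print(names)
```

Seeding the generator makes a given set of resamples reproducible, which helps when the later smoothing and estimation steps need to be rerun on the same draws.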

For reference, the 30th-largest metro has a population of about 1.4 million, and the 40th about 0.9 million.

estimate-rates-full.r and estimate-rates.r
- very poor accuracy due to noisy inference on divorce flows
Need marriage and divorce rates for each couple type (globally):
- Marriage rate (directly observable): SQL queries for flows and stocks to compute rates
- Divorce rate: inferred from the non-divorce rate and the death rate
Weighted OLS (weights are the stocks of couples)
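
As a rough illustration of the two ingredients above, the sketch below computes a rate as flow divided by stock and then fits a stock-weighted least-squares line. The regressor (years since 2008), the numbers, and the function name weighted_ols are assumptions for illustration; the notes do not specify what the R scripts regress on what.

```python
def weighted_ols(x, y, w):
    """Return (intercept, slope) minimising sum_i w_i * (y_i - a - b * x_i) ** 2."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Made-up marriage flows and stocks of singles for one couple type.
years_since_2008 = [0, 1, 2, 3]
flows = [50.0, 66.0, 45.0, 70.0]
stocks = [1000.0, 1200.0, 900.0, 1250.0]

# Rate = flow / stock, as computed from the aggregated SQL results.
rates = [f / s for f, s in zip(flows, stocks)]

# Cells with larger stocks get more weight, since their rates are less noisy.
intercept, slope = weighted_ols(years_since_2008, rates, stocks)
```

With a constant regressor only, the same weighting reduces to the stock-weighted mean rate, i.e. total flows divided by total stocks.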