Development of this project has moved to a Django web application: https://github.com/theonaun/theo_site
When my basement server is on, you can find it in action here: https://secure.theonaunheim.com
Please message me if you want login credentials.
Link to Windows installer at bottom.
Surgeo is an attempt to reverse engineer the Consumer Financial Protection Bureau's (CFPB) Bayesian Improved Surname Geocoding (BISG) analysis. Python code by Theo Naunheim. Model created by Marc N. Elliott et al. For details, please see BACKGROUND.txt.
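As a rough illustration of the idea behind BISG, the sketch below combines a surname-based race distribution with a geography-based likelihood using Bayes' rule. This is a minimal sketch of the general technique as described in the published literature, not Surgeo's actual code. The function name `bisg_posterior` and every number are made up for illustration.

```python
def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Combine surname and geography evidence with Bayes' rule.

    Both arguments are dicts keyed by race category (hypothetical inputs):
      p_race_given_surname: P(race | surname), the prior from the surname list
      p_geo_given_race:     P(living in this area | race), the likelihood
    Returns P(race | surname, area), normalized to sum to 1.
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in p_race_given_surname
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no category has non-zero probability")
    return {race: value / total for race, value in unnormalized.items()}


# Illustrative numbers only; these are NOT census figures.
surname_probs = {'white': 0.6, 'black': 0.3, 'hispanic': 0.1}
geo_probs = {'white': 0.002, 'black': 0.001, 'hispanic': 0.001}

posterior = bisg_posterior(surname_probs, geo_probs)
for race, probability in posterior.items():
    print(race, round(probability, 4))
```

Surname evidence acts as the prior and geography reweights it, so a category that is common for the surname but rare in the area loses probability.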
Please note that your shortcut to Python may be 'python' or 'python3' depending on how it is installed. Command line utility only as of August 2014 (run through cmd.exe). GUI later.
Version v0.7.0:
1) More closely mimics the CFPB model by only providing valid results where both name and zip are available.
2) Fixes a misapplication of iterative proportional fitting.
3) Still uses 2000 census data (see 'dev' branch for rewrite).
4) No longer requires Python 3.4 for the weighted arithmetic mean.
# to install as program
<Download installer at link below. Run. Don't forget to --setup!>
# or to install as module
pip3 install surgeo
# or (less preferred option)
python3 <path_to_setup.py> install
On Windows, open the 'cmd.exe' program and type the commands below.
--setup needs to be run before program will work. Requires internet access.
surgeo --setup
--file argument takes input and output (no return)
surgeo --file /path/input.csv /path/output.csv
--simple takes zip and surname (returns string)
surgeo --simple 63110 Jones
'White'
--complex takes zip and surname (returns detailed string)
surgeo --complex 63110 Jones
"probable_race=White probable_race_percent=0.817650 surname=JONES zip=63110 hispanic=0.007056 white=0.817650 black=0.172591 asian=0.002249 indian=0.000077 multiracial=0.000377"
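If you want to use the --complex string in a script, it can be split into fields. The sketch below is one way to do that, assuming the output always has the space-separated key=value layout shown above and that values contain no spaces. The helper name `parse_complex_output` is hypothetical and is not part of Surgeo.

```python
# Fields kept as text. Everything else is treated as a probability.
TEXT_FIELDS = {'probable_race', 'surname', 'zip'}


def parse_complex_output(line):
    """Turn a '--complex' result string into a dict (hypothetical helper)."""
    fields = {}
    for token in line.strip().strip('"').split():
        key, _, value = token.partition('=')
        fields[key] = value if key in TEXT_FIELDS else float(value)
    return fields


# The sample string is copied from the --complex example above.
sample = ("probable_race=White probable_race_percent=0.817650 surname=JONES "
          "zip=63110 hispanic=0.007056 white=0.817650 black=0.172591 "
          "asian=0.002249 indian=0.000077 multiracial=0.000377")

parsed = parse_complex_output(sample)
print(parsed['probable_race'], parsed['white'])
```

The zip is kept as text so that any leading zeros survive, and the surname is kept as text so that a name such as NAN is not read as a number.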
--pipe takes zip and surname arguments
cat | surgeo --pipe
Much like the above, but instead of 'surgeo' you will type 'python3 -m surgeo'
python3 -m surgeo --simple 63110 Jones
'White'
import surgeo
# Download data and create tables (takes some time)
surgeo.data_setup(verbose=True)
# Create model object (SurModel and GeoModel also exist)
model = surgeo.SurgeoModel()
# Simple version returns 'White'
model.guess_race(63110, 'Jones')
# race_data() returns object
surgeo_result = model.race_data(63110, 'Jones')
# 'White'
print(surgeo_result.probable_race)
# '.0328'
print(surgeo_result.black)
# 'JONES'
print(surgeo_result.surname)
# Create new .csv with race data
model.process_csv(csv_path, new_csv_path)
import surgeo
from surgeo.experimental.weighted_mean import get_weighted_mean
get_weighted_mean((percent_tuple), (examined_tuple), '/path/input.csv', '/path/output.csv')

# Takes csv in the following format
white, hispanic, examined_subject
.05, .95, 1
.05, .95, 1
.05, .95, 1
.05, .95, 2
.05, .95, 4
.05, .95, 5
.05, .95, 4
.85, .15, 8
.85, .15, 12
.70, .30, 10
.55, .25, 8
.55, .25, 8
.75, .25, 10
.70, .30, 10
.01, .99, 8
.05, .95, 8

# With the following command (remember: all tuples need at least one comma)
get_weighted_mean((0, 1), (2,), '/path/input.csv', '/path/output.csv')

# And outputs text
########## examined_subject ##########
sample mean: 6.25
sample standard deviation: 3.5619517121937516
white weighted mean: 9.082089552238807
white weighted stdv: 1.6618400534640232
hispanic weighted mean: 4.69921875
hispanic weighted stdv: 2.8194427490234375
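To show what the weighted mean lines represent, the sketch below recomputes them from the sample table using only the standard library. It assumes the usual definition, sum(weight * value) / sum(weight), with each percent column used as the weights. It does not call Surgeo, and the `weighted_mean` helper is written for this illustration. The weighted stdv lines are not reproduced here, since their exact formula is not stated above.

```python
import csv
import io
import statistics

# Sample table copied from the example above.
SAMPLE_CSV = """\
white, hispanic, examined_subject
.05, .95, 1
.05, .95, 1
.05, .95, 1
.05, .95, 2
.05, .95, 4
.05, .95, 5
.05, .95, 4
.85, .15, 8
.85, .15, 12
.70, .30, 10
.55, .25, 8
.55, .25, 8
.75, .25, 10
.70, .30, 10
.01, .99, 8
.05, .95, 8
"""


def weighted_mean(weights, values):
    """Return sum(w * v) / sum(w) (illustrative helper, not Surgeo's API)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(w * v for w, v in zip(weights, values)) / total_weight


# skipinitialspace drops the blank that follows each comma in the table.
rows = list(csv.reader(io.StringIO(SAMPLE_CSV), skipinitialspace=True))
header, data = rows[0], rows[1:]

percent_columns = (0, 1)   # same role as the first tuple passed above
examined_column = 2        # same role as the second tuple passed above

examined = [float(row[examined_column]) for row in data]
weighted_means = {
    header[col]: weighted_mean([float(row[col]) for row in data], examined)
    for col in percent_columns
}
sample_mean = statistics.mean(examined)
# Dividing by n (population formula) matches the figure printed above.
population_stdev = statistics.pstdev(examined)

print("sample mean:", sample_mean)
print("standard deviation (divide by n):", population_stdev)
for name, value in weighted_means.items():
    print(name, "weighted mean:", value)
```

Under these assumptions the sample mean, standard deviation, and both weighted means agree with the output listed above to within floating-point rounding.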
Windows installer: https://dl.dropboxusercontent.com/u/26853373/surgeo-0.6.9-amd64.msi