writeup v0 # Census Data This project is an exploration of how to parse and visualize US Census American Community Survey (ACS) data, in particular the 2015 5-year dataset. The "Summary File" is the offical term for the detailed dataset: > The ACS Summary File is a set of comma-delimited text files that contain all of the Detailed Tables. By comma-delimited text files, I mean that the file contains estimates (or margins of error) separated by commas. I will show you what I mean on the next slide. By Detailed Tables, I mean the pre-tabulated tables that start with B(base) or C (collapsed). The ACS Summary File is stored in a series of files on the file transfer protocol (or FTP) site. The files contain only the estimates or margins of error from the tables. It does not include information suchastabletitle,descriptionoftherows,etcthatyouareusedtoseeinginAmericanFactFinder. The file becomes more useful as you add in the identifying information. ^1 I am particularly interested in the census block group data, which is the smallest geographical division available: > It is important to note the difference between Legal and Administrative areas and Statistical areas. First, Legal/administrative areas have legally described boundaries; they may provide governmental services or may be used to administer programs. (Examples are Counties, Incorporated Places, Congressional Districts, and School Districts) > Statistical geographic areas are defined primarily for data tabulation and presentation purposes. (Examples are Public Use Microdata Areas, Census Tracts, and Block Groups). > Census tracts are small, relatively permanent of a county or county equivalent. Census tracts generally have a minimum population of 1,200, or 480 housing units, and a maximum population of 8,000 people or 3,200 housing units. Tracts have an optimum size of 4,000 people or 1,600 housing units. > Block groups are statistical divisions of census tracts and are defined to contain a minimum of 600 persons or 240 housing units and a maximum of 3,000 people or 1,200 housing units. In the American Community Survey, block groups are the lowest level of geography published. ^1 The ACS comes in 1 and 5 year datasets; the 5 year is an aggregate of five years of collected data, e.g. 2010-2015, and gives better "sample size/reliability/precision" for small populations at the expense of currency. The 3-year product was discontinued. ^2 # Sources The primary data sources are found in this folder: https://www2.census.gov/programs-surveys/acs/summary_file/2015/data/5_year_entire_sf/ Additionally, we use the National Census Tracts Gazetteer to get geocoordinates for each tract. Alternatively, we could use the TIGER/Line geodata, but that would be much more complicated. The limitation of the Gazetteer data is that it does not cover the block groups, only tracts and larger areas. See <https://www.census.gov/geo/maps-data/data/gazetteer2015.html>. In order to build this project, the following need to be downloaded into a subdirectory named `data`: * <https://www2.census.gov/programs-surveys/acs/summary_file/2015/data/2015_5yr_Summary_FileTemplates.zip> (1.3MB) * <https://www2.census.gov/programs-surveys/acs/summary_file/2015/data/5_year_entire_sf/2015_ACS_Geography_Files.zip> (21MB) * <https://www2.census.gov/programs-surveys/acs/summary_file/2015/data/5_year_entire_sf/Tracts_Block_Groups_Only.tar.gz> (3.5GB) * <http://www2.census.gov/geo/docs/maps-data/data/gazetteer/2015_Gazetteer/2015_Gaz_tracts_national.zip> (1.8MB) Additionally, following documentation is helpful: * <https://www.census.gov/programs-surveys/acs/technical-documentation/summary-file-documentation.html> * <https://www2.census.gov/programs-surveys/acs/summary_file/2015/documentation/tech_docs/ACS_2015_SF_5YR_Appendices.xls> * <https://www2.census.gov/programs-surveys/acs/summary_file/2015/documentation/tech_docs/2015_SummaryFile_Tech_Doc.pdf> # Details ## ACS Documentation The data files are named as follows: "20155ak0001000.csv" * "2015": Reference Year: ACS data year (last year of the period for multiyear periods). * "5": Period (1 or 5): Period Covered. * "ak": State Level: US or abbreviations for state, District of Columbia, and Puerto Rico * "0001": Sequence Number: 0001 to 9999. * "000": IterationID: Iteration ID for Selected Population Tables and American Indian & Alaska Native Tables. Note: Iteration ID is always “000” for the standard 1-Year and 5-Year products. from ACS_2015_SF_5YR_Appendices.xls: | table: B01003; title: Total Population; sequence: 0003; start pos: 130; end pos: 130-130. This is the last table in seqence 3, so sequence 3 has 130 columns, the last of which is the total population. ## Gazetteer The Gazetteer data has the following columns: | USPS United States Postal Service State Abbreviation | GEOID Geographic Identifier - fully concatenated geographic code (State FIPS, County FIPS, census tract number) | ALAND Land Area (square meters) - Created for statistical purposes only | AWATER Water Area (square meters) - Created for statistical purposes only | ALAND_SQMI Land Area (square miles) - Created for statistical purposes only | AWATER_SQMI Water Area (square miles) - Created for statistical purposes only | INTPTLAT Latitude (decimal degrees) First character is blank or "-" denoting North or South latitude respectively | INTPTLONG Longitude (decimal degrees) First character is blank or "-" denoting East or West longitude respectively # TODO ## For the healthcare portion of the project: Pharmacies and Hospitals: Pharmacies: https://hifld-dhs-gii.opendata.arcgis.com/datasets/19145a0e403a4af4b2e4b76a6f2ec0ee_0 Hospitals: https://hifld-dhs-gii.opendata.arcgis.com/datasets/e13641c764344b8ab7dfd41831e56940_0 ## Geographic distance calculations http://www.movable-type.co.uk/scripts/latlong.html The haversine formula calculates the great-circle distance between two points – that is, the shortest distance over the Earth’s surface. from math import radians, sin, cos, sqrt, asin | def haversine(lat1, lon1, lat2, lon2): | R = 6372.8 # Earth radius in kilometers. | dLat = radians(lat2 - lat1) | dLon = radians(lon2 - lon1) | lat1 = radians(lat1) | lat2 = radians(lat2) | a = sin(dLat/2)**2 + cos(lat1) * cos(lat2) * sin(dLon/2)**2 | c = 2 * asin(sqrt(a)) | return R * c ## Geodata census tract and block group data can be downloaded in "geodatabase format", but I don't know how to parse the files. https://www.census.gov/geo/maps-data/data/tiger-data.html ## SQLite bulk import. http://www.sqlite.org/cli.html#csv_import ## Miscellaneous https://censusreporter.org/ can help identify various column names. https://www.census.gov/programs-surveys/acs/guidance/training-presentations/acs-block-groups.html https://www.census.gov/programs-surveys/acs/technical-documentation/summary-file-documentation.html https://en.wikipedia.org/wiki/Census_block_group https://en.wikipedia.org/wiki/Equirectangular_projection Joining with Geodata: https://www2.census.gov/programs-surveys/acs/summary_file/2015/documentation/tech_docs/ACS_SF_TIGERLine_Shapefiles.pdf https://www2.census.gov/geo/pdfs/maps-data/data/tiger/tgrshp2015/TGRSHP2015_TechDoc.pdf Documentation downloads https://www2.census.gov/programs-surveys/acs/summary_file/2015/documentation/geography/5yr_year_geo/ https://www2.census.gov/programs-surveys/acs/summary_file/2015/documentation/geography/5yr_year_geo/ak.xls ^1: <https://www.census.gov/content/dam/Census/programs-surveys/acs/guidance/training-presentations/2016_BlockGroups_Transcript_01.pdf> ^2: <https://www.census.gov/programs-surveys/acs/guidance/estimates.html>