/broadband-mapping-dataset

Dataset from our IMC '20 paper.

Broadband Mapping Analysis

This repo contains the code for "No WAN's Land", our IMC '20 paper.

The full dataset is available on Google Drive.

Paper

Main Files

  • Dataset - data/data_{STATE_ABBR}.csv - Contains all residential addresses from the National Address Database, with their FCC Form 477 coverage and the broadband availability tool (BAT) responses from each ISP.
  • Analysis Code - analysis.ipynb - Generates all tables and figures in the paper.
  • Columns - columns.csv - Column names for each raw SQL column by state.
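
As a minimal sketch of how the per-state files might be read, the snippet below builds a path from the data/data_{STATE_ABBR}.csv pattern and streams rows with the standard library. It assumes each CSV starts with a header row of column names; the helper names `state_csv_path` and `iter_addresses` are hypothetical and not part of this repo.

```python
import csv
from pathlib import Path


def state_csv_path(state_abbr, data_dir="data"):
    """Build the per-state path from the data/data_{STATE_ABBR}.csv pattern."""
    return Path(data_dir) / f"data_{state_abbr}.csv"


def iter_addresses(state_abbr, data_dir="data"):
    """Yield one dict per address row, keyed by column name.

    Assumption: the first row of the CSV is a header row.
    """
    with open(state_csv_path(state_abbr, data_dir), newline="") as f:
        yield from csv.DictReader(f)
```

For example, `for row in iter_addresses("VA"): print(row["addr_full"])` would print every address in the Virginia file without loading the whole file into memory.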

Dataset Columns

Name | Description | Notes
--- | --- | ---
addr_id | Unique ID for each address. |
addr_line1 | Address number + street name. |
addr_city | Address city. |
addr_state | Address state. |
addr_zip | Address ZIP code. |
addr_lat | Address latitude. |
addr_lon | Address longitude. |
addr_census_block | Address census block. |
addr_unit_type | Address type (filtered to residential only). |
addr_unit_id | Address type ID. |
addr_full | Concatenated address: "line1 + city + state + zip". |
fcc_coverage_{ISP}(_{TECH CODE}) | FCC Form 477 coverage of the address by {ISP} with technology {TECH CODE} (if exists). | 0 = Covered, 1 = Not Covered. Also, see FCC tech codes.
fcc_coverage_downspeed_{ISP}(_{TECH CODE}) | Minimum download speed of the address's census block according to Form 477 by {ISP} with technology {TECH CODE} (if exists). |
fcc_coverage_upspeed_{ISP}(_{TECH CODE}) | Minimum upload speed of the address's census block according to Form 477 by {ISP} with technology {TECH CODE} (if exists). |
tool_coverage_{ISP}(_{TECH CODE}) | BAT coverage of the address by {ISP} with technology {TECH CODE} (if exists). | See analysis.ipynb for the mapping from raw ISP API responses to our taxonomy of coverage outcomes.
tool_coverage_downspeed_{ISP}(_{TECH CODE}) | Minimum download speed of the address according to {ISP}'s BAT with technology {TECH CODE} (if exists). |
tool_coverage_upspeed_{ISP}(_{TECH CODE}) | Minimum upload speed of the address according to {ISP}'s BAT with technology {TECH CODE} (if exists). |
fcc_coverage_LOCAL | FCC Form 477 coverage of the address by ANY local ISP (as defined in paper). |
addr_dpv | Delivery Point Validation. Whether the USPS recognizes an address as a valid delivery point. | Queried using SmartyStreets.
addr_rdi | Residential Delivery Indicator. Whether the USPS classifies an address as residential for billing purposes. | Queried using SmartyStreets.
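
Because the coverage columns follow a naming pattern, it can help to split a column name back into its parts. The sketch below is one possible way to do that, under two assumptions that are not confirmed by this repo: that {TECH CODE} is a trailing run of digits, and that ISP names do not themselves end in an underscore followed by digits. The ISP name `att` in the usage note is a made-up placeholder.

```python
import re
from typing import NamedTuple, Optional

# {fcc|tool}_coverage[_downspeed|_upspeed]_{ISP}[_{TECH CODE}]
# Assumption: the tech code, when present, is a trailing run of digits.
COVERAGE_COLUMN = re.compile(
    r"^(?P<source>fcc|tool)_coverage"
    r"(?:_(?P<metric>downspeed|upspeed))?"
    r"_(?P<isp>.+?)"
    r"(?:_(?P<tech>\d+))?$"
)


class CoverageColumn(NamedTuple):
    source: str               # "fcc" (Form 477) or "tool" (BAT)
    metric: str               # "coverage", "downspeed", or "upspeed"
    isp: str                  # ISP name, or "LOCAL" for fcc_coverage_LOCAL
    tech_code: Optional[str]  # None when the column has no tech code suffix


def parse_coverage_column(name):
    """Return the parts of a coverage column name, or None for other columns."""
    m = COVERAGE_COLUMN.match(name)
    if m is None:
        return None
    return CoverageColumn(m["source"], m["metric"] or "coverage", m["isp"], m["tech"])
```

With this helper, `parse_coverage_column("tool_coverage_downspeed_att_50")` yields source `tool`, metric `downspeed`, ISP `att`, and tech code `50`, while address columns such as `addr_id` return `None`.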

Required Files

  • FCC Staff Block Population Estimates - us2019.csv - Data, Info
  • Census Block Urban/Rural Data (Shapefiles) - block_class/{STATE}/tl_2019_{FIPS_CODE}_tabblock10.shp - Data/Info
  • ACS Demographic Data - NOTE: Use transposed, 5-year estimates.
    • Race - ACS/ACSDT5Y2018.B03002_data_with_overlays_2020-12-21T003055.csv - Data, Info
    • Poverty - ACS/ACSST5Y2018.S1701_data_with_overlays_2020-12-21T002937.csv - Data, Info
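
Before running the notebook, it may be useful to check that the required files are in place. The following is a rough sketch only: it assumes the paths above are relative to a single root directory, and that {STATE} is whatever directory name you used for each state. The helper names are hypothetical.

```python
from pathlib import Path

# State-independent files, copied from the list above.
REQUIRED_FILES = [
    "us2019.csv",
    "ACS/ACSDT5Y2018.B03002_data_with_overlays_2020-12-21T003055.csv",
    "ACS/ACSST5Y2018.S1701_data_with_overlays_2020-12-21T002937.csv",
]


def block_shapefile(state, fips_code):
    """Fill in block_class/{STATE}/tl_2019_{FIPS_CODE}_tabblock10.shp."""
    return f"block_class/{state}/tl_2019_{fips_code}_tabblock10.shp"


def missing_required_files(root, states):
    """List expected files that do not exist under `root`.

    `states` maps each {STATE} directory name to its {FIPS_CODE}.
    """
    expected = REQUIRED_FILES + [block_shapefile(s, f) for s, f in states.items()]
    return [p for p in expected if not (Path(root) / p).exists()]
```

For instance, `missing_required_files(".", {"VA": "51"})` returns the relative paths still to be downloaded for Virginia (FIPS code 51), assuming the state directory is named `VA`.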

Optional Files

  • FCC Form 477 Data - fbd_us_without_satellite_jun2018_v1.csv - Data, Info
    • This was the latest data available at the time of the paper.

Known Issues

  • Border addresses - Some addresses on the border between two states are included even if their state is not in our dataset. (For example, there are 5 addresses from WV included in our VA dataset.) Since none of these addresses have Form 477 coverage in our dataset, they are effectively excluded from the analysis.
  • Duplicate addresses - The National Address Database contains duplicate addresses. For example, analysis.ipynb contains an example of an address that appears a whopping 62 times in our dataset! We did not filter out these cases from our dataset.
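
If either issue matters for your use case, the rows can be screened before analysis. This is a minimal sketch, not the approach used in the paper: it assumes rows are dicts keyed by column name, that duplicates share an identical `addr_full` value, and that `addr_state` holds the same abbreviation used in the file name. Both function names are hypothetical.

```python
from collections import Counter


def count_duplicates(rows, key="addr_full"):
    """Map each value of `key` that occurs more than once to its count."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


def drop_border_addresses(rows, state_abbr):
    """Keep only rows whose addr_state matches the file's own state."""
    return [row for row in rows if row["addr_state"] == state_abbr]
```

Calling `drop_border_addresses(rows, "VA")` on the Virginia file would, under these assumptions, remove the WV rows mentioned above, and `count_duplicates` would surface repeated addresses for manual review.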