Processed data contains duplicate data for multiple geographies
aboutaaron opened this issue · 0 comments
Bug/Issue
Census data downloader correctly downloads the raw data but creates a CSV with duplicated data in the processed directory.
Environment
- Python 3.8
- Pipenv version 2018.11.27.dev0
- Latest version of censusdatadownloader
Reproduce
Install the package and then try to download a data set.
pipenv install census-data-downloader
censusdatadownloader --data-dir data/census race states
Expected behavior
A 52-row CSV file with total population by race in the processed directory.
Actual behavior
A 52-row CSV with the same data in each column in the processed directory.
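For anyone trying to confirm the duplication, here is a minimal sketch that reports which columns of a CSV hold identical values in every row. It uses only the standard library and assumes nothing about the downloader's internals. The function name `find_identical_columns` is made up for this illustration, and the path to the processed file has to be filled in by hand.

```python
import csv
from collections import defaultdict


def find_identical_columns(csv_path):
    """Return groups of column names whose values match in every row.

    Hypothetical helper for this report, not part of census-data-downloader.
    """
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Transpose the remaining rows so each tuple holds one column's values
        columns = list(zip(*reader))

    groups = defaultdict(list)
    for name, values in zip(header, columns):
        groups[values].append(name)

    # Keep only the groups where more than one column shares the same values
    return [names for names in groups.values() if len(names) > 1]
```

Calling it on the processed CSV and getting a non-empty list back would point to the duplicated columns. An empty list would mean every column is distinct.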
Possible issues/solutions
It looks like the data is downloaded correctly to the raw directory, which makes me think something is going wrong in the process step. I'm seeing this behavior specifically with the race [geography] arguments.
I noticed the same behavior for internet counties but did get the correct data when I used internet states.
I'll see if I can debug what's happening at the process step but in the meantime I'll rely on the raw data. Thanks for your work on this!