datadesk/census-data-downloader

Processed data contains duplicate data for multiple geographies

aboutaaron opened this issue · 0 comments

Bug/Issue

Census data downloader correctly downloads the raw data but creates a CSV with duplicated data in the processed directory.

Environment

  • Python 3.8
  • Pipenv version 2018.11.27.dev0
  • Latest version of censusdatadownloader

Reproduce

Install the package and then try to download a data set.

pipenv install census-data-downloader
censusdatadownloader --data-dir data/census race states

Expected behavior

A 52-row CSV file in the processed directory with total population by race.

Actual behavior

A 52-row CSV in the processed directory with the same data for each column.

Possible issues/solutions

It looks like the data is correctly downloaded to the raw directory, which makes me think something's going wrong in the process step. I'm seeing this behavior specifically with the race [geography] arguments.

I noticed the same behavior for internet counties but did get the correct data when I used internet states.
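A quick way to confirm the duplication is to look for columns that hold a single value in every row of a processed CSV. The sketch below uses only the standard library. The helper name `constant_columns`, the identifier column names (`geoid`, `name`), and the sample data are assumptions for illustration, not names taken from the package's actual output.

```python
import csv
import io


def constant_columns(csv_file, id_columns=("geoid", "name")):
    """Return non-identifier columns that hold one value in every row.

    `id_columns` is a guess at which columns identify the geography;
    adjust it to match the real header of the processed file.
    """
    reader = csv.DictReader(csv_file)
    seen = {}  # column name -> set of distinct values seen so far
    row_count = 0
    for row in reader:
        row_count += 1
        for column, value in row.items():
            if column in id_columns:
                continue
            seen.setdefault(column, set()).add(value)
    # With fewer than two rows, every column is trivially constant.
    if row_count < 2:
        return []
    return [column for column, values in seen.items() if len(values) == 1]


# Made-up sample standing in for a processed file.
sample = io.StringIO(
    "geoid,name,universe,white_alone\n"
    "01,Alabama,100,60\n"
    "06,California,100,70\n"
)
print(constant_columns(sample))
```

To run it against a real file, pass an open file handle, for example `open(path, newline="")`, where `path` points at a CSV in the processed directory. If every data column comes back as constant there but not in the matching raw file, that would point at the process step.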

I'll see if I can debug what's happening at the process step but in the meantime I'll rely on the raw data. Thanks for your work on this!