arilamstein/censusdis-streamlit

Odd jump in B06006_107E in 2006

Closed this issue · 1 comment

Variable B06006_107E is "number of people who worked from home". Across most counties, this number jumps sharply from 2005 to 2006. (Note that 2005 is the first year of the survey.) For example, in San Francisco County, California, I am seeing:

  • 2005: 2,557
  • 2006: 29,832 (an increase of 1,067% - seems spurious)
  • 2007: 28,262 (a decrease of 5% - seems normal)
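
As a quick check on those percentages, here is a minimal Python sketch that recomputes the year-over-year change from the three counts above. The `pct_change` helper is a name made up for this illustration, not something from the app.

```python
# Counts copied from the list above (San Francisco County, California).
counts = {2005: 2557, 2006: 29832, 2007: 28262}


def pct_change(counts):
    """Return {year: percent change relative to the previous year in `counts`}."""
    years = sorted(counts)
    return {
        cur: (counts[cur] - counts[prev]) / counts[prev] * 100
        for prev, cur in zip(years, years[1:])
    }


changes = pct_change(counts)
for year, change in changes.items():
    # A leading "+" or "-" makes the direction of the change explicit.
    print(f"{year}: {change:+.1f}%")
```

The sign matters here: the 2006 value is a large positive jump, while the 2007 value comes out slightly negative.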

This is important because it is dwarfing the very real increase between 2019 and 2021, which is due to Covid and is the core feature of the app.

I just posted a question about this in the Census Slack, asking how I should handle it. One option is to simply remove 2005 from this dataset, but I'd prefer to get feedback from experts before doing that.

According to Joe G. from the Census Slack:

sometimes issues like that crop up because variable ids/codes are not 100% consistent from year to year.
In this case, there is no B06006 table that I can see — is B06006_107E a typo? Maybe you mean B08006_017E ? That is “total: worked from home” in 2022, but total Motorcycle commuters in 2005. — yep, in 2006 that one became what you’re looking for, but in 2005, B08006_021E was “Total :Worked at home”.
Census publishes “Table and Geography changes” for the ACS, in frustratingly un-structured format, but the oldest such page seems to be for 2007

(And yes, he is right about the typo as well.) I just fixed the code.
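
One way to handle the renumbering is to pick the variable ID by year. The sketch below is hypothetical and is not the code in this repo: `wfh_variable` is an illustrative name, the two IDs come from the Slack reply above, and it assumes B08006_017E keeps the "worked from home" meaning for every year from 2006 on, which the reply only confirms for 2006 and 2022.

```python
FIRST_YEAR = 2005  # first year of the survey, per the issue text


def wfh_variable(year: int) -> str:
    """Return the ID to request for "worked from home" in a given year.

    Assumption: only 2005 uses the older ID; later years were not
    checked individually against the Census variable listings.
    """
    if year < FIRST_YEAR:
        raise ValueError(f"no data before {FIRST_YEAR}, got {year}")
    # In 2005, B08006_017E counted motorcycle commuters, so use B08006_021E.
    return "B08006_021E" if year == FIRST_YEAR else "B08006_017E"


for year in (2005, 2006, 2022):
    print(year, wfh_variable(year))
```

Keeping the lookup in one function means any further ID changes that turn up only need to be recorded in one place.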