Interpolate missing censuses
I notice that small-town Kentucky has no estimates for the year 1820, but a large number of cities (e.g. Bardstown) do have estimates for 1810 and 1830.
There are certainly other cases like this, though I don't think any of the others cover most of a state.
The core data series should represent missing values explicitly. My inclination is to do this with the number '-1', so numeric computation is still easy. The overall estimates would be better if they interpolated each missing year as the geometric mean of the surrounding years.
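A minimal sketch of what that interpolation could look like, assuming a series is an array of `{year, pop}` objects sorted by year with `-1` as the missing marker. The function name, the object shape, and the `interpolated` flag are all hypothetical, not anything in the current codebase. For a gap that is not exactly at the midpoint it uses constant growth between the two nearest known censuses, which reduces to the geometric mean when the gap is halfway between them.

```javascript
// Hypothetical sentinel for "no census figure", as proposed above.
const MISSING = -1;

// series: array of {year, pop}, sorted by year; pop may be MISSING.
// Returns a new array; filled-in points are flagged with interpolated: true.
function interpolateMissing(series) {
  return series.map((point, i) => {
    if (point.pop !== MISSING) {
      return { year: point.year, pop: point.pop, interpolated: false };
    }

    // Find the nearest known census on each side of the gap.
    let before = null;
    for (let j = i - 1; j >= 0; j--) {
      if (series[j].pop !== MISSING) { before = series[j]; break; }
    }
    let after = null;
    for (let j = i + 1; j < series.length; j++) {
      if (series[j].pop !== MISSING) { after = series[j]; break; }
    }

    // No bracketing pair (or a zero population): this would be
    // extrapolation, so leave the value marked as missing.
    if (!before || !after || before.pop <= 0 || after.pop <= 0) {
      return { year: point.year, pop: MISSING, interpolated: false };
    }

    // Constant growth rate between the two known censuses.
    const t = (point.year - before.year) / (after.year - before.year);
    const pop = Math.round(before.pop * Math.pow(after.pop / before.pop, t));
    return { year: point.year, pop: pop, interpolated: true };
  });
}

// Made-up numbers, purely for illustration (not real census figures).
const example = interpolateMissing([
  { year: 1810, pop: 100 },
  { year: 1820, pop: MISSING },
  { year: 1830, pop: 400 },
]);
```

Keeping a flag on the filled-in points would let the map distinguish reported figures from estimates. Gaps at the start or end of a series are deliberately left as `-1`.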
A related issue happens with the 1880 census. It reports a lot of unincorporated places (villages, etc.) that were not reported in 1870 or 1890 unless they were fairly big.
The question then is how to extrapolate these numbers. Extrapolating backwards is particularly tricky, as we don't want to extrapolate to, e.g., a negative population. One option would be to use the growth rate of the township that contains the town, but that of course carries its own set of assumptions.
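To make the township option concrete, here is a rough sketch. It assumes the town held a constant share of its township's population between the two censuses, which is exactly the kind of assumption mentioned above and may well be wrong for fast-growing villages. The function name and arguments are hypothetical. Because the estimate is a ratio rather than a linear trend, it cannot go negative.

```javascript
// Estimate a town's population in an earlier census year from its
// population in a later year, scaled by how the enclosing township changed.
// Assumption: the town's share of the township stayed constant.
function backExtrapolate(townPopLater, townshipPopEarlier, townshipPopLater) {
  // Without positive figures for all three inputs there is nothing to scale.
  if (townPopLater <= 0 || townshipPopEarlier <= 0 || townshipPopLater <= 0) {
    return null;
  }
  return Math.round(townPopLater * (townshipPopEarlier / townshipPopLater));
}

// Made-up numbers: a village of 300 in 1880, in a township that grew
// from 1000 (1870) to 1500 (1880).
const estimate1870 = backExtrapolate(300, 1000, 1500);
```

The same function works for forward extrapolation to 1890 by passing the later township figure as the second argument.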
Confirmed. In some cases this might be connected to the other issue I filed today; a foundation date might be better than nothing for back-extrapolation, possibly in concert with township or county data.
In the short term, I think interpolation is an easier call than extrapolation.
Towns vanishing in 1890:
Towns vanishing in 1900 are largely confined to TX, TN, and CA:
Code used to generate the 1900 map (for my own reference):
map.plotAPI({'year': 1900, "filters.Cities": "vanishes", 'yearOffset':10, 'scales.size.Cities': 'd => 4', 'drawing': ['Cities']})