Verifying completeness after download_census()
sheffe opened this issue · 2 comments
I've just noticed that download_census() can fail and require restart, generally after a 503 service unavailable response. I can restart the download, which restarts from the beginning of a query, but that's inefficient at the scale totalcensus enables. (Half a terabyte and counting!)
Is there any way to verify the completeness of the downloaded records? I can't find anything directly in the package, and I'm not even sure that the file structure of the Census enables a programmatic check.
(A related issue: would you accept a PR adding a Sys.sleep() of user-elected time between calls to the downloads? As it stands, it's possible to hit the downloads quite hard at a time when interest in the 2017 release is likely to cause high baseline traffic.)
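A minimal sketch of the kind of pause I mean, in base R. This is not how totalcensus downloads files today; download_with_pause(), its pause and max_tries arguments, and the injectable fetch argument are all hypothetical names made up for illustration.

```r
# Hypothetical helper, not part of totalcensus: download a set of files
# politely, pausing between requests and retrying after a failure such as
# a 503 response.
download_with_pause <- function(urls, destfiles, pause = 5, max_tries = 3,
                                fetch = utils::download.file) {
  stopifnot(length(urls) == length(destfiles))
  ok <- logical(length(urls))
  for (i in seq_along(urls)) {
    for (attempt in seq_len(max_tries)) {
      # download.file() returns 0 on success; treat errors and warnings
      # (e.g. "cannot open URL") as a failed attempt
      status <- tryCatch(
        fetch(urls[i], destfiles[i], mode = "wb", quiet = TRUE),
        error = function(e) 1L,
        warning = function(w) 1L
      )
      if (identical(as.integer(status), 0L)) {
        ok[i] <- TRUE
        break
      }
      Sys.sleep(pause * attempt)  # wait longer after each failed attempt
    }
    Sys.sleep(pause)  # user-elected pause between files
  }
  # TRUE/FALSE per destination file, so failed downloads can be re-queued
  stats::setNames(ok, destfiles)
}
```

The return value is a named logical vector, so a pipeline could re-run only the entries that came back FALSE instead of restarting the whole query.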
You are welcome to submit PRs to improve the package.
I am thinking of turning download_census() into an internal function, as the downloading can be done automatically in the read_xxxx() functions. For example, if you want to download all 2017 ACS 5-year data (just added in the new version), simply run read_acs5year(2017, c(states_DC, "US", "PR")). You will be asked to download the data for any states that are not on your computer. If the internet goes down, you can resume downloading whatever is still missing by running read_acs5year(2017, c(states_DC, "US", "PR")) again. The download_census() function in the old version can do this kind of check, but I am not sure how useful it is.
That's an interesting design change -- I think it could make the package easier to use for newcomers, with one hitch. I'm likely to be a weird user -- I use totalcensus for pulling large batches of data (often taking many hours) and build it into data pipelines for separate projects. Waiting until it finds missing data for a download request makes it harder to use outside interactive sessions, which is why I was thinking about a pre-verification that all required files are present. Perhaps it would be possible to specify an argument for "Download any file I ask for automatically" (defaulting to FALSE) when converting download_census() to internal?
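To make the pre-verification idea concrete, here is a rough base R sketch. It assumes the caller already knows which file names a request needs; the missing_files() name, the directory, and the file names are hypothetical and do not reflect how totalcensus actually lays out its data.

```r
# Hypothetical helper, not part of totalcensus: report which expected files
# are absent or empty, so a pipeline can stop (or download) before reading.
missing_files <- function(dir, expected) {
  paths <- file.path(dir, expected)
  size <- file.size(paths)  # NA for files that do not exist
  expected[is.na(size) | size == 0]
}
```

In a non-interactive script this could run before any read call, for example `m <- missing_files(data_dir, needed); if (length(m) > 0) stop("missing: ", paste(m, collapse = ", "))`, where data_dir and needed are placeholders supplied by the pipeline. It only checks presence and non-zero size, so a truncated download would still pass.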