ncss-tech/SoilKnowledgeBase

refresh-extdata action failing during NSSH parsing

brownag opened this issue · 1 comments

The SKB weekly update in GitHub Actions is failing to completely download one of the NSSH sections.
https://github.com/ncss-tech/SoilKnowledgeBase/runs/3401794971?check_suite_focus=true

I will retry the scheduled action in a bit and hope that it resolves itself (as it usually does), because the link (https://directives.sc.egov.usda.gov/41514.wba) works fine.

trying URL 'https://directives.sc.egov.usda.gov/41514.wba'
Content type 'application/pdf; charset=utf-8' length 143062 bytes (139 KB)
==========================================
downloaded 117 KB

In addition: Warning messages:
1: In download.file(y$href, destfile = pat) :
  downloaded length 120224 != reported length 143062
2: In download.file(y$href, destfile = pat) :
  URL 'https://directives.sc.egov.usda.gov/OpenNonWebContent.aspx?content=41514.wba': status was 'Failure when receiving data from the peer'
Quitting from lines 44-64 (README.Rmd) 
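
One way to guard against a truncated download like the one in the log above might be to retry when `download.file()` warns or errors. This is only a minimal sketch: `download_with_retry()`, its arguments, and the retry/wait defaults are hypothetical and not part of SKB. The `download_fun` argument is an assumption added so the retry logic can be exercised without network access.

```r
# Hypothetical helper (not part of SKB): retry a download until it completes
# without a warning or error, e.g. "downloaded length != reported length".
download_with_retry <- function(url, destfile, max_tries = 3, wait = 5,
                                download_fun = utils::download.file) {
  for (i in seq_len(max_tries)) {
    ok <- tryCatch({
      # download.file() returns 0 (invisibly) on success
      status <- download_fun(url, destfile = destfile, mode = "wb", quiet = TRUE)
      identical(as.integer(status), 0L)
    },
    warning = function(w) FALSE,  # a truncated transfer surfaces as a warning
    error = function(e) FALSE)
    if (ok) return(invisible(TRUE))
    # discard any partial file before the next attempt
    if (file.exists(destfile)) unlink(destfile)
    if (i < max_tries) Sys.sleep(wait)
  }
  stop("download failed after ", max_tries, " attempts: ", url)
}
```

Something like `download_with_retry(y$href, destfile = pat)` could then stand in for the bare `download.file()` call, assuming `y$href` and `pat` are as shown in the warning messages above.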

Ideas:
In the near future I would like to cache these datasets and decide whether a download is necessary by comparing hashes or something similar. However, since no hashes are published on eDirectives, I am not sure I can avoid downloading at least some of the files.

Perhaps I will create a separate routine that caches and hashes infrequently updated things like taxonomy, the directives downloads, etc., so that they are only used/updated if they differ, using something like targets to manage the data -> product dependencies (#6).
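
The cache-and-hash idea could be sketched roughly as follows. This assumes the file still has to be downloaded to a temporary location first (since no hashes are published upstream) and is then compared against the cached copy with `tools::md5sum()`. The function name `update_if_changed()` and its arguments are hypothetical, not an existing SKB routine.

```r
# Hypothetical helper (not part of SKB): replace the cached copy only when
# the freshly downloaded file differs from it.
update_if_changed <- function(new_file, cache_file) {
  new_hash <- unname(tools::md5sum(new_file))
  old_hash <- if (file.exists(cache_file)) {
    unname(tools::md5sum(cache_file))
  } else {
    NA_character_  # nothing cached yet
  }
  changed <- is.na(old_hash) || !identical(new_hash, old_hash)
  if (changed) {
    dir.create(dirname(cache_file), recursive = TRUE, showWarnings = FALSE)
    file.copy(new_file, cache_file, overwrite = TRUE)
  }
  changed  # TRUE means downstream products would need rebuilding
}
```

The logical return value could be used to decide whether dependent products are rebuilt; a tool like targets would handle that dependency tracking more completely.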

Closing this issue, as the intermittent bad download.file() behavior did not occur when I ran the latest and the link now points to a passing job. This is just good motivation for the innovations in #6.