Add re-try by default?
maelle opened this issue · 4 comments
When the status isn't 200, would it make sense to retry by default? E.g. up to 5 times, with exponentially increasing waiting times?
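The proposed behaviour could be sketched roughly like this (a minimal illustration, not polite's actual implementation; `fetch` stands in for the real HTTP call, and the function names are hypothetical):

```r
# Retry a request up to `times` attempts, waiting exponentially longer between
# attempts (base_wait, 2*base_wait, 4*base_wait, ...). Returns NULL if every
# attempt fails. `fetch` is any zero-argument function that errors on failure.
retry_politely <- function(fetch, times = 5, base_wait = 1, verbose = FALSE) {
  for (attempt in seq_len(times)) {
    res <- tryCatch(fetch(), error = function(e) NULL)
    if (!is.null(res)) return(res)
    if (attempt == times) {
      if (verbose) message("This is the last attempt, returning NULL")
      return(NULL)
    }
    if (verbose) message("Attempt number ", attempt + 1, ".")
    Sys.sleep(base_wait * 2^(attempt - 1))  # exponential backoff
  }
}
```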
I already do this in `scrape()`, which becomes visible with the `verbose` argument turned on:
bow("https://www.stackoverflow.com") %>% nod("raw") %>% scrape(verbose = TRUE)
#> Attempt number 2.
#> Attempt number 3.This is the last attempt, if it fails will return NULL
You could argue that this setting should be `TRUE` by default, or that `scrape` should have the same verbosity as `bow`. I am not sold on either of these.
One thing that does look like a glitch: for websites that really serve a 404 placeholder page, `scrape` currently warns (via `warn_for_status`) but then returns the content of that 404 page instead of replacing it with `NULL`. That content will be difficult to parse, so I should really do what I promise and discard the page content whenever the status is not 200.
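The promised behaviour amounts to something like the following sketch (hypothetical helper, not polite's internal code; `response` is assumed to be a simple list with `status_code` and `content` fields):

```r
# Warn on a non-200 status and discard the body, rather than passing the 404
# placeholder page on to the caller.
scrape_result <- function(response) {
  if (response$status_code != 200L) {
    warning("Client error: (", response$status_code, ")")
    return(NULL)  # discard placeholder content instead of returning it
  }
  response$content
}
```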
Pushed a small patch that does this:
library(polite)
bow("https://www.stackoverflow.com") %>% nod("raw") %>% scrape(verbose = TRUE)
#> Attempt number 2.
#> Attempt number 3.This is the last attempt, if it fails will return NULL
#> NULL
#> Warning message:
#> Client error: (404) Not Found https://www.stackoverflow.com/raw
Let me know if this is what you wanted, unless you wanted me to re-attempt downloading robots.txt instead?
Nice, thanks!