dmi3kno/polite

Add re-try by default?

maelle opened this issue · 4 comments

When status isn't 200, would it make sense to add re-try by default? E.g. up to 5 times with exponentially increasing waiting time?
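
For illustration, here is a minimal base-R sketch of that proposal: up to 5 attempts, with the wait doubling after each failure. This is not polite's actual implementation. The names retry_with_backoff, fetch, base_delay and sleep are hypothetical, and fetch stands in for any zero-argument function that returns a list with a status_code field.

```r
# Hypothetical helper (not part of polite): retry `fetch` with exponential backoff.
# `fetch` is assumed to be a zero-argument function returning a list
# that carries a `status_code` element.
retry_with_backoff <- function(fetch, max_attempts = 5, base_delay = 1,
                               sleep = Sys.sleep, verbose = FALSE) {
  for (attempt in seq_len(max_attempts)) {
    response <- fetch()
    if (identical(as.integer(response$status_code), 200L)) {
      return(response)
    }
    if (attempt < max_attempts) {
      # Delay doubles each time: 1, 2, 4, 8 seconds with the defaults.
      delay <- base_delay * 2^(attempt - 1)
      if (verbose) message("Attempt ", attempt, " failed; waiting ", delay, "s.")
      sleep(delay)
    }
  }
  NULL  # every attempt returned a non-200 status
}

# Usage with a stand-in fetcher that succeeds on its third call.
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  list(status_code = if (calls < 3) 503L else 200L)
}
waits <- numeric(0)  # record the requested delays instead of really sleeping
result <- retry_with_backoff(fake_fetch,
                             sleep = function(s) waits <<- c(waits, s))
```

Passing sleep as an argument is only a convenience for trying the function without waiting; in real use the default Sys.sleep would apply.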

I already do this in scrape; it becomes visible when the verbose argument is on:

bow("https://www.stackoverflow.com") %>% nod("raw") %>% scrape(verbose = TRUE)
#> Attempt number 2.
#> Attempt number 3.This is the last attempt, if it fails will return NULL

You could argue that the setting should be TRUE by default, or that scrape should have the same verbosity as bow. I am not sold on either of these.

One thing that looks like a glitch: for websites that actually serve a 404 placeholder page, scrape currently calls warn_for_status but then returns the content of that 404 page instead of replacing it with NULL. That content would be difficult to parse, so I should really do what I promise, which is to discard the content of the page if the status is non-200.

Pushed a small patch that does this:

library(polite)
bow("https://www.stackoverflow.com") %>% nod("raw") %>% scrape(verbose = TRUE)
#> Attempt number 2.
#> Attempt number 3.This is the last attempt, if it fails will return NULL
#> NULL
#> Warning message:
#> Client error: (404) Not Found https://www.stackoverflow.com/raw 
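
The behaviour in the output above (warn, then return NULL instead of the error page) can be sketched in base R as follows. This only illustrates the idea and is not the code in the patch. content_or_null is a hypothetical name, and the response is assumed to be a plain list with status_code and content elements.

```r
# Hypothetical helper (not the actual patch): keep the body only for status 200.
# `response` is assumed to be a list with `status_code` and `content` elements.
content_or_null <- function(response) {
  if (!identical(as.integer(response$status_code), 200L)) {
    warning("Request failed with status ", response$status_code, call. = FALSE)
    return(NULL)  # discard the error page so callers never try to parse it
  }
  response$content
}

# A successful response keeps its content.
ok <- content_or_null(list(status_code = 200L, content = "<html>home</html>"))

# A 404 placeholder page is dropped (warning silenced here for brevity).
not_found <- suppressWarnings(
  content_or_null(list(status_code = 404L, content = "<html>Not Found</html>"))
)
```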

Let me know if this is what you wanted, or whether you wanted me to re-attempt downloading robots.txt instead.

nice thanks