Dealing with redirects?
tmcw opened this issue
This project has so far avoided dealing with redirects, other than SSL upgrades (HTTP to HTTPS). I figure we should eventually look into it.
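Here is a minimal sketch of what "only follow SSL upgrades" could mean in practice. It assumes an SSL upgrade is a redirect that changes the scheme from `http` to `https` and keeps everything else the same. The function name `is_ssl_upgrade` is hypothetical, not something from this project.

```python
from urllib.parse import urlsplit


def is_ssl_upgrade(original_url: str, redirect_url: str) -> bool:
    """Return True if redirect_url is original_url moved from http to https.

    Hypothetical helper: the host, path, and query must match; only the
    scheme (and an implied default port) may change.
    """
    src = urlsplit(original_url)
    dst = urlsplit(redirect_url)
    if src.scheme != "http" or dst.scheme != "https":
        return False
    # Treat default ports as absent so "http://a:80/" matches "https://a/".
    src_port = None if src.port in (None, 80) else src.port
    dst_port = None if dst.port in (None, 443) else dst.port
    # hostname is lowercased by urlsplit; an empty path is treated as "/".
    # Fragments are ignored because they are never sent to the server.
    return (
        src.hostname == dst.hostname
        and src_port == dst_port
        and (src.path or "/") == (dst.path or "/")
        and src.query == dst.query
    )
```

For example, `http://example.com/a?b=1` redirecting to `https://example.com/a?b=1` counts as an upgrade, while a redirect to `https://example.com/paywall` does not and would still need to be treated with suspicion.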
Basically, the issue with redirects is that many of them are fake or bad: a news site redirecting to a paywall, a bot detector redirecting to a CAPTCHA, or a site that implements "soft 404s" by redirecting to a 404 page (which sometimes also returns a 200!).
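One possible heuristic, sketched under loose assumptions: since a soft 404 may return a 200, the status code alone can't be trusted, so this looks at the shape of the final URL instead. It flags a deep link that collapses to the homepage, or a target path containing words like `404`, `paywall`, or `captcha`. The keyword list and the name `looks_like_bad_redirect` are hypothetical guesses, not an established standard, and will produce false positives (e.g. a legitimate move to `/error-handling-guide`).

```python
import re
from urllib.parse import urlsplit

# Hypothetical keywords that often show up in error-page, paywall, or
# bot-check URLs. This is a guess, not a standard list.
SUSPICIOUS_PATH = re.compile(
    r"(404|not[-_]?found|error|paywall|subscribe|captcha|challenge)",
    re.IGNORECASE,
)


def looks_like_bad_redirect(original_url: str, final_url: str) -> bool:
    """Heuristically decide whether a redirect is probably not a real move."""
    src = urlsplit(original_url)
    dst = urlsplit(final_url)
    # A deep link that lands on the site root is a common soft-404 pattern.
    if src.path not in ("", "/") and dst.path in ("", "/"):
        return True
    # Otherwise, look for error/paywall/bot-check keywords in the target path.
    return SUSPICIOUS_PATH.search(dst.path) is not None
```

With this sketch, `https://news.example/story/123` redirecting to `https://news.example/subscribe?next=/story/123` is flagged, while a move from `/old` to `/new` on the same site is not.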
I'm not sure what the best practices are here, and might look at how the Wikipedia bots handle this for an answer.