tmcw/notfoundbot

Dealing with redirects?

tmcw opened this issue · 0 comments

tmcw commented

notfoundbot has avoided dealing with redirects so far, other than SSL upgrades. I figure we should eventually look into it.

Basically, the issue with redirects is that a lot of the ones you get are fake or bad: a news site redirecting to a paywall, a bot detector redirecting to a captcha, or a site that implements "soft 404s" by redirecting to a 404 page instead of returning a 404 status (and that page sometimes even returns a 200!).
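
To make those cases concrete, here's a rough sketch of how a redirect could be triaged by comparing the original URL with the one it lands on. This is not what notfoundbot does today: `classify_redirect`, the `SUSPICIOUS_MARKERS` list, and the three labels are all made up for illustration, and substring matching like this will produce false positives (an article ID that happens to contain "404", for example).

```python
from urllib.parse import urlsplit

# Hypothetical marker list: illustrative guesses, not a vetted set.
SUSPICIOUS_MARKERS = (
    "captcha", "paywall", "subscribe", "login",
    "404", "not-found", "notfound",
)


def classify_redirect(original_url: str, final_url: str) -> str:
    """Label a redirect as 'ssl_upgrade', 'suspicious', or 'plausible'.

    Assumption: the caller has already followed the redirect chain and
    passes in the URL it started from and the URL it ended on.
    """
    src = urlsplit(original_url)
    dst = urlsplit(final_url)

    # Same host, path, and query, with only http -> https changed.
    same_target = (
        src.hostname == dst.hostname
        and src.path == dst.path
        and src.query == dst.query
    )
    if src.scheme == "http" and dst.scheme == "https" and same_target:
        return "ssl_upgrade"

    # Landing on something that looks like a paywall, captcha, or 404 page.
    target = (dst.path + "?" + dst.query).lower()
    if any(marker in target for marker in SUSPICIOUS_MARKERS):
        return "suspicious"

    # A deep link that collapses to the site root is a common soft-404 shape.
    if src.path.strip("/") and not dst.path.strip("/"):
        return "suspicious"

    return "plausible"
```

Under this sketch, only `ssl_upgrade` would be safe to rewrite automatically; `plausible` would still want a human look, and `suspicious` would be left alone.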

I'm not sure about best practices here, and might look into the Wikipedia bots for an answer.