MrCull/GitHub-Repo-ReadMe-Dead-Link-Finder

Forbidden (403) when the link is actually reachable

tomicapretto opened this issue · 3 comments

Hi, thanks for this great tool. I'm using it to check links in the Bambi repo. The checker shows a Forbidden (403) warning for an arXiv link, but the link is actually reachable. I'm not sure if this is a problem with the tool or just something weird about arXiv.

Hi @tomicapretto, thank you for the positive feedback on this tool :)

Also thank you for taking the time to report this issue.
I think this comes down to the difference between how the C# .NET HttpClient makes an HTTP request and how a browser makes the same request.
For example, see:
https://stackoverflow.com/questions/20581117/parsing-from-website-which-return-403-forbidden/20585344#20585344

In theory it would be possible to solve this case by case, by adding the headers each site is known to require. However, that is unlikely to be a practical or scalable solution.
Ideally there is a more generic solution, which would be great. However, I am extremely busy at the moment, so I am unlikely to find time to research this myself right now. I welcome PRs from anyone else.

Failing that, the decision is whether to keep reporting 403s as warnings or to ignore them. Ignoring them could sometimes hide real issues.

So, in summary, this behaviour is unfortunately unlikely to change in the tool in the short term.
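
To make the warning-versus-ignore trade-off concrete, here is a minimal sketch of one way a checker could classify status codes so that a 403 stays visible without being counted as a dead link. The `LinkResult` enum and `LinkClassifier` class are hypothetical names used for illustration only and are not taken from this tool's source. Treating every 2xx and 3xx response as OK is likewise an assumption.

```csharp
using System.Net;

// Hypothetical result categories; the real tool may use different ones.
public enum LinkResult { Ok, Warning, Dead }

public static class LinkClassifier
{
    // Maps an HTTP status code to a result category.
    public static LinkResult Classify(HttpStatusCode status)
    {
        int code = (int)status;

        // Assumption: any 2xx or 3xx response means the link is reachable.
        if (code >= 200 && code < 400)
        {
            return LinkResult.Ok;
        }

        // 403 may only mean the server rejects non-browser clients,
        // so surface it as a warning for manual checking.
        if (status == HttpStatusCode.Forbidden)
        {
            return LinkResult.Warning;
        }

        // Everything else (404, 410, 5xx, ...) is reported as dead.
        return LinkResult.Dead;
    }
}
```

With this split, a 403 still appears in the report, so a genuinely forbidden link is not silently hidden, but it is kept apart from hard failures such as a 404.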

Hi @MrCull, thanks for the response! I'm happy with how the tool works right now. At least I don't need to check links by hand one by one, and when I see a warning as above, I just check it manually. So, feel free to close this issue if you want, or leave it open for the record. I'm going to continue using this tool because it's great!

This is now fixed, @tomicapretto. Thank you for reporting it.

Resolved using code from: https://stackoverflow.com/questions/21441688/im-getting-403-with-httpclient-on-portable-class-library
"Try calling it like a browser"
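
For readers wondering what "calling it like a browser" might look like, here is a minimal sketch. It is not the exact code used in this tool. The class name `BrowserLikeRequest`, the specific User-Agent string, and the choice of `Accept` and `Accept-Language` headers are all assumptions made for illustration. The idea is simply to send the kind of headers a browser would, since some servers answer 403 when they are missing.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class BrowserLikeRequest
{
    // Assumed User-Agent value; any realistic browser string could be used instead.
    public const string UserAgent =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36";

    // Builds a GET request that carries browser-style headers.
    public static HttpRequestMessage Create(string url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);

        // TryAddWithoutValidation stores the value as given, which avoids
        // strict parsing failures on long User-Agent strings.
        request.Headers.TryAddWithoutValidation("User-Agent", UserAgent);
        request.Headers.TryAddWithoutValidation(
            "Accept",
            "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
        request.Headers.TryAddWithoutValidation("Accept-Language", "en-US,en;q=0.9");

        return request;
    }

    // Sends the request and returns the numeric status code.
    // Only the response headers are read, since a link check does not need the body.
    public static async Task<int> GetStatusCodeAsync(HttpClient client, string url)
    {
        using var request = Create(url);
        using var response = await client.SendAsync(
            request, HttpCompletionOption.ResponseHeadersRead);
        return (int)response.StatusCode;
    }
}
```

A caller would reuse one `HttpClient` and run `await BrowserLikeRequest.GetStatusCodeAsync(client, url)` for each link. Whether a given site then returns 200 instead of 403 depends on that site's own rules, so this approach reduces false positives without guaranteeing to remove them.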