github/docs

Blocking some user agents?

What article on docs.github.com is affected?

I know for sure:

I suspect others are impacted. For the pages I've tested, both Sphinx's link checker and even cURL get 403s by default when trying to access the page:

curl --head https://docs.github.com/en/actions

gives

HTTP/2 403 
x-azure-ref: 0GhdWYgAAAADzwxBhE8RiQLHcCy8WV92AU0pDRURHRTAzMTcANTk2ZDc4YTItY2E1Zi00NzlkLWJjZGMtMDgzNTgzMzE3NGIy
accept-ranges: bytes
date: Wed, 13 Apr 2022 00:19:38 GMT
via: 1.1 varnish
x-served-by: cache-den8270-DEN
x-cache: MISS
x-cache-hits: 0
x-timer: S1649809178.110803,VS0,VE117
strict-transport-security: max-age=31557600

The only way I can get it to work is by sending a full, realistic-looking user agent (a partial one isn't enough):

curl -A "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0" --head https://docs.github.com/en/actions

which gives the expected:

HTTP/2 200 
cache-control: private, no-store
content-type: text/html; charset=utf-8
etag: "45865-2BLz46zqCXLjEdY4gPLZxAVqnWY"
set-cookie: _csrf=ZIvPK-0aEwEb2w8LVBDtYmF9; Path=/; HttpOnly; Secure; SameSite=Lax
access-control-allow-origin: *
content-security-policy: default-src 'none';prefetch-src 'self';connect-src 'self';font-src 'self' data: githubdocs.azureedge.net;img-src 'self' data: github.githubassets.com githubdocs.azureedge.net placehold.it *.githubusercontent.com github.com;object-src 'self';script-src 'self';frame-src https://graphql.github.com/ https://www.youtube-nocookie.com;style-src 'self' 'unsafe-inline';child-src 'self'
x-dns-prefetch-control: off
expect-ct: max-age=0
x-frame-options: SAMEORIGIN
x-download-options: noopen
x-content-type-options: nosniff
x-permitted-cross-domain-policies: none
referrer-policy: strict-origin-when-cross-origin
x-xss-protection: 0
x-powered-by: Next.js
x-azure-ref: 0xBdWYgAAAABsIRkj3IKZTZjiq20jFYjRU0pDRURHRTA1MDYANTk2ZDc4YTItY2E1Zi00NzlkLWJjZGMtMDgzNTgzMzE3NGIy
accept-ranges: bytes
date: Wed, 13 Apr 2022 00:22:28 GMT
via: 1.1 varnish
x-served-by: cache-den8225-DEN
x-cache: CONFIG_NOCACHE, MISS
x-cache-hits: 0
x-timer: S1649809349.623786,VS0,VE312
vary: Accept-Encoding
strict-transport-security: max-age=31557600
content-length: 284773
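
For anyone who wants to reproduce the comparison outside of curl, here is a minimal sketch using only Python's standard library. The helper names (`build_head_request`, `head_status`) and the timeout are my own choices for illustration; the user agent string is the one from the curl command above. Whether Python's default user agent is blocked the same way is something to check, not something this sketch assumes.

```python
import urllib.error
import urllib.request

# Browser-like User-Agent copied from the curl command above.
BROWSER_UA = "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0"


def build_head_request(url, user_agent=None):
    """Build a HEAD request, optionally overriding the User-Agent header."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers, method="HEAD")


def head_status(url, user_agent=None, timeout=10):
    """Send the HEAD request and return the HTTP status code."""
    request = build_head_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still what we want.
        return err.code
```

Calling `head_status("https://docs.github.com/en/actions")` and then `head_status("https://docs.github.com/en/actions", BROWSER_UA)` should show whether the two requests are treated differently.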

What changes are you suggesting?

I'm not sure whether this is intentional, but it's keeping me from validating links in my documentation, even when running on GitHub Actions.
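
As a possible stopgap on the Sphinx side, the `user_agent` option in `conf.py` sets the header Sphinx sends for HTTP access, including linkcheck. This is only a sketch: it assumes the browser-like string that worked with curl above is also accepted when Sphinx sends it, which I have not confirmed. The commented-out `linkcheck_ignore` line is an alternative that skips the host entirely, at the cost of not checking those links at all.

```python
# conf.py (Sphinx project configuration)

# Assumption: the browser-like string from the curl test above is also
# accepted when sent by Sphinx's link checker.
user_agent = "Mozilla/5.0 (X11; Ubuntu; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0"

# Alternative: skip the affected host (these links are then not validated).
# linkcheck_ignore = [r"https://docs\.github\.com/.*"]
```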

Thanks for opening this issue. A GitHub docs team member should be by to give feedback soon. In the meantime, please check out the contributing guidelines.

Thanks for opening an issue! We've triaged this issue for technical review by a subject matter expert 👀

dmke commented

Can confirm, this looks like an incomplete(1) regular expression:

UA string                                                         works(2)
Mozilla/0 Gecko/00000000 Firefox/0                                yes
Mozilla/5.0 (compatible; rv:100.0) Gecko/20380120 Firefox/100.0   no

Please note that the same request blocking also makes the contact form (https://support.github.com/contact/bug-report) unusable:

[two images]

(1) Can any such list ever be complete?

(2) curl --fail -IA "$UA_STRING" https://docs.github.com
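
The check in (2) can be repeated over a list of user agent strings. A rough sketch in Python, assuming "works" means a status below 400 (which is what `curl --fail` treats as success); the function names and the `fetch` parameter are hypothetical and only there so the network call can be swapped out:

```python
import urllib.error
import urllib.request

# The two strings from the table above.
CANDIDATES = [
    "Mozilla/0 Gecko/00000000 Firefox/0",
    "Mozilla/5.0 (compatible; rv:100.0) Gecko/20380120 Firefox/100.0",
]


def fetch_status(url, user_agent, timeout=10):
    """HEAD the URL with the given User-Agent and return the status code."""
    request = urllib.request.Request(
        url, headers={"User-Agent": user_agent}, method="HEAD"
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx arrive as exceptions in urllib


def probe_user_agents(url, user_agents, fetch=fetch_status):
    """Map each User-Agent string to True if the request was not rejected."""
    return {ua: fetch(url, ua) < 400 for ua in user_agents}
```

`probe_user_agents("https://docs.github.com", CANDIDATES)` returns a dict keyed by user agent string, which makes it easy to extend the table with more candidates.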

I'm also not sure what the point of blocking based on user agents would even be, given that the User-Agent is a client-supplied header that's readily overridden.

WILLIAM JOSEPH BAUGHMAN JR

Thanks for flagging this problem. I'm going to ask our engineering team to take a look.

@dopplershift Thanks so much for opening an issue to let us know what you're seeing, and I'm sorry you're running into problems with the docs site! I've opened an internal issue for the team to look at, so I'm going to close this now 💛