go-shiori/go-readability

Content-type headers: URL is not an HTML document

https://gizmodo.com/elon-musk-asks-twitter-for-more-bot-data-twitter-hands-1849104921

It looks like their response sends back content-type instead of Content-Type. On this line we are doing a case-sensitive header lookup:

https://github.com/go-shiori/go-readability/blob/master/readability.go#L54

Could we do a case-insensitive header lookup so that URLs like this one still work?

HTTP/2.0 200 OK
x-powered-by: Express
x-kinja: kinja-magma-kube01-699ff988c8-89zdj #3259
x-kinja-revision: 6bd8262e0882921eecc1189e40136b2c05b4e455
x-kinja-server: kinja-magma-kube01-699ff988c8-89zdj
x-kinja-build: 3259
cache-control: stale-if-error=86400, stale-while-revalidate=300
content-security-policy: frame-ancestors 'self'; upgrade-insecure-requests
x-content-type-options: nosniff
x-xss-protection: 1; mode=block
strict-transport-security: max-age=63072000; includeSubDomains; preload
x-googlenews-bot: false
x-frame-options: deny
content-type: text/html; charset=utf-8
etag: W/"41dc9-CrkFT135xFzEtSt2In9AhzlQrzs"
x-exp-variant: NotInTest
x-cdn-fetch: mantle-default
accept-ranges: bytes
date: Sat, 25 Jun 2022 04:28:47 GMT
via: 1.1 varnish
age: 133
x-served-by: cache-iad-kjyo7100028-IAD
x-cache: HIT
x-cache-hits: 1
x-timer: S1656131328.721752,VS0,VE2
vary: Accept-Encoding, X-Feature-Hash, X-Forwarded-Proto, X-Valid-Scroll-User, X-GoogleNews-Bot, X-Kinja-LoggedIn, X-Kinja-WelcomeAdLoadedV1, X-Kinja-Req-Origin-US, X-Kinja-SuperHeroLoaded, X-Kinja-GDPR, X-Kinja-CCPA, Authorization
x-ua-device: desktop
set-cookie: geocc=US;path=/;
set-cookie: KinjaBucket=4;path=/;Max-Age=31536000;domain=gizmodo.com;SameSite=None;Secure;
set-cookie: KinjaSetBucket=4|1656131100|ex9okipZpXJx/XEfLSLZnLyRj44mytc4wyhtdSn7a7s=;path=/;Max-Age=300;SameSite=None;Secure;
x-exp-id: NotInTest

Never mind, closing this out as it seems to be fixed in master.

Re-opening: depending on which web server responds, the URL seems to return either a valid or an invalid header.