denisdefreyne/adsf

Content-Type response header missing charset=utf-8

da2x opened this issue · 2 comments

da2x commented

adsf returns a Content-Type response header, but it’s missing the charset=utf-8 component. HTML documents with a <meta charset="utf-8"> declaration work anyway, but other text resources like text/plain don’t have a reliable fallback mechanism like this (the BOM character causes more problems than it solves). text/css has @charset "utf-8"; but when did you last see that in use?

adsf has no way of inferring the encoding of textual files. I think defaulting to UTF-8 could create more problems than it’s worth.

There is rchardet, but it does not seem to be maintained, and my personal feeling is that I’d rather rely on the browser to guess the encoding.
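A cheaper alternative to full charset detection is to check whether a file's bytes decode cleanly as UTF-8 before advertising that charset. The following is only a sketch: `utf8_text?` is a hypothetical helper name and not part of adsf, and passing the check does not prove the author intended UTF-8 (pure ASCII passes too).

```ruby
# Hypothetical helper, not part of adsf.
# Returns true when the given bytes form a valid UTF-8 sequence.
# This only shows the data *can* be decoded as UTF-8, not that it was meant to be.
def utf8_text?(bytes)
  # dup so the caller's string keeps its original encoding tag
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

utf8_bytes   = "h\u00E9llo".b # UTF-8 encoded bytes, tagged as binary
latin1_bytes = "h\xE9llo".b   # ISO-8859-1 encoded bytes, tagged as binary

puts utf8_text?(utf8_bytes)
puts utf8_text?(latin1_bytes)
```

A server could use a check like this to add the charset only when the file validates, and otherwise leave the header as it is today.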

@da2x What is your use case for having the charset provided by adsf in the header?

da2x commented

> adsf has no way of inferring the encoding of textual files. I think defaulting to UTF-8 could create more problems than it’s worth.

Possibly. It’s a really good default, though.

You can express all of Unicode with UTF-8, and it’s backwards-compatible with US-ASCII (and with the ASCII subset of the ISO-8859 family). That covers at least 96% of the web, according to BuiltWith Trends and W3Techs. The Unicode Consortium (which sounds totally evil) sorta won the standards war.
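The compatibility point can be checked byte by byte. The sketch below uses only Ruby's built-in `String#encode` and `String#bytes`; the variable names are illustrative. It shows that ASCII text has identical bytes in US-ASCII and UTF-8, while a character outside ASCII (here U+00E9, "e" with an acute accent) is encoded differently in ISO-8859-1 and UTF-8, so the overlap with ISO-8859 is limited to the ASCII range.

```ruby
ascii_text = "User-agent: *"

# Pure ASCII text produces the same bytes in US-ASCII and in UTF-8.
same_bytes = ascii_text.encode(Encoding::US_ASCII).bytes ==
             ascii_text.encode(Encoding::UTF_8).bytes

# Outside the ASCII range the encodings diverge:
# U+00E9 is a single byte in ISO-8859-1 but two bytes in UTF-8.
e_acute      = "\u00E9"
latin1_bytes = e_acute.encode(Encoding::ISO_8859_1).bytes
utf8_bytes   = e_acute.encode(Encoding::UTF_8).bytes

puts same_bytes
puts latin1_bytes.inspect
puts utf8_bytes.inspect
```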

The only real contender to UTF-8 is GB 2312 (which holds a few percent of the Chinese market). It is compatible with US-ASCII in the lower range, but the character tables diverge after that.

> @da2x What is your use case for having the charset provided by adsf in the header?

Well, I’m previewing a project in my web browser that will later be deployed on a web server configured to serve Unicode content. Isn’t that what adsf is for? Some examples:

awesome.css:

.popular::before {
  content: "💛";
}

robots.txt:

# Welcome robots! 🤖
User-agent: *
Disallow: /مرحبا بالعالم/
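For files like these, the requested behaviour could be approximated with a small Rack-style middleware that appends a default charset to text/* responses. This is a sketch under assumptions, not adsf's implementation: the class name `DefaultCharset` and its `charset:` option are hypothetical, and whether adsf should default to UTF-8 at all is the open question in this thread.

```ruby
# Hypothetical Rack-style middleware, not part of adsf.
# Appends "; charset=utf-8" to text/* Content-Type headers that lack a charset.
class DefaultCharset
  def initialize(app, charset: "utf-8")
    @app = app
    @charset = charset
  end

  def call(env)
    status, headers, body = @app.call(env)

    # Header name casing can vary between servers and Rack versions,
    # so look the key up case-insensitively.
    key  = headers.keys.find { |k| k.to_s.casecmp?("content-type") }
    type = key && headers[key]

    # Only touch textual types that do not already declare a charset.
    if type && type.start_with?("text/") && !type.match?(/;\s*charset=/i)
      headers = headers.merge(key => "#{type}; charset=#{@charset}")
    end

    [status, headers, body]
  end
end
```

In a `config.ru`, a middleware like this would typically be enabled with `use DefaultCharset` ahead of the `run` line. Binary types such as image/png and responses that already carry a charset pass through unchanged.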