cesanta/fossa

Support `custom_mime_types` option when serving static files

tiesjan opened this issue · 10 comments

Hi there!

First of all, I want to say: thanks for all the good work on this library. It's been an enormous help so far with the WebSockets, built-in JSON-RPC support and static file serving via HTTP. 👍

Personally I'm a huge fan of setting a specific character set in the headers (UTF-8 in particular). This can be done by setting the charset after the MIME-type. So the Content-Type header will look like this:

Content-Type: text/html; charset=utf-8
Content-Type: application/javascript; charset=iso-8859-1
Content-Type: text/css; charset=utf-8
etc.

This is what I would like to do too in my program. However, currently I cannot find any support for setting the charset for plain text-based mime-types. Is it possible to incorporate this feature?

For example:

  1. Save the charset ("utf-8") as a variable const char *charset in proto_data_http
  2. add a variable int set_charset in static_builtin_mime_types[] for plain text-based types (please take a look here for my ideas)
  3. if set_charset is positive, add ; charset=utf-8 to the MIME type (where the charset is UTF-8, for example)
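
The three steps above could be sketched roughly as follows. This is only an illustration of the idea: the `mime_entry` struct, its `set_charset` field and the `build_content_type()` helper are hypothetical names, not Fossa's actual code, and the charset is passed in as a parameter instead of being stored in `proto_data_http`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table entry: set_charset marks text-based types. */
struct mime_entry {
  const char *extension;
  const char *mime_type;
  int set_charset;
};

/* Small stand-in for a built-in MIME table (not the library's real one). */
static const struct mime_entry mime_types[] = {
  {"html", "text/html", 1},
  {"css", "text/css", 1},
  {"js", "application/javascript", 1},
  {"png", "image/png", 0},
  {NULL, NULL, 0}
};

/*
 * Writes the Content-Type value for `extension` into `buf`.
 * `charset` may be NULL, in which case no charset is appended.
 * Returns 0 on success, -1 if the extension is unknown or `buf` is too small.
 */
int build_content_type(const char *extension, const char *charset,
                       char *buf, size_t buf_len) {
  const struct mime_entry *e;
  int n;

  assert(extension != NULL && buf != NULL);
  for (e = mime_types; e->extension != NULL; e++) {
    if (strcmp(e->extension, extension) != 0) continue;
    if (e->set_charset > 0 && charset != NULL) {
      /* Text-based type: append the configured charset. */
      n = snprintf(buf, buf_len, "%s; charset=%s", e->mime_type, charset);
    } else {
      n = snprintf(buf, buf_len, "%s", e->mime_type);
    }
    return (n >= 0 && (size_t) n < buf_len) ? 0 : -1;
  }
  return -1;
}
```

With this sketch, `build_content_type("css", "utf-8", buf, sizeof(buf))` fills `buf` with `text/css; charset=utf-8`, while a binary type such as `png` is left without a charset.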

I would like to hear your feedback on this and of course I would be glad to help programming it.

Thanks in advance,
Hefting

cpq commented

Thank you.
Could you shed some light on what difference setting the charset makes, please?

Of course. First: why is the charset important anyway? To avoid this: "á" --> "Ã¡". The problem here is that the character is saved in the database correctly as á in charset X and then read as Ã¡ in charset Y. So if the document is in UTF-8, it is very important to let the web browser know that.
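
The mix-up described above can be reproduced in a few lines of C. The sketch below takes a UTF-8 string, treats each byte as a separate ISO-8859-1 character (which is what happens when the reader assumes the wrong charset) and re-encodes the result as UTF-8. The function name `misread_as_latin1` is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Re-encodes `src` as UTF-8 while (wrongly) treating every input byte as
 * one ISO-8859-1 character. Returns the number of bytes written, not
 * counting the terminating NUL, or -1 if `dst` is too small.
 */
int misread_as_latin1(const char *src, char *dst, size_t dst_len) {
  const unsigned char *p;
  size_t n = 0;

  assert(src != NULL && dst != NULL);
  if (dst_len == 0) return -1;
  for (p = (const unsigned char *) src; *p != '\0'; p++) {
    if (*p < 0x80) {
      /* ASCII is identical in both encodings. */
      if (n + 1 >= dst_len) return -1;
      dst[n++] = (char) *p;
    } else {
      /* U+0080..U+00FF take two bytes in UTF-8. */
      if (n + 2 >= dst_len) return -1;
      dst[n++] = (char) (0xC0 | (*p >> 6));
      dst[n++] = (char) (0x80 | (*p & 0x3F));
    }
  }
  dst[n] = '\0';
  return (int) n;
}
```

For the input `"\xC3\xA1"` (the UTF-8 encoding of "á"), the output is the four bytes 0xC3 0x83 0xC2 0xA1, which display as "Ã¡".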

There are multiple ways to tell the webbrowser the document is in a specific charset. Setting the charset in the Content-Type header is one. Another way is by adding a meta-tag. In HTML4 this would be:

<meta http-equiv="Content-type" content="text/html;charset=UTF-8">

And in HTML5:

<meta charset="UTF-8">

Both this tag and the charset in the Content-Type header will be looked at by the browser (not guaranteed, but common practice, as far as I know). However, the tag is not the first thing in the document, and everything BEFORE the tag is not guaranteed to be parsed as UTF-8.

The Content-Type header is part of the response headers and is therefore ALWAYS sent before any content. So setting both makes it more reliable that the browser will parse the whole document as UTF-8.

Hope this explains it better.

mkmik commented

yes, it is true that in 2015 major browsers and major operating systems are still not defaulting to utf-8.

W.R.T. the headers or HTML, I'd advise the opposite, i.e. that the meta tag is more important than the headers. Headers can be manipulated by proxies and can be incorrect due to server misconfiguration, and what about local files (file:/// URLs)?
Furthermore, HTTP header charsets are usually configured per file extension or with a server-wide (or vhost-wide) default, without looking at the actual encoding of the file.
Obviously this is a non-issue if you created the files in the first place and ensured they all have the same encoding.

The advent of cross-domain script security issues is gradually making it more and more difficult to just serve your web site locally without any web server, but that used to be (and will still be) a very useful thing that will encourage people to put <meta> and @charset tags in their HTML and CSS files respectively.

Another source for this line of reasoning:
http://www.w3.org/International/questions/qa-css-charset
in particular:

The declaration in the HTTP header will always override the in-document declaration, if there is a conflict, except for those browsers where the byte-order mark overrides it.

However, we recommend that if you need to use an HTTP declaration to set the correct encoding, you also include an @charset declaration inside the style sheet. This will ensure that the encoding is still known if the style sheet is used locally or moved, eg. for testing or editing.

Thank you Marko, for that explanation. It was really helpful.

Maybe I wasn't clear, but I was not trying to advise to do only the charset
header. I would like to be able to control both and that is why I asked for
this feature.

Yes, proxies may change headers, but I'm planning to use this inside my
home network, so no proxies. There could indeed be file:///, but there is a
fallback, the way I see it. Also, I am indeed the one who creates the files,
and I know what charset the files of each extension use. But my browser
does not, so I would like to tell it. Since I will be doing quite some
internationalization in my webapp, I would prefer to do everything I can to
tell the web browser what charset the document is in, because I personally
think it looks painfully bad if you don't do it correctly.

I agree with you that it has some drawbacks and possible security issues, but I
think it is the job of the maintainer to decide whether to set a specific
header or not. As you can see in the previous paragraph, I have already
decided on this, and even after your explanation I would still like to see this
feature in a future release. However, I do think there could be a link
or short description in the documentation about potential threats, to
encourage users to set both.

So, to rephrase my request: could there maybe be an optional header (one
could use next to the tag) and some documentation? Thanks in advance.

mkmik commented

Having non-ASCII characters in my last name (and not using them, given the current sad state of things), I can totally relate to the subject of proper character set rendering. I even have a "I � unicode" sticker on my office door.

I'm sure being able to override the charset header has its own purpose in the mess of web standards, especially when serving plain text files.

Since Fossa is a lightweight networking platform, we'd like to weigh the pros and cons of each feature we include.

That said, I think that we could have a very simple way of letting the users define custom mime type entries. This is a special instance of the general problem of improving the usability of configuration directives, with minimal or no modification of the fossa.h header.

cpq commented

@mmikulicic what would your verdict be then? Do what @Hefting suggests or not?

mkmik commented

My verdict is to implement it, not with special support for charsets, but instead by finding a way to define custom MIME types.

Currently mime types are defined statically as:

MIME_ENTRY("css", "text/css"),

A user could technically edit the source and modify the entry to be e.g. "text/css; charset=utf-8".

Approaches:

  • weak symbols. Let the user define their own ns_mime_types array and search it before searching static_builtin_mime_types.
  • runtime opts: add it to ns_serve_http_opts.
  • preprocessor hacks: '-DNS_MIME_TYPE_1="css","text/css; charset=utf-8"'
  • config.h and smaller preprocessor hack with multiline block
  • sensible combination of the above

@Hefting @cpq WDYT ?

cpq commented

Mongoose has an extra_mime_types feature, where the user can define custom MIME
types: https://github.com/cesanta/mongoose/blob/master/mongoose.c#L3235
With that, the user can even override existing MIME types and specify
a charset.
I think this is the way we want to go.

In a nutshell, the task is:

  1. Add an attribute to the struct ns_serve_http_opts, called "char *custom_mime_types"
  2. Modify get_mime_type() function to respect that attribute.
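
A minimal sketch of what these two steps might look like, under some loud assumptions: the option string is taken to be a comma-separated list of `ext=type` pairs, the struct `serve_opts_sketch` is a stand-in that shows only the new field, and `lookup_mime_type()` is a hypothetical replacement for the lookup, not the library's real `get_mime_type()` signature. The exact format Mongoose uses for extra_mime_types may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the options struct; only the new field is shown. */
struct serve_opts_sketch {
  /* Assumed format: "css=text/css; charset=utf-8,txt=text/plain" */
  const char *custom_mime_types;
};

/* Tiny stand-in for the built-in table. */
static const char *builtin_types[][2] = {
  {"css", "text/css"},
  {"html", "text/html"},
  {NULL, NULL}
};

/*
 * Looks up `ext` in the custom list first, then in the built-in table.
 * On success stores a pointer to the type and its length in *type and *len
 * (the custom value is NOT NUL-terminated) and returns 1. Returns 0 if the
 * extension is unknown. Matching is case-sensitive in this sketch.
 */
int lookup_mime_type(const struct serve_opts_sketch *opts, const char *ext,
                     const char **type, size_t *len) {
  const char *p = opts != NULL ? opts->custom_mime_types : NULL;
  size_t ext_len, i;

  assert(ext != NULL && type != NULL && len != NULL);
  ext_len = strlen(ext);

  /* Custom entries win, so built-in types can be overridden. */
  while (p != NULL && *p != '\0') {
    const char *end = strchr(p, ',');
    size_t entry_len = end != NULL ? (size_t) (end - p) : strlen(p);
    const char *eq = memchr(p, '=', entry_len);
    if (eq != NULL && (size_t) (eq - p) == ext_len &&
        strncmp(p, ext, ext_len) == 0) {
      *type = eq + 1;
      *len = entry_len - ext_len - 1;
      return 1;
    }
    p = end != NULL ? end + 1 : NULL;
  }

  for (i = 0; builtin_types[i][0] != NULL; i++) {
    if (strcmp(builtin_types[i][0], ext) == 0) {
      *type = builtin_types[i][1];
      *len = strlen(builtin_types[i][1]);
      return 1;
    }
  }
  return 0;
}
```

Searching the custom list before the built-in table is what lets a user override an existing type such as `css` with a variant that carries a charset, which covers the original request without any charset-specific code.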

cpq commented

Max, please take this over!
This is actually part of #237. extra_mime_types is a Mongoose feature we want to support in Fossa.

Hello guys,

Sorry for not responding earlier - I've been busy. I totally agree with Marko: we should look at this on a broader level, and the option of setting custom MIME types will do the trick. Following the approach of Mongoose also looks very good.

Thank you all for this, good thinking!