Locale middleware issues with whitespaces and case sensitivity
andrykonchin opened this issue · 1 comments
Hi. I noticed a divergence between the Accept-Language header documentation and the formats supported by the Locale middleware. TBH I am not sure which RFC to rely on, and any advice is welcome.
I would like to know which changes are acceptable. I am also happy to provide pull requests with proper fixes.
Whitespaces
The Locale middleware assumes that the header value contains no whitespace, either between the comma-separated values or between a language tag and its quality value:
Accept-Language: ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7,uk;q=0.6
There are numerous examples (MDN, RFC 2616) where whitespace is present between language tags:
Accept-Language: da, en-gb;q=0.8, en;q=0.7
RFC 7231 describes the following header format:
Accept-Language = 1#( language-range [ weight ] )
language-range = <language-range, see [RFC4647], Section 2.1>
where # means a comma-separated list with optional whitespace (the ABNF #rule extension):
1#element => element *( OWS "," OWS element )
This may not be a problem in practice, because both Firefox and Chrome send Accept-Language without any whitespace and Safari sends only one language tag. But Internet Explorer 11 sends a list with whitespace, e.g.
Accept-Language: ru-UA, ru; q=0.8, uk; q=0.6, en-US; q=0.4, en; q=0.2
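To illustrate the list rule quoted above, here is a minimal sketch of splitting the header value on commas while tolerating optional whitespace. The helper name split_language_list is hypothetical and is not part of the Locale middleware; this only shows one way the OWS "," OWS rule could be handled.

```ruby
# Hypothetical helper, not the middleware's actual code.
# Splits an Accept-Language value on commas, allowing optional
# whitespace (spaces or horizontal tabs) on either side of each comma.
def split_language_list(header)
  header.to_s.split(/[ \t]*,[ \t]*/).map(&:strip).reject(&:empty?)
end

split_language_list("da, en-gb;q=0.8, en;q=0.7")
# => ["da", "en-gb;q=0.8", "en;q=0.7"]
```

Whitespace inside an element (such as "ru; q=0.8" in the Internet Explorer example) is left untouched by this step and has to be handled when the weight is parsed.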
Whitespaces in quality value
The Locale middleware assumes the ;q= prefix doesn't contain whitespace (and all the browsers seem to follow this convention). But RFC 7231 declares a format with optional whitespace:
weight = OWS ";" OWS "q=" qvalue
qvalue = ( "0" [ "." 0*3DIGIT ] )
       / ( "1" [ "." 0*3("0") ] )
so this is a completely correct header value:
Accept-Language: en, en-gb ; q=0.8
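As a rough sketch of the grammar above, a single list element could be matched with a pattern that allows optional whitespace around the semicolon. The constant names and parse_element are hypothetical, and the language-range pattern is my reading of RFC 4647 Section 2.1, so treat it as an assumption rather than the middleware's behaviour.

```ruby
# Hypothetical sketch, not the middleware's actual code.
OWS            = /[ \t]*/                                   # optional whitespace
LANGUAGE_RANGE = /[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*|\*/   # assumed from RFC 4647, Section 2.1
QVALUE         = /0(?:\.[0-9]{0,3})?|1(?:\.0{0,3})?/        # qvalue rule quoted above
ELEMENT        = /\A(#{LANGUAGE_RANGE})(?:#{OWS};#{OWS}q=(#{QVALUE}))?\z/

# Returns [language_range, weight], or nil if the element is malformed.
# A missing weight defaults to 1.0.
def parse_element(element)
  match = ELEMENT.match(element.strip)
  return nil unless match

  [match[1], (match[2] || "1").to_f]
end

parse_element("en-gb ; q=0.8")  # => ["en-gb", 0.8]
parse_element("en")             # => ["en", 1.0]
parse_element("en;q=1.5")       # => nil (weight outside the allowed range)
```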
Quality value is case insensitive
The Locale middleware assumes the q name is lowercase, but according to RFC 7231 it could be Q.
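One possible way to accept both spellings is a case-insensitive regular expression. The helper weight_of below is hypothetical and deliberately loose about the qvalue digits; it only demonstrates the /i flag.

```ruby
# Hypothetical helper: extracts the weight from one list element,
# accepting "q" or "Q". Returns 1.0 when no weight is present.
def weight_of(element)
  match = element.match(/;[ \t]*q=([0-9.]+)[ \t]*\z/i)
  match ? match[1].to_f : 1.0
end

weight_of("en;q=0.7")  # => 0.7
weight_of("en;Q=0.7")  # => 0.7
weight_of("en")        # => 1.0
```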
Quality value 0
According to RFC 7231, q=0 means "not acceptable", but the Locale middleware ignores this rule and can use this "not acceptable" locale.
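A minimal sketch of honouring this rule, assuming the header has already been parsed into [tag, weight] pairs: entries with a weight of zero are dropped before the rest are ordered by preference. The name acceptable_tags is hypothetical.

```ruby
# Hypothetical sketch: takes [tag, weight] pairs and returns the tags
# ordered by descending weight, skipping any tag marked q=0.
# The original position is used as a tie-breaker to keep the order stable.
def acceptable_tags(pairs)
  pairs.each_with_index
       .reject { |(_tag, weight), _index| weight.zero? }
       .sort_by { |(_tag, weight), index| [-weight, index] }
       .map { |(tag, _weight), _index| tag }
end

acceptable_tags([["en", 1.0], ["fr", 0.0], ["de", 0.5]])
# => ["en", "de"]
```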
Case insensitive matching
RFC 4647 declares that language tags should be treated as case-insensitive. I suppose this means that checking whether a language tag is included in I18n.available_locales should be case-insensitive as well. But it isn't.
This may not be a problem in most cases, because it's a common convention to use a lowercase language tag (en) and an uppercase subtag (en-US). But there is a problem with Safari: it sends the language with a lowercase subtag, en-us.
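A case-insensitive lookup could look roughly like this. The helper match_locale is hypothetical, and the available argument stands in for I18n.available_locales so that the sketch runs without the i18n gem.

```ruby
# Hypothetical helper: finds the configured locale that matches the
# requested tag regardless of case, and returns it in its configured form.
# `available` stands in for I18n.available_locales.
def match_locale(tag, available)
  wanted = tag.to_s.downcase
  available.find { |locale| locale.to_s.downcase == wanted }
end

match_locale("en-us", [:en, :"en-US", :ru])  # => :"en-US"
match_locale("fr", [:en, :"en-US", :ru])     # => nil
```

Returning the configured locale rather than the raw header value keeps the rest of the application working with the spelling it already expects.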
I agree that all of the issues you've described are valid, and one or more PRs fixing them would be welcomed.