rack/rack-contrib

Locale middleware issues with whitespaces and case sensitivity

andrykonchin opened this issue · 1 comments

Hi. I noticed there is a divergence between the Accept-Language header documentation and the formats supported by the Locale middleware. TBH I am not sure which RFC to rely on, and any advice is welcome.

I would like to know which changes are acceptable. Also I am happy to provide pull requests with proper fixes.

Whitespaces

The Locale middleware assumes that the header value contains no whitespaces, neither between the comma-separated values nor between a language tag and its quality value:

Accept-Language: ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7,uk;q=0.6

There are numerous examples (MDN, RFC 2616) where whitespaces are present between language tags:

Accept-Language: da, en-gb;q=0.8, en;q=0.7

RFC 7231 describes the following header format:

     Accept-Language = 1#( language-range [ weight ] )
     language-range  =
               <language-range, see [RFC4647], Section 2.1>

where # means a comma-separated list with optional whitespaces (the ABNF #rule extension):

1#element => element *( OWS "," OWS element )

In practice it may not be a problem, because both Firefox and Chrome send Accept-Language without any whitespaces and Safari sends only one language tag. But Internet Explorer 11 sends a list with whitespaces, e.g.:

Accept-Language: ru-UA, ru; q=0.8, uk; q=0.6, en-US; q=0.4, en; q=0.2
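A minimal sketch of whitespace-tolerant splitting, for illustration only. `split_language_list` is a hypothetical helper name, not something the Locale middleware currently defines, and OWS is assumed to mean spaces and horizontal tabs as in RFC 7230.

```ruby
# Hypothetical helper, not part of rack-contrib: split an Accept-Language
# value into its list elements, tolerating optional whitespace (spaces and
# horizontal tabs) around each comma.
def split_language_list(header)
  header.to_s.split(/[ \t]*,[ \t]*/).map(&:strip).reject(&:empty?)
end

split_language_list("ru-UA, ru; q=0.8, uk; q=0.6")
# => ["ru-UA", "ru; q=0.8", "uk; q=0.6"]
```

Headers without whitespaces (the Firefox/Chrome form) split the same way, so a change along these lines should not affect the browsers that already work.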

Whitespaces in quality value

The Locale middleware assumes the ;q= prefix doesn't contain whitespaces (and all the browsers seem to follow this convention). But RFC 7231 declares a format with optional whitespaces:

     weight = OWS ";" OWS "q=" qvalue
     qvalue = ( "0" [ "." 0*3DIGIT ] )
            / ( "1" [ "." 0*3("0") ] )

so this is a completely correct header value:

Accept-Language: en, en-gb ; q=0.8

Quality value is case insensitive

The Locale middleware assumes the q name is downcased, but according to RFC 7231 it could also be Q (string literals in ABNF are case-insensitive).
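The two points about the weight parameter (optional whitespaces around ";" and a case-insensitive q) could be handled together roughly like this. This is only a sketch: `WEIGHT` and `split_weight` are hypothetical names, and treating a malformed weight as "no weight" is an assumption made here, not something the RFC prescribes.

```ruby
# Hypothetical pattern, not the middleware's current code. It follows the
# RFC 7231 grammar quoted above: OWS ";" OWS "q=" qvalue.
# The /i flag lets "Q=" match as well as "q=".
WEIGHT = /[ \t]*;[ \t]*q=(0(?:\.\d{0,3})?|1(?:\.0{0,3})?)\z/i

# Returns [language_range, quality] for a single list element.
def split_weight(element)
  match = WEIGHT.match(element)
  # No (or, in this sketch, unparseable) weight: fall back to the default q=1.
  return [element.strip, 1.0] unless match

  [match.pre_match.strip, match[1].to_f]
end

split_weight("en-gb ; q=0.8")  # => ["en-gb", 0.8]
split_weight("en;Q=0.5")       # => ["en", 0.5]
split_weight("en")             # => ["en", 1.0]
```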

Quality value 0

According to RFC 7231, q=0 means "not acceptable", but the Locale middleware ignores this rule and can still select such a "not acceptable" locale.
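One possible way to honour q=0, sketched under the assumption that the header has already been parsed into [tag, quality] pairs. `acceptable_languages` is a hypothetical helper name used only for this illustration.

```ruby
# Hypothetical helper: given [tag, quality] pairs, drop the tags the client
# marked as not acceptable (q=0) and order the rest by descending quality.
# The original position is used as a tie-breaker so equal weights keep
# their header order.
def acceptable_languages(pairs)
  pairs
    .reject { |_tag, quality| quality.zero? }
    .sort_by.with_index { |(_tag, quality), index| [-quality, index] }
    .map(&:first)
end

acceptable_languages([["en", 0.0], ["uk", 0.6], ["ru", 0.9]])
# => ["ru", "uk"]
```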

Case insensitive matching

RFC 4647 declares that language tags should be treated as case-insensitive. I suppose it means that checking whether a language tag is included in I18n.available_locales should be case-insensitive as well. But it isn't.

It may not be a problem because it's a common convention to use a downcased language tag (en) and an uppercased subtag (en-US). But there is a problem with Safari: it sends the language with a downcased subtag, en-us.
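A case-insensitive lookup could look roughly like the sketch below. `match_locale` is a hypothetical helper, and the `available` argument stands in for I18n.available_locales so that the example runs without the i18n gem.

```ruby
# Hypothetical helper: find the available locale that matches a requested
# language tag, ignoring case. Returns the locale as the application spells
# it, or nil when nothing matches.
def match_locale(tag, available)
  wanted = tag.to_s.downcase
  available.find { |locale| locale.to_s.downcase == wanted }
end

available = [:en, :"en-US", :ru]
match_locale("en-us", available)  # => :"en-US"
match_locale("EN", available)     # => :en
match_locale("fr", available)     # => nil
```

Returning the application's own spelling (en-US rather than en-us) means the rest of the app would keep seeing the locale names it already expects.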

I agree that all of the issues you've described are valid, and one or more PRs fixing them would be welcomed.