mquinson/po4a

Accept-Language + Accept on website returns incorrect languages


Hello,

I sent this message to the mailing list, but it bounced because:

<devel@po4a.org>: host mx1.osci.io[8.43.85.229] said: 554 5.7.1
    <devel@po4a.org>: Relay access denied (in reply to RCPT TO command)

So I'm filing this on your GitHub instead.

I have an issue with how the website po4a.org handles the Accept-Language and Accept headers. For some reason the website responds in Dutch. While I am a native speaker, this means the browser's Accept-Language preference isn't honored. I've tested this across five browsers/user agents:

  • Chrome versions: 122.0.6261.128-1, 123.0.6312.28-1 and 124.0.6342.3-1
  • Firefox: 125.0a1~20240313094814
  • Curl: 8.6.0-3.2

TL;DR
Not all languages are served with the text/html content type, as seen in the Alternates header. Some are served as application/x-httpd-php instead, so a request that asks for text/html gets a response in a different language than the one requested. This affects at least the following languages: English, German, French, Japanese (ja), Portuguese (both regular and Brazilian), and Chinese.

This is correct:

curl -H 'Accept-Language: en-US,en;q=0.9,nl;q=0.8' -H 'Accept: *' -v -o /dev/null https://po4a.org/ 2>&1 | grep Lang
> Accept-Language: en-US,en;q=0.9,nl;q=0.8
< Content-Language: en

This goes wrong:

curl -H 'Accept-Language: en-US,en;q=0.9,nl;q=0.8' -H 'Accept: text/html' -v -o /dev/null https://po4a.org/ 2>&1 | grep Lang
> Accept-Language: en-US,en;q=0.9,nl;q=0.8
< Content-Language: nl

The behaviour changes when we add German.

Correct:

curl -H 'Accept-Language: en-US,en;q=0.9,de;q=1,nl;q=0.7' -H 'Accept: *' -v -o /dev/null https://po4a.org/ 2>&1 | grep Lang
> Accept-Language: en-US,en;q=0.9,de;q=1,nl;q=0.7
< Content-Language: de

Incorrect:

curl -H 'Accept-Language: en-US,en;q=0.9,de;q=1,nl;q=0.7' -H 'Accept: text/html' -v -o /dev/null https://po4a.org/ 2>&1 | grep Lang
> Accept-Language: en-US,en;q=0.9,de;q=1,nl;q=0.7
< Content-Language: nl

This intrigued me, since German is honored when the request uses Accept: *, so I dug a little deeper:

curl -H 'Accept-Language: en-US,en;q=0.9,de;q=1' -H 'Accept: text/html' -v -o /dev/null https://po4a.org 2>&1

This returns HTTP 406 (Not Acceptable), but the response includes an Alternates header:

Alternates: {"index.php.de" 1 {type application/x-httpd-php} {language de}}, {"index.php.en" 1 {type application/x-httpd-php} {language en}}, {"index.php.eo" 1 {type text/html} {language eo}}, {"index.php.es" 1 {type application/x-httpd-php} {language es}}, {"index.php.fr" 1 {type application/x-httpd-php} {language fr}}, {"index.php.hr" 1 {type text/html} {language hr}}, {"index.php.hu" 1 {type text/html} {language hu}}, {"index.php.it" 1 {type text/html} {language it}}, {"index.php.ja" 1 {type application/x-httpd-php} {language ja}}, {"index.php.nl" 1 {type text/html} {language nl}}, {"index.php.pt" 1 {type application/x-httpd-php} {language pt}}, {"index.php.pt_BR" 1 {type application/x-httpd-php} {language pt-br} {length 8039}}, {"index.php.ru" 1 {type application/x-httpd-php} {language ru}}, {"index.php.uk" 1 {type text/html} {language uk}}, {"index.php.zh_CN" 1 {type application/x-httpd-php} {language zh-cn} {length 7312}}
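
To see at a glance which languages are offered under which content type, the header can be grouped by type. The following is only a rough sketch: the helper name languages_by_type and the regular expression are my own, the sample is a shortened copy of the header above, and the pattern deliberately tolerates a missing space after the quality value or after "language" in case a pasted copy lost one.

```python
import re
from collections import defaultdict

# Shortened copy of the Alternates header from the 406 response above.
ALTERNATES = (
    '{"index.php.de" 1 {type application/x-httpd-php} {language de}}, '
    '{"index.php.en" 1 {type application/x-httpd-php} {language en}}, '
    '{"index.php.nl" 1 {type text/html} {language nl}}, '
    '{"index.php.pt_BR" 1 {type application/x-httpd-php} {language pt-br} {length 8039}}, '
    '{"index.php.uk" 1 {type text/html} {language uk}}'
)

# One variant looks like: {"name" quality {type T} {language L} ...}
VARIANT = re.compile(
    r'\{"([^"]+)"\s+[\d.]+\s*\{type ([^}]+)\}\s*\{language\s*([^}]+)\}'
)


def languages_by_type(header):
    """Group the variant languages in an Alternates header by media type."""
    groups = defaultdict(list)
    for _name, media_type, language in VARIANT.findall(header):
        groups[media_type.strip()].append(language.strip())
    return dict(groups)


by_type = languages_by_type(ALTERNATES)
for media_type, languages in sorted(by_type.items()):
    print(f"{media_type}: {', '.join(languages)}")
# application/x-httpd-php: de, en, pt-br
# text/html: nl, uk
```

Run against the full header, this makes it easy to spot every language that is missing a text/html variant.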

Now when we request it with the application/x-httpd-php content type:

curl -H 'Accept-Language: en-US,en;q=0.9,de;q=1' -H 'Accept: application/x-httpd-php,text/html' -v -o /dev/null https://po4a.org 2>&1 | grep -E 'Lang|Alternates'
> Accept-Language: en-US,en;q=0.9,de;q=1
< Content-Language: de

We get the correct language.
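
The results above are consistent with the server first discarding variants whose type the client does not accept and only then picking the best language. Here is a minimal sketch of that idea. It is a deliberate simplification and not the server's actual algorithm: the function names are hypothetical, the variant table is copied from the Alternates header, and treating a bare `*` like `*/*` is an assumption based on the curl output.

```python
# Variants as listed in the Alternates header: language -> media type.
VARIANTS = {
    "de": "application/x-httpd-php", "en": "application/x-httpd-php",
    "eo": "text/html", "es": "application/x-httpd-php",
    "fr": "application/x-httpd-php", "hr": "text/html",
    "hu": "text/html", "it": "text/html",
    "ja": "application/x-httpd-php", "nl": "text/html",
    "pt": "application/x-httpd-php", "pt-br": "application/x-httpd-php",
    "ru": "application/x-httpd-php", "uk": "text/html",
    "zh-cn": "application/x-httpd-php",
}


def parse_q_list(header):
    """Turn 'a, b;q=0.8' into {'a': 1.0, 'b': 0.8} (keys lowercased)."""
    weights = {}
    for item in header.split(","):
        token, _, params = item.strip().partition(";")
        q = 1.0
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip() == "q" and value:
                q = float(value)
        if token:
            weights[token.strip().lower()] = q
    return weights


def type_q(media_type, accept):
    """Weight of a media type; a bare '*' is treated like '*/*' (assumption)."""
    major = media_type.split("/")[0]
    for pattern in (media_type, major + "/*", "*/*", "*"):
        if pattern in accept:
            return accept[pattern]
    return 0.0


def language_q(language, accept_language):
    """A range matches the same tag or a more specific one (en matches en-gb)."""
    matches = [q for rng, q in accept_language.items()
               if rng == "*" or language == rng or language.startswith(rng + "-")]
    return max(matches, default=0.0)


def negotiate(accept, accept_language):
    """Return the chosen language, or None where the server would answer 406."""
    types = parse_q_list(accept)
    langs = parse_q_list(accept_language)
    scored = [(type_q(mt, types), language_q(lang, langs), lang)
              for lang, mt in VARIANTS.items()]
    acceptable = [s for s in scored if s[0] > 0 and s[1] > 0]
    return max(acceptable)[2] if acceptable else None


print(negotiate("*", "en-US,en;q=0.9,nl;q=0.8"))          # en
print(negotiate("text/html", "en-US,en;q=0.9,nl;q=0.8"))  # nl
print(negotiate("text/html", "en-US,en;q=0.9,de;q=1"))    # None (406)
print(negotiate("application/x-httpd-php,text/html", "en-US,en;q=0.9,de;q=1"))  # de
```

With Accept: text/html only the eo, hr, hu, it, nl and uk variants survive the first step, which would explain why Dutch wins whenever nl is in Accept-Language and why the request fails with 406 when it is not.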

Conclusion:
Not all languages are served as text/html; some use application/x-httpd-php instead, so they are skipped when the client asks for text/html and the response comes back in a different language than the one requested. It seems to affect at least English, German, French, Japanese (ja), Portuguese (both regular and Brazilian), and Chinese. Could you change the content type for the affected languages? Most browsers (and thus users) would benefit from this change.

Many thanks!
Wesley

+1 for this, please check all languages, it affects more than those outlined by @waterkip.
If I'm reading https://po4a.org/man/man1/po4a.1.php in English and click on a footer link to change it to sr_Cyrl in Edge 125.0.2535.92, the browser tries to download the file po4a.1.php.sr_Cyrl.

We need to completely redo the website. We need a more conventional authoring solution that lets us write the sources in Markdown or AsciiDoc instead of HTML. And we need a way to automatically push the git commits to the webpage.

Any help is welcome, as I can't find the time to do these rather easy tasks. I personally don't really care about the exact solution we pick. It just needs to do its job.

Asciidoc + git ... great idea ... I'll try to see to it ... no promises though