Broken charset on page without meta charset
rustamakhmetov opened this issue · 2 comments
Sorry for my English.
I am testing an external Cyrillic site that has no meta charset in its header. The response body contains characters in a broken charset, e.g. "СпиÑ�ок по" instead of "Список покупок".
test code
We try to force the response as UTF-8 in Ruby:
It's possible that we also need to do something on the QtWebKit side - it may be guessing a different charset. My best guess is that we could use this: http://doc.qt.io/archives/qt-5.5/qwebsettings.html#setDefaultTextEncoding
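To illustrate how a wrong charset guess could produce output like this, here is a minimal Ruby sketch. It assumes the client decoded the UTF-8 bytes as ISO-8859-1; the thread does not confirm which fallback encoding QtWebKit actually picked, and the helper names `misdecode_as_latin1` and `undo_misdecode` are hypothetical.

```ruby
# Simulate a client that reads UTF-8 bytes as ISO-8859-1 (an assumed fallback).
# force_encoding relabels the bytes without changing them; encode then turns
# each misread byte into its own UTF-8 character.
def misdecode_as_latin1(text)
  text.dup.force_encoding("ISO-8859-1").encode("UTF-8")
end

# Reverse the two steps. This works because ISO-8859-1 maps every byte to a
# character, so no bytes are lost in the round trip.
def undo_misdecode(garbled)
  garbled.encode("ISO-8859-1").force_encoding("UTF-8")
end

original = "Список покупок"
garbled  = misdecode_as_latin1(original)

puts garbled                              # two garbled characters per Cyrillic letter
puts undo_misdecode(garbled) == original
```

If the garbled text can be reversed this way, the bytes on the wire were probably valid UTF-8 and only the decoding step went wrong.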
This behavior is correct, and you will see the same thing if you open a page with your test case's source in Chrome or Firefox. When no charset is specified, it's up to the browser to pick one, and both Chrome and Firefox (set to US English; other language versions may default differently) don't default to UTF-8 for your document. You either need to include the meta charset tag or specify the charset in the Content-Type header returned with the document (or escape all those characters, I guess, which sounds ridiculous):
'Content-Type' => 'text/html; charset=utf-8'
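As a rough sketch of where that header would go, assuming the page under test is served by a Rack-compatible app you control (the issue mentions an external site, so this may not apply directly). The `APP` and `BODY` names are hypothetical, and no gems are needed to run it:

```ruby
# Hypothetical page body with no meta charset tag; the Cyrillic text is taken
# from the report above.
BODY = "<html><body>Список покупок</body></html>"

# A Rack-compatible app is any object that responds to #call and returns
# [status, headers, body]. The charset is declared in the Content-Type header.
APP = lambda do |_env|
  headers = { 'Content-Type' => 'text/html; charset=utf-8' }
  [200, headers, [BODY]]
end

status, headers, _body = APP.call({})
puts status
puts headers['Content-Type']
```

With the charset declared in the header, the browser no longer has to guess, so the missing meta tag stops mattering.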