thoughtbot/capybara-webkit

Broken charset on page without meta charset

rustamakhmetov opened this issue · 2 comments

Sorry for my English.

I am testing an external Cyrillic site that has no meta charset tag in its head. The response body contains characters in a broken charset, e.g. "СпиÑ�ок по" instead of "Список покупок".
test code

We try to force the response as UTF-8 in Ruby:

response.force_encoding("UTF-8") if response.respond_to?(:force_encoding)
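A minimal sketch of why relabelling the string may not be enough, assuming the bytes were already decoded with a single-byte Western encoding before they reached Ruby. ISO-8859-1 is used here as a stand-in for whatever QtWebKit actually guessed, and the variable names are illustrative only. `force_encoding` changes the encoding label without touching the bytes, so it cannot undo a wrong decode that has already happened:

```ruby
# encoding: utf-8

original = "Список покупок"

# Simulate a wrong guess: treat the UTF-8 bytes as ISO-8859-1 (assumed encoding)
# and transcode them to UTF-8, which is roughly what a browser does when it
# decodes a page with the wrong charset.
garbled = original.dup.force_encoding("ISO-8859-1").encode("UTF-8")

# Relabelling an already-garbled UTF-8 string as UTF-8 changes nothing.
still_garbled = garbled.dup.force_encoding("UTF-8")

# Reversing the wrong decode restores the text, but only if the guessed
# encoding is known and maps every byte (ISO-8859-1 does).
repaired = garbled.encode("ISO-8859-1").force_encoding("UTF-8")

puts garbled
puts repaired
```

The round trip only works here because ISO-8859-1 maps all 256 byte values. With an encoding that leaves some bytes undefined, characters can be lost during the wrong decode and the original text is not recoverable.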

It's possible that we also need to do something on the QtWebKit side - it may be guessing a different charset. My best guess is that we could use this: http://doc.qt.io/archives/qt-5.5/qwebsettings.html#setDefaultTextEncoding

This behavior is correct; the same thing happens if you load a page with your test case's source in Chrome or Firefox. When no charset is specified, it's up to the browser to pick one, and both Chrome and Firefox (set to US English; other language versions may default differently) don't default to UTF-8 for your document. You need to either include the meta charset tag or specify the charset in the Content-Type header returned with the document (or escape all those characters, I guess, which sounds ridiculous):

'Content-Type' => 'text/html; charset=utf-8'
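As a rough illustration of both options, here is a Rack-style response sketch. It assumes you control the server that serves the page, which is not the case for the external site in the original report. The `app` lambda and the HTML body are hypothetical; only the header value comes from the comment above:

```ruby
# encoding: utf-8

# A Rack-compatible app is any object that responds to `call(env)` and returns
# [status, headers, body]. No gems are needed to build or call it directly.
app = lambda do |_env|
  html = <<~HTML
    <!DOCTYPE html>
    <html>
      <head>
        <meta charset="utf-8">
        <title>Список покупок</title>
      </head>
      <body>Список покупок</body>
    </html>
  HTML

  # Option 1: declare the charset in the Content-Type header.
  # Option 2 (inside the document): the <meta charset> tag above.
  [200, { "Content-Type" => "text/html; charset=utf-8" }, [html]]
end

status, headers, body = app.call({})
puts status
puts headers["Content-Type"]
```

Either declaration on its own should be enough for a browser to decode the page as UTF-8; sending both is harmless as long as they agree.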