NumeriusNegidius/Context-Search

Add the ability to set the character encoding

Closed this issue · 6 comments

Some sites use a non-UTF-8 encoding (such as windows-1251), which causes problems when searching. For example: add the search engine from http://www.world-art.ru and try to find 君の名は。
The expected result:
https://i.imgur.com/Qaga3BU.png
What actually happened:
https://i.imgur.com/IGMODMa.png

Do you know if this is a new issue since the release of version 2.0?

It also happens in previous versions (I have checked 0.9 and 1.1).

It seems that the title of this issue is not entirely accurate. In this case, Unicode characters should be converted to numeric character references. That is, it is necessary to replace encodeURIComponent("君の名は。") with encodeURIComponent("&#21531;&#12398;&#21517;&#12399;&#12290;").
Then the search will work correctly: www.world-art.ru/search.php?public_search=%26%2321531%3B%26%2312398%3B%26%2321517%3B%26%2312399%3B%26%2312290%3B&global_sector=all
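A minimal sketch of the conversion described above, in case it helps whoever picks this up. The function name `encodeWithNumericReferences` is hypothetical and not part of the extension; the sketch assumes every non-ASCII character should become a decimal numeric character reference before percent-encoding.

```javascript
// Hypothetical helper (not part of Context-Search): replace every non-ASCII
// character with a decimal numeric character reference (&#NNNN;), then
// percent-encode the result with encodeURIComponent.
function encodeWithNumericReferences(text) {
  let out = "";
  // for...of iterates by code point, so characters outside the BMP stay whole.
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    out += codePoint < 0x80 ? ch : `&#${codePoint};`;
  }
  return encodeURIComponent(out);
}

console.log(encodeWithNumericReferences("君の名は。"));
// %26%2321531%3B%26%2312398%3B%26%2321517%3B%26%2312399%3B%26%2312290%3B
```

The output matches the `public_search` value in the URL above.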

Thanks for a good report, good info and investigation! I'll look into it!

Also, this website does not find strings whose Cyrillic characters are encoded this way. Cyrillic text needs to be converted from UTF-8 to windows-1251 instead.
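A rough sketch of how the two cases might be combined, since `TextEncoder` in browsers only produces UTF-8 and so the windows-1251 bytes have to be mapped by hand. The names `win1251Byte` and `encodeForWin1251` are hypothetical, and the mapping covers only ASCII and the Russian alphabet as an assumption; the full windows-1251 table contains more characters. Anything outside the mapping falls back to a numeric character reference.

```javascript
// Hypothetical helpers (not part of Context-Search).
// Returns the windows-1251 byte for a code point, or -1 if this sketch
// does not cover it. Only the Russian alphabet is mapped (assumption).
function win1251Byte(codePoint) {
  if (codePoint >= 0x0410 && codePoint <= 0x044f) {
    return codePoint - 0x0410 + 0xc0; // А..я map to 0xC0..0xFF
  }
  if (codePoint === 0x0401) return 0xa8; // Ё
  if (codePoint === 0x0451) return 0xb8; // ё
  return -1;
}

// Percent-encodes a search term for a site that expects windows-1251.
function encodeForWin1251(text) {
  let out = "";
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    const byte = win1251Byte(codePoint);
    if (codePoint < 0x80) {
      out += encodeURIComponent(ch); // ASCII is identical in both encodings
    } else if (byte >= 0) {
      out += "%" + byte.toString(16).toUpperCase(); // single windows-1251 byte
    } else {
      out += encodeURIComponent(`&#${codePoint};`); // numeric reference fallback
    }
  }
  return out;
}

console.log(encodeForWin1251("Аниме")); // %C0%ED%E8%EC%E5
```

Whether a given site expects windows-1251 would still have to be determined somehow, for example from a per-engine setting; that part is not sketched here.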

As no one has chimed in on this bug for 2 years, I will assume this is quite an isolated problem. Further, I would not know where to begin, even if more people were affected.

If somebody wants to have a stab at solving this, please do.

This bug stays closed until somebody takes it on or points me in the right direction.