trep/opentrep

UTF8 punctuation characters trigger exceptions in Xapian calls

da115115 opened this issue

When a query contains UTF-8 quote characters, for instance 'Thessaloniki “Macedonia” airport', some calls to the Xapian API throw exceptions. On the Web site, no answer is returned because of the resulting internal error.

Proposed solution: remove all Unicode quotes, accents and punctuation using the [OTransliterator class](http://github.com/trep/opentrep/blob/trunk/opentrep/basic/OTransliterator.hpp).
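A minimal sketch of that normalisation, in Python for illustration only. It is not the OTransliterator implementation, which lives in the C++ code base; the function name `normalise` and its exact rules are assumptions. Accents are dropped after NFD decomposition, non-ASCII punctuation such as “ and ” is replaced by a space, and the ASCII apostrophe is kept so that 'Côte d'Azur' becomes 'Cote d'Azur', as in the example of the next paragraph.

```python
import unicodedata


def normalise(text: str) -> str:
    """Hypothetical query normaliser: strip accents and non-ASCII punctuation.

    Assumption: this only mimics the intent of OTransliterator,
    not its actual implementation.
    """
    # NFD splits 'ô' into 'o' + U+0302 COMBINING CIRCUMFLEX ACCENT.
    decomposed = unicodedata.normalize("NFD", text)
    kept = []
    for ch in decomposed:
        category = unicodedata.category(ch)
        if category == "Mn":
            # Drop combining marks, i.e. the accents.
            continue
        if category.startswith("P") and ord(ch) > 127:
            # Replace non-ASCII punctuation (e.g. “ ” « ») with a space;
            # ASCII punctuation such as the apostrophe is kept.
            kept.append(" ")
            continue
        kept.append(ch)
    # Collapse the runs of whitespace left by the removed punctuation.
    return " ".join("".join(kept).split())


if __name__ == "__main__":
    print(normalise("Thessaloniki “Macedonia” airport"))
    print(normalise("Côte d'Azur"))
```

Characters without a Unicode decomposition, such as 'ø' or 'ß', pass through this sketch unchanged; a real transliterator needs explicit rules for them.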

This solution has side effects that are not necessarily desirable: since accents are removed, matching may be slightly less accurate in some specific cases. For instance, when the user enters 'Côte d'Azur', it is normalised into 'Cote d'Azur'. However, since all the indexed words are normalised in the same way, this should have no practical consequence.
The only "negative" side effect is that indexing the accented forms of words becomes pointless.
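Why normalising both the index and the query keeps matching intact can be illustrated with a toy inverted index. This is a hypothetical sketch, not Xapian's API: `strip_accents`, `build_index` and `search` are illustrative names, lower-casing is an extra assumption, and the two sample documents are made up. Because indexed words and query words go through the same accent stripping, 'Cote d'Azur' and 'côte d'azur' hit the same terms, and no accented key such as 'côte' is ever stored.

```python
import unicodedata
from collections import defaultdict


def strip_accents(word: str) -> str:
    # Decompose (NFD), then drop combining marks: 'Côte' -> 'Cote'.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    # Hypothetical toy inverted index: normalised, lower-cased term -> doc ids.
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.split():
            index[strip_accents(word).lower()].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # The query is normalised exactly like the indexed words, then the
    # matching doc ids of every term are intersected (AND semantics).
    results = None
    for word in query.split():
        matches = index.get(strip_accents(word).lower(), set())
        results = set(matches) if results is None else results & matches
    return results if results is not None else set()


if __name__ == "__main__":
    # Sample documents, made up for illustration.
    idx = build_index({
        "NCE": "Nice Côte d'Azur airport",
        "SKG": "Thessaloniki Macedonia airport",
    })
    print(search(idx, "Cote d'Azur"))
    print(search(idx, "côte d'azur"))
    print("côte" in idx)
```

Since only the stripped form ever reaches the index, storing an extra accented variant of a word would add nothing that a query could match.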