Comparing text with Unicode characters

Question

MiattoRocha opened this issue 6 years ago · 4 comments

Hello people.

I'm designing a system and want to use SearchJS for my advanced search solution, but I'm having trouble comparing strings with accents. Since I'm working for a Latin company, our first users will be using the website in Portuguese (é, ê, ã, á, â, à, õ, ó, ô, ç, etc.).
With the text option set to true, it would be nice to have accent folding.
I looked for a solution online and found that JavaScript has String.prototype.toLocaleLowerCase(); using it instead could be a way to add i18n support to SearchJS.

What do you think about this?

Answer 1 · 2019-01-07T13:32:18.000Z

I like the idea, although we should leave it as an option. Some may want to match only if the diacritics match.

"javascript has the String.prototype.toLocaleLowerCase(), using it instead could be a solution to i18n the SearchJS"

I don't think that does it. My little bit of experience with it shows that it keeps the accents.
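
A quick check of that claim, as a minimal sketch: the sample string and the "pt-BR" locale tag are just illustrative choices, not anything taken from SearchJS.

```javascript
// toLocaleLowerCase changes case only; the diacritics survive.
const input = "AÇÃO É";
const lowered = input.toLocaleLowerCase("pt-BR");

console.log(lowered);          // ação é
console.log(lowered === "acao e"); // false, the accents are still there
```

So lowercasing alone would still treat "ação" and "acao" as different strings.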

When you say "accent folding", you mean that, e.g. any of àáâãäå would match a, etc.?
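
For reference, one common way to get that behaviour without a lookup table is Unicode normalization. This is only a sketch: `foldAccents`, `textContains` and the `fold` flag are hypothetical names made up here, not part of SearchJS, and the flag defaults to off so that exact diacritic matching stays the default.

```javascript
// Hypothetical helper, not part of SearchJS.
function foldAccents(text) {
  return text
    .normalize("NFD")                  // "é" becomes "e" + U+0301
    .replace(/[\u0300-\u036f]/g, "");  // drop combining diacritical marks
}

// Hypothetical case-insensitive substring match with an opt-in `fold` flag.
function textContains(haystack, needle, fold = false) {
  const prep = (s) => {
    const lowered = s.toLowerCase();
    return fold ? foldAccents(lowered) : lowered;
  };
  return prep(haystack).includes(prep(needle));
}

console.log(foldAccents("àáâãäå"));                   // aaaaaa
console.log(textContains("Avaliação", "acao"));       // false
console.log(textContains("Avaliação", "acao", true)); // true
```

Note that this only removes marks in the U+0300 to U+036F block. Letters with no canonical decomposition, such as ø or ß, pass through unchanged, so it is not a complete solution for every language.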

Answer 2 · 2019-01-07T13:32:59.000Z

There is a good sample here (for reference)

Answer 3 · 2019-01-07T13:52:08.000Z

Yeah, I'm talking about matching strings with the same base character, exactly like your àáâãäå example.
I see, so toLocaleLowerCase isn't a way to solve this.

That's a good sample, but which do you think is best: implementing a solution like that, or using a lib to handle the replacement?
Using a small but solid lib could avoid the maintenance of adding a new letter every time someone needs one.

Answer 4 · 2019-01-09T18:25:05.000Z

"Using a small but solid lib could avoid maintenance of adding a new letter every time someone need."

As long as we could package it in. This does work in the browser now, and I wouldn't want to lose that ability.

The sheer number of languages with diacritical marks would make it a maintenance challenge. I speak Hebrew fluently, and the same thing exists (fortunately, most people write it without the additional accent characters).
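
One built-in that might sidestep some of that maintenance for whole-string equality is Intl.Collator with `sensitivity: "base"`, which ships with browsers and Node. This is a sketch under assumptions: the "pt" locale and the `sameBase` helper name are illustrative, the result depends on the locale data available in the runtime, and it compares whole strings rather than doing substring search.

```javascript
// With sensitivity "base", strings that differ only in accents or case compare as equal.
const collator = new Intl.Collator("pt", { sensitivity: "base" });

// Hypothetical helper name, for illustration only.
const sameBase = (a, b) => collator.compare(a, b) === 0;

console.log(sameBase("ação", "ACAO")); // true
console.log(sameBase("ação", "adão")); // false
```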

Do you know of any good libs to use?