Altered parsed name containing unicode character ł

Question


Closed this issue 7 years ago · 3 comments

Hi there,

While trying to parse a name containing a Unicode character (e.g. M. Test-ł), I noticed that the result is altered (M. Test-▒).

After checking the code, I managed to find that the alteration comes from this line.
Since strtolower works byte by byte and does not handle multibyte characters, this explains the alteration. Replacing strtolower with mb_strtolower fixes the problem.
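To illustrate why byte-by-byte lowercasing damages the name, here is a small Python simulation. It is not the parser's code, and the helper names are hypothetical. It assumes each byte of the UTF-8 string is lowercased on its own, as if it were a Latin-1 character. This is roughly what a byte-oriented lowercase function can do under a single-byte locale. In UTF-8, ł is the two bytes C5 82. Lowercasing the lead byte C5 (Å in Latin-1) to E5 (å) leaves an invalid sequence, which is displayed as a replacement character.

```python
def bytewise_lower(text: str) -> str:
    """Hypothetical helper: lowercase every UTF-8 byte independently,
    assuming a Latin-1 interpretation of each byte (strtolower-like)."""
    out = bytearray()
    for b in text.encode("utf-8"):
        low = chr(b).lower()  # treat the single byte as a Latin-1 character
        # keep the byte unchanged unless its lowercase form is one Latin-1 char
        out.append(ord(low) if len(low) == 1 and ord(low) < 256 else b)
    # invalid UTF-8 sequences are replaced with U+FFFD, the garbled character
    return out.decode("utf-8", errors="replace")


def unicode_lower(text: str) -> str:
    """Hypothetical helper: character-aware lowercasing, the analogue of mb_strtolower."""
    return text.lower()


name = "M. Test-ł"
print(bytewise_lower(name))  # 'm. test-' followed by a replacement character
print(unicode_lower(name))   # m. test-ł
```

The ASCII part of the name comes out the same with either approach. Only the multibyte character is damaged, so switching to a multibyte-aware function fixes this case without changing how plain ASCII names are handled.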

So I'm wondering whether it would be worth replacing the plain string functions in the code with their multibyte counterparts, to make the parser suitable for international names?

Thanks.

Answer 1 · 2017-02-03T18:50:35.000Z

Great catch. Would you mind updating it and sending me a pull request?

Answer 2 · 2017-02-04T08:08:53.000Z

OK, I'll take a look at this in the next few days.

I saw that you have the same library for JavaScript. I don't know whether it has the same issue. Unfortunately, I don't think I can be of much help with the JS one.

Answer 3 · 2017-02-07T07:31:21.000Z

Many thanks for the merge.

And many thanks for the parser 👍. It has helped me a lot since I discovered that splitting names on whitespace would not be adequate.