Names with non-English characters crash the code: 'ERR_UNESCAPED_CHARACTERS'
prodigy2m opened this issue · 17 comments
Can you help? Thanks in advance
@jaruba can you check this out? Seems like it shouldn't happen, as network requests should handle UTF-8 just fine
I suspect a URL encoding issue
The correct request URL for Şubat is https://v2.sg.media-imdb.com/suggestion/u/ubat.json
The module is requesting https://sg.media-imdb.com/suggests/ş/%C5%9Eubat.json
First of all, it looks like searchTerm.charAt(0).toLowerCase() should be URI-encoded (imdbFind.js), but that is not enough to fix this issue.
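To illustrate the encoding point: Node raises ERR_UNESCAPED_CHARACTERS when a request path contains unescaped characters, so every path segment needs to be percent-encoded. A minimal sketch, assuming the URL is assembled by plain string concatenation; buildSuggestUrl is a hypothetical helper, not the actual code in imdbFind.js:

```javascript
// Hypothetical helper (not the real imdbFind.js code): builds the legacy
// suggest URL with both path segments percent-encoded.
function buildSuggestUrl(searchTerm) {
  // The first character selects the "bucket" segment of the path.
  const firstChar = searchTerm.charAt(0).toLowerCase();
  return 'https://sg.media-imdb.com/suggests/' +
    encodeURIComponent(firstChar) + '/' +
    encodeURIComponent(searchTerm) + '.json';
}

console.log(buildSuggestUrl('Şubat'));
// prints "https://sg.media-imdb.com/suggests/%C5%9F/%C5%9Eubat.json"
```

This only avoids the crash. As noted above, it does not make the endpoint return useful results for such names.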
I noticed the same things you did, but https://v2.sg.media-imdb.com/suggestion/u/ubat.json can't be the correct call. That would mean it ignores any non-standard UTF-8 character at the front, so if there were 3 such characters at the start of the first word, would it ignore all of them? That just can't be right...
Tested this string: ŞŞŞŞŞubaŞŞŞŞŞubatŞŞŞŞŞubatt
And it queries this: https://v2.sg.media-imdb.com/suggestion/u/ubaubat.json
Looks like all non-standard characters are skipped 🤔
If you search only the character Ş, it straight up queries https://v2.sg.media-imdb.com/suggestion//.json and throws an error 😂
LOL!
@duckyb what about if you search for ŞŞŞubat, does it ignore all of the first 3 characters? And what if some characters are lowercase? Does it ignore those too? Try with Şșșubat also.
Yep, looks like it skips all of them, lower and upper case
Well.. fuck 😆
There is one crazy possible solution that I can think of, which might even be better than IMDB's own implementation. We could convert non-standard characters to standard characters, so Ş becomes S. I remember implementing something like this in the past for some other need that I can't recall, so it's possible, and the search results might actually be accurate, but I don't know what will happen with Chinese / Arabic characters, lol.
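A rough sketch of that conversion idea, using only built-in string functions rather than any particular library: decompose each character with Unicode NFD normalization, then drop the combining marks. foldDiacritics is a hypothetical name, and this approach is an assumption about how the conversion could be done, not what the module ships:

```javascript
// Hypothetical helper: converts accented Latin letters to their base letters.
// NFD splits "Ş" into "S" plus a combining cedilla (U+0327); the regex then
// removes everything in the combining diacritical marks block (U+0300-U+036F).
function foldDiacritics(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

console.log(foldDiacritics('Şubat'));   // prints "Subat"
console.log(foldDiacritics('Şșșubat')); // prints "Sssubat"
```

Letters that have no canonical decomposition (for example ø or ł) and non-Latin scripts such as Chinese pass through unchanged, so they would still need separate handling.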
On IMDB's website, if you search for Şubat you don't even see "Subat (2012)" in the suggestion results, while if you search for "Subat" it becomes the first result.
Probably something to try out, but yeah, good luck with non-Indo-European characters
We could just remove those like IMDB does; it's just a question of identifying non-convertible characters
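One way to identify the non-convertible characters, sketched under the assumption that "convertible" means "ends up as plain ASCII after stripping diacritics". sanitizeSearchTerm is a hypothetical helper, and the null return for an empty result is my own guard against the suggestion//.json request seen earlier:

```javascript
// Hypothetical helper: folds diacritics, then separates what is left into
// ASCII characters (kept) and everything else (dropped, like IMDB seems to do).
function sanitizeSearchTerm(searchTerm) {
  // Decompose accented letters and remove the combining marks.
  const folded = searchTerm.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
  const chars = Array.from(folded); // iterate by code point, not UTF-16 unit
  const dropped = chars.filter((ch) => ch.codePointAt(0) > 0x7f);
  const kept = chars.filter((ch) => ch.codePointAt(0) <= 0x7f).join('').trim();
  // Assumption: an empty query should not be sent at all, so signal it as null.
  return { query: kept.length > 0 ? kept : null, dropped: dropped };
}

const result = sanitizeSearchTerm('Şubat 電影');
console.log(result.query);   // prints "Subat"
console.log(result.dropped); // prints [ '電', '影' ]
```

A caller could skip the request entirely when query is null, and log dropped to see which characters were not convertible.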
this is not a simple issue to solve though, that's for sure
@jaruba https://www.npmjs.com/package/diacritics
Looks promising.
I'll try it out soon and see what happens
@prodigy2m v3.0.2 is published to NPM with the diacritics fix