Ivshti/name-to-imdb

Names with non-English characters crash code: 'ERR_UNESCAPED_CHARACTERS'

prodigy2m opened this issue · 17 comments

When I try to use names that are not English, the Node server crashes.

Example name: Şubat (Netflix title)

I receive this error: code: 'ERR_UNESCAPED_CHARACTERS'

Can you help? Thanks in advance

@jaruba can you check this out? Seems like it shouldn't happen, as network requests should handle UTF-8 just fine.

I suspect a URL encoding issue.
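
For reference, a minimal sketch of how this error code can show up, assuming the request is made through Node's http/https module with the path passed in an options object (an assumption about the module's internals, not something confirmed in this thread). Recent Node versions check the path while constructing the request and throw before anything is sent. The host and path below are only illustrative.

```javascript
const https = require('https');

// Illustrative path with raw (unencoded) non-ASCII characters in it.
const rawPath = '/suggests/ş/Şubat.json';

let code = null;
try {
  // Node validates the path while constructing the request,
  // so this throws synchronously and nothing goes over the network.
  https.request({ host: 'sg.media-imdb.com', path: rawPath });
} catch (err) {
  code = err.code;
}

console.log(code);
```

If the throw is not caught anywhere, it takes the whole process down, which would match the crash described above.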

The correct request url for Şubat is https://v2.sg.media-imdb.com/suggestion/u/ubat.json
The module is requesting https://sg.media-imdb.com/suggests/ş/%C5%9Eubat.json

First of all, it looks like searchTerm.charAt(0).toLowerCase() should be URI-encoded (imdbFind.js), but that is not enough to fix this issue.
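
A rough sketch of what encoding the first character could look like. buildSuggestsUrl is a hypothetical helper, not the actual imdbFind.js code; the URL layout is copied from the request quoted above.

```javascript
// Hypothetical helper, not the actual imdbFind.js code.
function buildSuggestsUrl(searchTerm) {
  const firstChar = searchTerm.charAt(0).toLowerCase();
  // Encode both path segments so no raw non-ASCII character ends up in the request path.
  return 'https://sg.media-imdb.com/suggests/' +
    encodeURIComponent(firstChar) + '/' +
    encodeURIComponent(searchTerm) + '.json';
}

console.log(buildSuggestsUrl('Şubat'));
// https://sg.media-imdb.com/suggests/%C5%9F/%C5%9Eubat.json
```

This should avoid the unescaped-characters error, but the resulting URL is still not the one the IMDB website itself queries.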

I noticed the same things you did, but https://v2.sg.media-imdb.com/suggestion/u/ubat.json can't be the correct call. That would mean it ignores any non-standard UTF-8 character at the front, so if there were 3 such characters at the start of the first word, it would ignore all of them? That just can't be right...

I tested it by searching for the string on the IMDB website and checking the network panel.

Tested this string: ŞŞŞŞŞubaŞŞŞŞŞubatŞŞŞŞŞubatt

And it queries this: https://v2.sg.media-imdb.com/suggestion/u/ubaubat.json

Looks like all non-standard characters are skipped 🤔

If you search only the character: Ş it straight up queries https://v2.sg.media-imdb.com/suggestion//.json and throws an error 😂

LOL!

@duckyb what if you search for ŞŞŞubat, does it ignore all of the first 3 characters? And what if some characters are lowercase? Does it ignore those too? Try Şșșubat as well.

Yep, looks like it skips all of them, lower and upper case

Well.. fuck 😆
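
To sum up the behavior observed so far, here is a sketch that mimics it. This is a guess based on the network panel, not IMDB's actual code: it assumes "non-standard" means anything outside ASCII and that the term is lowercased. buildSuggestionUrl is a hypothetical name; the URL layout is the one seen in the requests above.

```javascript
// Guess at the observed behavior; not IMDB's actual code.
function buildSuggestionUrl(searchTerm) {
  // Assumption: every character outside the ASCII range is skipped, then the rest is lowercased.
  const ascii = searchTerm.replace(/[^\x00-\x7F]/g, '').toLowerCase();
  // Nothing left (e.g. the input was just "Ş"): return null
  // instead of building the broken ".../suggestion//.json" URL.
  if (ascii.length === 0) return null;
  return 'https://v2.sg.media-imdb.com/suggestion/' +
    encodeURIComponent(ascii.charAt(0)) + '/' +
    encodeURIComponent(ascii) + '.json';
}

console.log(buildSuggestionUrl('Şubat'));   // https://v2.sg.media-imdb.com/suggestion/u/ubat.json
console.log(buildSuggestionUrl('Şșșubat')); // same URL, all three leading characters are skipped
console.log(buildSuggestionUrl('Ş'));       // null
```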

There is one crazy possible solution that I can think of, that might even be better than IMDB's own implementation. We could convert non-standard characters to standard characters, so Ş becomes S. I remember I implemented something like this in the past for some other needs that I can't remember, so it's possible, and the search results might actually be accurate, but I don't know what will happen for Chinese / Arabic characters, lol.

On IMDB's website, if you search for Şubat you don't even see "Subat (2012)" in the suggestion results, while if you search for "Subat" it becomes the first result.

Probably something to try out, but yeah, good luck with non-Indo-European characters.

we could just remove those like IMDB does, it's just a question of identifying non-convertible characters

this is not a simple issue to solve though, that's for sure
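
A minimal sketch of the "convert what we can, remove the rest" idea, using the built-in String.prototype.normalize rather than any particular package. toSearchableAscii is a hypothetical name, and this is only one way to do it, not what the module does.

```javascript
// Hypothetical helper: convert accented letters to their base letter,
// then drop anything that still isn't ASCII.
function toSearchableAscii(name) {
  return name
    .normalize('NFD')                 // split "Ş" into "S" + a combining cedilla
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .replace(/[^\x00-\x7F]/g, '')     // drop whatever is still non-ASCII (e.g. Chinese, Arabic)
    .trim();
}

console.log(toSearchableAscii('Şubat'));   // Subat
console.log(toSearchableAscii('Şșșubat')); // Sssubat
console.log(toSearchableAscii('Søren'));   // Sren
```

The last line shows the limit of this approach: letters like ø have no Unicode decomposition, so they get removed instead of converted. A lookup table of characters would be needed to cover those.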

@jaruba https://www.npmjs.com/package/diacritics

Looks promising.
I'll try it out soon and see what happens

@prodigy2m v3.0.2 is published to NPM with the diacritics fix

Thank you @jaruba and everyone else who helped!