Wintermute0110/plugin.program.AEL.dev

Unicode characters in rom filenames cause scraper failures

Closed this issue · 2 comments

Having a non-ASCII (Unicode) character in the filename of a ROM causes the scraper to fail (seen with GamesDB, but it should apply to all scrapers) with the following stack trace:

1:44:56.477 T:139951996065536 ERROR: /usr/lib/python2.7/urllib.py:1298: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  return ''.join(map(quoter, s))
11:44:56.477 T:139951996065536 ERROR: AEL ERROR: SingleInstance::__exit__() Unhandled excepcion in protected code
11:44:56.483 T:139951996065536 ERROR: EXCEPTION Thrown (PythonToCppException) : -->Python callback/script returned the following error<--
 - NOTE: IGNORING THIS CAN LEAD TO MEMORY LEAKS!
Error Type: <type 'exceptions.KeyError'>
Error Contents: u'\xe8'
Traceback (most recent call last):
  File "/home/loucipher/.kodi/addons/plugin.program.advanced.emulator.launcher/addon.py", line 39, in <module>
    main.Main().run_plugin()
  File "/home/loucipher/.kodi/addons/plugin.program.advanced.emulator.launcher/resources/main.py", line 270, in run_plugin
    self.run_protected(command, args)
  File "/home/loucipher/.kodi/addons/plugin.program.advanced.emulator.launcher/resources/main.py", line 363, in run_protected
    self._command_add_roms(args['launID'][0])
  File "/home/loucipher/.kodi/addons/plugin.program.advanced.emulator.launcher/resources/main.py", line 2401, in _command_add_roms
    self._roms_import_roms(launcher)
  File "/home/loucipher/.kodi/addons/plugin.program.advanced.emulator.launcher/resources/main.py", line 8809, in _roms_import_roms
    romdata = self._roms_process_scanned_ROM(launcherID, ROM)
  File "/home/loucipher/.kodi/addons/plugin.program.advanced.emulator.launcher/resources/main.py", line 8972, in _roms_process_scanned_ROM
    results = self.scraper_data.get_search(rom_name_scraping, ROM.getBase_noext(), platform)
  File "/home/loucipher/.kodi/addons/plugin.program.advanced.emulator.launcher/resources/scrap_metadata.py", line 203, in get_search
    return Scraper_TheGamesDB.get_search(self, search_string, rom_base_noext, platform)
  File "/home/loucipher/.kodi/addons/plugin.program.advanced.emulator.launcher/resources/scrap_common.py", line 74, in get_search
    'name=' + urllib.quote_plus(search_string) + '&platform=' + urllib.quote_plus(scraper_platform)
  File "/usr/lib/python2.7/urllib.py", line 1303, in quote_plus
    s = quote(s, safe + ' ')
  File "/usr/lib/python2.7/urllib.py", line 1298, in quote
    return ''.join(map(quoter, s))
KeyError: u'\xe8'
-->End of Python script error report<--

This is probably because the OS (64-bit Debian sid) returns the filename in a different character encoding than the one AEL expects or uses for its strings (probably defaulting to ASCII?), which causes urllib.quote() to fail. In my opinion the best solution is to use UTF-8 for all strings, converting whatever filename strings the OS returns to UTF-8 as necessary, so that Unicode characters in filenames are allowed (the filename that was crashing my scrapes is 'Andrè Agassi Tennis (U).bin'). That's only my initial gut reaction, though. It could well be that GamesDB doesn't support Unicode, in which case the solution is either to convert all UTF-8 strings to ASCII or to settle for renaming ROM files to ASCII-only names.
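For illustration, here is a minimal sketch of the "encode to UTF-8 first" idea. The helper name `quote_plus_utf8` is hypothetical and not part of AEL, and the import fallback is only there so the snippet runs on both Python 2.7 and Python 3. It assumes the search string may arrive either as a Unicode string or as bytes that are already UTF-8 encoded.

```python
try:
    from urllib import quote_plus          # Python 2.7
except ImportError:
    from urllib.parse import quote_plus    # Python 3


def quote_plus_utf8(text):
    """Percent-encode text for a URL query string.

    Hypothetical helper: Unicode strings are encoded to UTF-8 bytes
    before quoting, so non-ASCII characters no longer reach the quoter
    as unicode objects. Byte strings are passed through unchanged and
    are assumed to be UTF-8 already.
    """
    if not isinstance(text, bytes):
        text = text.encode('utf-8')
    return quote_plus(text)


# The filename from the report, written with an escape for the e-grave.
print(quote_plus_utf8(u'Andr\xe8 Agassi Tennis (U)'))
# Andr%C3%A8+Agassi+Tennis+%28U%29
```

Whether the remote service accepts UTF-8 percent-encoded queries is a separate question that this sketch does not answer.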

(I do realize that the example filename I gave is wrong: his name is properly spelled Andre, not Andrè. It does, however, expose what could be an issue when, for example, importing ROMs that have Japanese characters in their names. I had the same issue with 720 Degrees for the NES; my ROM was named 720°.)

On a related note, it would be nice if the scraper failed gracefully and had a resume/scrape-missing option. Right now, if you hit this error, the scraper bombs out and you have no ROMs in your launcher. It would be a nice feature if it wrote out the XML for what it has already scraped (I'm planning to make a branch to implement this, time permitting) and allowed you to resume scanning the folder (unless there's a way to do this now and I'm missing it). I was lucky that my errors happened alphabetically early; this would suck had I already scraped 99% of my library and one failure forced me to re-scrape the entire thing!
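A rough sketch of what "fail gracefully" could look like: catch the error per ROM, remember which ROMs failed, and keep whatever was scraped. Everything here (`scan_roms`, `fake_scrape`, the dictionary layout) is made up for illustration and is not how AEL is structured.

```python
def scan_roms(rom_names, scrape):
    """Scrape every ROM name; one failure does not abort the whole scan.

    `scrape` is any callable that takes a ROM name and returns metadata.
    Returns (scraped, failed): a dict of successful results and a list
    of (name, error description) pairs that could be retried later.
    """
    scraped, failed = {}, []
    for name in rom_names:
        try:
            scraped[name] = scrape(name)
        except Exception as exc:
            # Record the problem and move on to the next ROM.
            failed.append((name, repr(exc)))
    return scraped, failed


def fake_scrape(name):
    # Stand-in scraper that mimics the reported crash on non-ASCII names.
    if any(ord(ch) > 127 for ch in name):
        raise KeyError(name)
    return {'title': name}


roms = [u'Altered Beast', u'Andr\xe8 Agassi Tennis', u'Sonic']
scraped, failed = scan_roms(roms, fake_scrape)
print(len(scraped), len(failed))
```

With a structure like this, the two ROMs that scrape cleanly are kept and only the failed one would need another attempt.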

Love this plugin, it's completely replaced the venerable romcollectionbrowser for me!

Thanks a lot for reporting. Your filesystem is fine; the bug is caused by a programming fault in the handling of Unicode strings. I will fix it ASAP.

The bug seems to affect the online metadata scraper. As a workaround, try using the offline scraper only.

You can "resume" your launcher scan after a crash.

Metadata scraper: if you are using the offline scraper, it is as fast as reading the NFO files, so there is nothing to do. If you are using an online scraper, enable the option "Update NFO files after ROM scraping" and set the Metadata scan policy to "NFO Files + Scrapers". When you rescan your launcher after a failure, the previously scraped metadata is already stored in the NFO files and won't be scraped again.

Artwork: set the policy to "Local Images + Scrapers". If the artwork was scraped previously, it is available locally and gets picked up instead of being scraped again.
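The idea behind the "NFO Files + Scrapers" policy can be sketched roughly as "use the local file if it exists, otherwise scrape and save". This is only an illustration: the function `get_metadata`, the stand-in scraper, and the one-element XML layout are assumptions made for the example and do not reflect AEL's actual code or NFO format.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def get_metadata(rom_path, scrape):
    """Return (title, source), preferring a previously saved NFO file.

    Hypothetical sketch: the NFO sits next to the ROM and holds a single
    <title> element. The real NFO layout is not shown in this thread.
    """
    nfo_path = os.path.splitext(rom_path)[0] + '.nfo'
    if os.path.isfile(nfo_path):
        # Saved by an earlier scan, so no online request is needed.
        return ET.parse(nfo_path).getroot().findtext('title'), 'nfo'
    title = scrape(rom_path)
    root = ET.Element('game')
    ET.SubElement(root, 'title').text = title
    ET.ElementTree(root).write(nfo_path, encoding='utf-8')
    return title, 'scraper'


calls = []

def fake_scrape(path):
    # Stand-in for an online scraper; counts how often it is called.
    calls.append(path)
    return u'Sonic'


rom = os.path.join(tempfile.mkdtemp(), 'Sonic (U).bin')
first = get_metadata(rom, fake_scrape)    # first scan: scraped and saved
second = get_metadata(rom, fake_scrape)   # rescan: read from the NFO file
print(first[1], second[1], len(calls))
```

On the rescan the stand-in scraper is not called again, which is the behaviour described above for launchers rescanned after a failure.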

Please try AEL branch release-0.9.8, this issue should be fixed there.

I will close the issue now. Reopen if problem persists.