dskrad/mp3chaps

Japanese characters in chapters.txt cause runtime issue

GaProgMan opened this issue · 2 comments

10kft view of error

When Japanese characters are added to the test.chapters.txt file (where test is the name of the mp3) and mp3chaps -i test.mp3 is run, a runtime error occurs and the chapter marker is not added to the mp3 file.

Steps to recreate

  • Create a valid mp3 file called test.mp3
  • Create a text file called test.chapters.txt
  • Insert the following line into the text file:
    • 00:00:00.000 I N e e d Y o u 私の側て by G.H
  • Run the following command:
    • mp3chaps -i test.mp3

Full output from error

λ mp3chaps -i test.mp3
Traceback (most recent call last):
  File "c:\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Python39\Scripts\mp3chaps.exe\__main__.py", line 7, in <module>
  File "c:\python39\lib\site-packages\mp3chaps.py", line 81, in main
    add_chapters(tag, args["<filename>"])
  File "c:\python39\lib\site-packages\mp3chaps.py", line 45, in add_chapters
    chaps = parse_chapters_file(fname)
  File "c:\python39\lib\site-packages\mp3chaps.py", line 39, in parse_chapters_file
    for line in f.readlines():
  File "c:\python39\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 31: character maps to <undefined>
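
A minimal sketch of what the traceback suggests, assuming the chapters file was saved as UTF-8 without a BOM and is being decoded with the Windows default codec (cp1252, as the last frame shows). The first kanji, 私, encodes to the bytes E7 A7 81, and 0x81 has no mapping in Python's cp1252 codec:

```python
# The chapter line from the steps above, encoded the way a UTF-8 editor would save it
# (assumption: no BOM, plain ASCII spaces).
line = "00:00:00.000 I N e e d Y o u 私の側て by G.H"
raw = line.encode("utf-8")

try:
    raw.decode("cp1252")  # the codec named in the traceback
    failure = None
except UnicodeDecodeError as exc:
    failure = exc

print(failure)
print(hex(raw[failure.start]), "at byte offset", failure.start)
```

The offset and byte it reports line up with "byte 0x81 in position 31" in the error above, which points at the file encoding rather than at Japanese text as such.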

Expected result

test.mp3 should have a single chapter marker called I N e e d Y o u 私の側て by G.H, and running mp3chaps -l test.mp3 should show:

λ mp3chaps -l test.mp3
Chapters:
I N e e d Y o u 私の側て by G.H

Machine details

  • Windows 10 (build number: 19041.329)
    • Will also test on my Ubuntu 20.04 machine and report back
  • mp3chaps vLatest from pip3 (at time of reporting)
  • Python 3.9.0b4

Tested on my Ubuntu 20.04 machine, running the latest version of Python 3, and cannot recreate the error.

I'll continue to use mp3chaps on my Ubuntu machine (daily driver - I hardly ever use my Windows machine).

Hi,

UPDATE: I was not sure if it merits a separate issue so I commented here, but now I see that the error message is a bit different, so perhaps it is a different issue?

UPDATE2:
When the txt file is in Unicode or UTF-8 format, it does not matter if the txt file contains Polish characters or not - the error is the same. So I think the issue is even more serious than I previously thought.

I would like to report a similar issue when adding chapters from files encoded in UTF-8 or Unicode. I first hit it with titles containing Polish characters such as ą ć ę ł ń ó ś ź ż, but it happens whether or not the file contains Polish characters.

Latest mp3chaps version from pip3
Windows 10 1809
Python 3.8.5

I tried chapter files encoded in two formats, UTF-8 and Unicode, saved with Windows Notepad (I also tried various options in Notepad++, to no avail). The errors are as follows:

Error for Unicode file:

Traceback (most recent call last):
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\users\user\AppData\Local\Programs\Python\Python38-32\Scripts\mp3chaps.exe\__main__.py", line 7, in <module>
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\mp3chaps.py", line 81, in main
    add_chapters(tag, args["<filename>"])
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\mp3chaps.py", line 45, in add_chapters
    chaps = parse_chapters_file(fname)
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\mp3chaps.py", line 41, in parse_chapters_file
    chaps.append((to_millisecs(time), title))
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\mp3chaps.py", line 68, in to_millisecs
    h, m, s = [float(x) for x in time.split(":")]
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\mp3chaps.py", line 68, in <listcomp>
    h, m, s = [float(x) for x in time.split(":")]
ValueError: could not convert string to float: 'ÿþ0\x000\x00'
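
The 'ÿþ' prefix and the \x00 bytes in that message are consistent with a UTF-16 little-endian file (BOM bytes FF FE) being read as cp1252. A small sketch, assuming Notepad's "Unicode" option means UTF-16 LE with a BOM and using a hypothetical chapter line:

```python
import codecs

# Hypothetical chapter line; "Unicode" in Notepad is assumed to be UTF-16 LE with a BOM.
line = "00:00:00.000 Intro"
raw = codecs.BOM_UTF16_LE + line.encode("utf-16-le")

# Read as cp1252, the BOM becomes 'ÿþ' and every ASCII character is followed by a NUL.
misread = raw.decode("cp1252")
first_field = misread.split(":")[0]
print(repr(first_field))

try:
    float(first_field)
    error = None
except ValueError as exc:
    error = exc
print(error)
```

The first field comes out as the same string that appears in the ValueError above, so the timestamp itself is probably fine and only the decoding is wrong.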

Error for UTF-8 file:

Traceback (most recent call last):
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\users\user\AppData\Local\Programs\Python\Python38-32\Scripts\mp3chaps.exe\__main__.py", line 7, in <module>
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\mp3chaps.py", line 81, in main
    add_chapters(tag, args["<filename>"])
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\mp3chaps.py", line 45, in add_chapters
    chaps = parse_chapters_file(fname)
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\mp3chaps.py", line 41, in parse_chapters_file
    chaps.append((to_millisecs(time), title))
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\mp3chaps.py", line 68, in to_millisecs
    h, m, s = [float(x) for x in time.split(":")]
  File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\mp3chaps.py", line 68, in <listcomp>
    h, m, s = [float(x) for x in time.split(":")]
ValueError: could not convert string to float: '00'
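
The value shown, '00', would normally convert to a float without trouble, so something not visible in the message is probably attached to it. One possible explanation, stated as an assumption: Notepad wrote a UTF-8 BOM (bytes EF BB BF), which cp1252 decodes into three extra characters in front of the first field. A sketch with a hypothetical chapter line:

```python
import codecs

line = "00:00:00.000 Intro"  # hypothetical chapter line
raw = codecs.BOM_UTF8 + line.encode("utf-8")

as_cp1252 = raw.decode("cp1252").split(":")[0]       # BOM survives as three characters
as_utf8_sig = raw.decode("utf-8-sig").split(":")[0]  # BOM is stripped by the codec


def converts(value):
    """Return True if float() accepts the value."""
    try:
        float(value)
        return True
    except ValueError:
        return False


print(repr(as_cp1252), converts(as_cp1252))
print(repr(as_utf8_sig), converts(as_utf8_sig))
```

If this is what happened, decoding with "utf-8-sig" would be enough to make the UTF-8 file parse.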

Result for ANSI file: no error (but no Polish characters either)

Lame tag CRC check failed
Chapters:
zazolc gesla jazn

While using trial and error to convert my file to a format that mp3chaps would accept while still retaining Polish characters, I also occasionally got the same error as the one for Japanese, but it was related to badly converted opening and closing quotes and dash/hyphen characters.

  File "c:\users\user\appdata\local\programs\python\python38-32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1264: character maps to <undefined>

Sample files are attached. Any single character is enough to cause the error.
I hope this can be fixed, as I looked for free tools to insert chapters into mp3 files and this is the only one I found that works (except for this bug). I can convert to ANSI, but then I lose all Polish characters, which makes it unusable for my use case.
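
One possible direction for a fix, sketched under assumptions: read the chapters file as bytes and choose the decoder from the BOM instead of relying on the platform default. read_chapter_lines is a hypothetical helper, not part of mp3chaps, and the Polish title is only sample text.

```python
import codecs
import os
import tempfile


def read_chapter_lines(path):
    """Hypothetical helper: decode a chapters file according to its BOM, if any."""
    with open(path, "rb") as f:
        raw = f.read()
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        text = raw.decode("utf-16")     # the utf-16 codec consumes the BOM
    else:
        text = raw.decode("utf-8-sig")  # strips a UTF-8 BOM when one is present
    return text.splitlines()


# Round-trip a sample line through the encodings discussed in this thread.
line = "00:00:00.000 zażółć gęślą jaźń"
results = {}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.chapters.txt")
    for encoding in ("utf-8", "utf-8-sig", "utf-16"):
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write(line + "\n")
        results[encoding] = read_chapter_lines(path)

print(results)
```

This sketch does not handle ANSI (cp1252) files that contain non-ASCII characters; those would need a fallback decoder or an explicit encoding option.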

test.chapters.unicode.txt
test.chapters.utf-8.txt
test.chapters.ansi.txt