Japanese characters in chapters.txt cause runtime issue
GaProgMan opened this issue · 2 comments
10kft view of error
When Japanese characters are added to the test.chapters.txt file (where test is the name of the mp3) and mp3chaps -i test.mp3 is run, a runtime error occurs and the chapter marker is not added to the mp3 file.
Steps to recreate
- Create a valid mp3 file called test.mp3
- Create a text file called test.chapters.txt
- Insert the following line into the text file:
00:00:00.000 I N e e d Y o u 私の側て by G.H
- Run the following command:
mp3chaps -i test.mp3
Full output from error
λ mp3chaps -i test.mp3
Traceback (most recent call last):
File "c:\python39\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\python39\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Python39\Scripts\mp3chaps.exe\__main__.py", line 7, in <module>
File "c:\python39\lib\site-packages\mp3chaps.py", line 81, in main
add_chapters(tag, args["<filename>"])
File "c:\python39\lib\site-packages\mp3chaps.py", line 45, in add_chapters
chaps = parse_chapters_file(fname)
File "c:\python39\lib\site-packages\mp3chaps.py", line 39, in parse_chapters_file
for line in f.readlines():
File "c:\python39\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 31: character maps to <undefined>
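The last frame suggests the file is being decoded as cp1252, which is what Python's open() commonly defaults to on a Western-locale Windows install when no encoding is passed. The sketch below reproduces the same exception without mp3chaps. It assumes the chapters file was saved as UTF-8 without a BOM, which the report does not state.

```python
# Assumption: the chapters file was saved as UTF-8 (no BOM).
line = "00:00:00.000 I N e e d Y o u 私の側て by G.H"
raw = line.encode("utf-8")

# "私" (U+79C1) encodes to E7 A7 81, and 0x81 has no mapping in
# Python's cp1252 table, so decoding stops there.
try:
    raw.decode("cp1252")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)

# Decoding the same bytes as UTF-8 round-trips cleanly.
assert raw.decode("utf-8") == line
```

Under that assumption the failing byte and offset (0x81 at position 31) line up with the traceback, which points at the reader's encoding rather than at the Japanese text itself.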
Expected result
test.mp3 should have a single chapter marker called I N e e d Y o u 私の側て by G.H, and running mp3chaps -l test.mp3 should show:
λ mp3chaps -l test.mp3
Chapters:
I N e e d Y o u 私の側て by G.H
Machine details
- Windows 10 (build number: 19041.329)
- Will also test on my Ubuntu 20.04 machine and report back
- mp3chaps vLatest from pip3 (at time of reporting)
- Python 3.9.0b4
Tested on my Ubuntu 20.04 machine, running the latest version of Python 3, and I cannot recreate the issue there.
I'll continue to use mp3chaps on my Ubuntu machine (daily driver - I hardly ever use my Windows machine).
Hi,
UPDATE: I was not sure if it merits a separate issue so I commented here, but now I see that the error message is a bit different, so perhaps it is a different issue?
UPDATE2:
When the txt file is in Unicode or UTF-8 format, it does not matter if the txt file contains Polish characters or not - the error is the same. So I think the issue is even more serious than I previously thought.
I would like to report a similar issue when trying to add chapters from files encoded in UTF-8 or Unicode. The titles may contain Polish characters like these: ą ć ę ł ń ó ś ź ż, but the error occurs irrespective of whether the file has Polish characters or not.
Latest mp3chaps version from pip3
Windows 10 1809
Python 3.8.5
I tried chapter files with characters encoded in two formats, UTF-8 and Unicode, saved using Windows Notepad (I also tried various options in Notepad++ to no avail). The errors are as follows:
Error for Unicode file:
Traceback (most recent call last):
File "c:\users\user\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\user\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\users\user\AppData\Local\Programs\Python\Python38-32\Scripts\mp3chaps.exe\__main__.py", line 7, in <module>
File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\mp3chaps.py", line 81, in main
add_chapters(tag, args["<filename>"])
File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\mp3chaps.py", line 45, in add_chapters
chaps = parse_chapters_file(fname)
File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\mp3chaps.py", line 41, in parse_chapters_file
chaps.append((to_millisecs(time), title))
File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\mp3chaps.py", line 68, in to_millisecs
h, m, s = [float(x) for x in time.split(":")]
File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\mp3chaps.py", line 68, in <listcomp>
h, m, s = [float(x) for x in time.split(":")]
ValueError: could not convert string to float: 'ÿþ0\x000\x00'
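The ÿþ prefix and the interleaved \x00 characters look like a UTF-16 little-endian file (what Notepad labels "Unicode") being read as cp1252. A minimal sketch that reproduces the string from the error message, assuming that is what happened; the title "Intro" is made up for illustration:

```python
import codecs

# Assumption: Notepad's "Unicode" option wrote UTF-16 LE with a byte order mark.
raw = codecs.BOM_UTF16_LE + "00:00:00.000 Intro".encode("utf-16-le")

# Read as cp1252, the BOM bytes FF FE become "ÿþ" and every second byte is NUL.
misread = raw.decode("cp1252")
hours = misread.split(":")[0]
print(repr(hours))

# The "utf-16" codec consumes the BOM and recovers the original line.
print(raw.decode("utf-16"))
```

The first field is then no longer a plain number, which is why float() rejects it before any chapter title is even looked at.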
Error for UTF-8 file:
Traceback (most recent call last):
File "c:\users\user\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\user\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\users\user\AppData\Local\Programs\Python\Python38-32\Scripts\mp3chaps.exe\__main__.py", line 7, in <module>
File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\mp3chaps.py", line 81, in main
add_chapters(tag, args["<filename>"])
File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\mp3chaps.py", line 45, in add_chapters
chaps = parse_chapters_file(fname)
File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\mp3chaps.py", line 41, in parse_chapters_file
chaps.append((to_millisecs(time), title))
File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\mp3chaps.py", line 68, in to_millisecs
h, m, s = [float(x) for x in time.split(":")]
File "c:\users\user\appdata\local\programs\python\python38-32\lib\site-packages\mp3chaps.py", line 68, in <listcomp>
h, m, s = [float(x) for x in time.split(":")]
ValueError: could not convert string to float: '00'
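float('00') succeeds on its own, so the rejected string presumably contains something invisible in front of the digits, most likely a UTF-8 byte order mark that did not survive being pasted here. That is a guess, not something the traceback confirms. Under that assumption, one way a reader could cope with all three file types is to choose the codec from the BOM. The helpers detect_encoding and read_chapter_lines below are hypothetical and are not taken from mp3chaps:

```python
import codecs
import os
import tempfile


def detect_encoding(raw: bytes) -> str:
    """Pick a codec from a leading BOM; fall back to UTF-8 (an assumption)."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # strips the BOM while decoding
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # uses the BOM to choose byte order, then drops it
    return "utf-8"


def read_chapter_lines(path: str) -> list:
    """Hypothetical helper: return (timestamp, title) pairs from a chapters file."""
    with open(path, "rb") as f:
        raw = f.read()
    text = raw.decode(detect_encoding(raw))
    pairs = []
    for line in text.splitlines():
        if line.strip():
            time, title = line.strip().split(" ", 1)
            pairs.append((time, title))
    return pairs


# Write the same sample line in three encodings and read each one back.
sample = "00:00:00.000 zażółć gęślą jaźń"
results = []
for encoding in ("utf-8", "utf-8-sig", "utf-16"):
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(sample.encode(encoding))
    results.append(read_chapter_lines(tmp.name))
    os.remove(tmp.name)
print(results[0])
```

Reading the file in binary mode first keeps the result independent of the Windows locale, which would also explain why the problem does not show up on Ubuntu, where the default is usually UTF-8.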
Result for ANSI file: No error (but no Polish characters either)
Lame tag CRC check failed
Chapters:
zazolc gesla jazn
While using trial and error to convert my file to a format that mp3chaps would accept while still retaining Polish characters, I also randomly got the same error as the one for Japanese, but it was related to badly converted opening and closing quotes and dash/hyphen characters.
File "c:\users\user\appdata\local\programs\python\python38-32\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1264: character maps to <undefined>
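Byte 0x9d is consistent with a closing curly quote (U+201D), whose UTF-8 form is E2 80 9D, being read as cp1252. A small diagnostic sketch, assuming the file is UTF-8, that lists which characters in a line would trigger this error; cp1252_unsafe_chars and the sample title are hypothetical:

```python
def cp1252_unsafe_chars(text: str) -> list:
    """Return the characters whose UTF-8 bytes cp1252 cannot decode."""
    unsafe = []
    for ch in text:
        try:
            ch.encode("utf-8").decode("cp1252")
        except UnicodeDecodeError:
            unsafe.append(ch)
    return unsafe


# Made-up title with curly quotes, an en dash, and lowercase Polish letters.
title = "00:10:00.000 \u201cPart two\u201d \u2013 ą ć ę ł"
print(cp1252_unsafe_chars(title))
```

In this sketch only the closing quote is reported. The lowercase Polish letters decode without an exception (as mojibake), which may be why the charmap error appeared to come and go between conversion attempts.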
Sample files are attached. Any character is enough to cause the error.
I hope this can be fixed, as I looked for free tools to insert chapters into mp3 files and this is the only one that I found that works (except for this bug). I can convert to ANSI but then I lose all Polish characters, which makes it unusable for my use case.
test.chapters.unicode.txt
test.chapters.utf-8.txt
test.chapters.ansi.txt