Error: UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 3183: character maps to <undefined>
avfirsov opened this issue · 1 comment
avfirsov commented
I am trying to split the following Markdown file, Makdyeniyel_M._Zapomnit_Vsyo_Usvoenie_Zn.a6.md, and I get an error:
```
$ python3 -m split_markdown4gpt ~/Downloads/Makdyeniyel_M._Zapomnit_Vsyo_Usvoenie_Zn.a6/Makdyeniyel_M._Zapomnit_Vsyo_Usvoenie_Zn.a6.md --model gpt-3.5-turbo --limit 4096 --separator "=== SPLIT ==="
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\curious_andrew\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\split_markdown4gpt\__main__.py", line 44, in <module>
    cli()
  File "C:\Users\curious_andrew\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\split_markdown4gpt\__main__.py", line 40, in cli
    fire.Fire(split_md_file)
  File "C:\Users\curious_andrew\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\fire\core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\curious_andrew\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\fire\core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
                                ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\curious_andrew\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\fire\core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\curious_andrew\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\split_markdown4gpt\__main__.py", line 34, in split_md_file
    return f"\n{separator}\n".join(md_splitter.split(md_path))
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\curious_andrew\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\split_markdown4gpt\splitter.py", line 372, in split
    self.load_md(md)
  File "C:\Users\curious_andrew\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\split_markdown4gpt\splitter.py", line 121, in load_md
    self.load_md_path(md)
  File "C:\Users\curious_andrew\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\split_markdown4gpt\splitter.py", line 91, in load_md_path
    self.load_md_file(md_file)
  File "C:\Users\curious_andrew\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.12_qbz5n2kfra8p0\LocalCache\local-packages\Python312\site-packages\split_markdown4gpt\splitter.py", line 100, in load_md_file
    self.load_md_str(md_file.read())
                     ^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.496.0_x64__qbz5n2kfra8p0\Lib\encodings\cp1251.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 3183: character maps to <undefined>
```
What could I be doing wrong?
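The last frame of the traceback is in `encodings\cp1251.py`, so the file was decoded with Windows code page 1251 (the ANSI code page on Windows systems with a Cyrillic locale) rather than UTF-8. Assuming the Markdown file is actually UTF-8, that decode is expected to fail: byte 0x98 has no mapping in cp1251, but it is a normal UTF-8 continuation byte, for example in the capital Cyrillic letter "И" (0xD0 0x98) or the typographic quote "‘" (0xE2 0x80 0x98). A minimal sketch that reproduces the same error with a made-up sample string follows; `try_decode` is a hypothetical helper for illustration only.

```python
import locale

# Hypothetical sample text (not taken from the reported file). The capital
# Cyrillic letter "И" (U+0418) is encoded in UTF-8 as the bytes 0xD0 0x98.
SAMPLE = "Иван"
RAW = SAMPLE.encode("utf-8")


def try_decode(data: bytes, encoding: str) -> str:
    """Decode `data` with `encoding`; return the text or the error message."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc}"


if __name__ == "__main__":
    print(RAW)
    # Same failure mode as in the report: cp1251 has no mapping for 0x98.
    print(try_decode(RAW, "cp1251"))
    # Decoding with the encoding the bytes were written in succeeds.
    print(try_decode(RAW, "utf-8"))
    # The encoding open() falls back to when no encoding= is given
    # (depends on the OS locale and on whether UTF-8 Mode is enabled).
    print("Locale default:", locale.getpreferredencoding(False))
```

On a Cyrillic-locale Windows machine the last line would typically report `cp1251`, which matches the codec in the traceback.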
twardoch commented
Well, this time it might finally be the tool that's at fault. I'll try to take a look.
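In the traceback, `load_md_file` calls `md_file.read()` and the decoding happens in `cp1251.py`, which suggests the splitter opens the file without an explicit `encoding=` and therefore inherits the locale default. Until that is confirmed and fixed in the package, one workaround that needs no code change is Python's UTF-8 Mode, which makes `open()` default to UTF-8: run `python3 -X utf8 -m split_markdown4gpt ...` or set the environment variable `PYTHONUTF8=1`. Inside the tool, the fix would be to pass the encoding explicitly when reading. A minimal sketch, assuming the input files are UTF-8; `read_markdown` is a hypothetical helper, not part of split_markdown4gpt's API:

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def read_markdown(path: str | Path) -> str:
    """Read a Markdown file as UTF-8, independent of the OS locale.

    Hypothetical helper: "utf-8-sig" decodes plain UTF-8 and also drops a
    leading byte-order mark, which some Windows editors add.
    """
    return Path(path).read_text(encoding="utf-8-sig")


if __name__ == "__main__":
    # Usage: write a UTF-8 file containing Cyrillic text, then read it back.
    with tempfile.TemporaryDirectory() as tmp:
        md_path = Path(tmp) / "sample.md"
        md_path.write_text("# Иван\n", encoding="utf-8")
        text = read_markdown(md_path)
        print("Read", len(text), "characters")
```

Using `utf-8-sig` instead of `utf-8` is a design choice: it behaves identically on files without a BOM, so it costs nothing and avoids a stray `\ufeff` at the start of the first chunk.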