word_language_model example throws UnicodeEncodeError
miebster opened this issue · 0 comments
Context
- Python 3.11
- PyTorch version: 2.1.1
- Operating System and version: Windows 10 Pro, 22H2, 19045.3693
Your Environment
- torch installed via "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121"
- pytorch_examples cloned from GitHub
- Using CUDA/GPU
- Which example are you using: word_language_model
- Link to code or data to repro [if any]: https://github.com/pytorch/examples/tree/main/word_language_model
Expected Behavior
Following the commands in the word_language_model README should finish without error.
Current Behavior
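While generate.py is running, a UnicodeEncodeError is thrown when it tries to write 'ზ' (U+10D6) to the output file. The file is opened without an explicit encoding, so on this Windows machine Python falls back to cp1252, which has no mapping for that character.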
Possible Solution
I resolved the issue by changing line 66 of generate.py
from:
with open(args.outf, 'w') as outf:
to:
with open(args.outf, 'w', encoding="utf-8") as outf:
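A minimal sketch of why the change helps, runnable outside the example. It assumes nothing about generate.py beyond the open() call quoted above; the temporary file path and the variable names are illustrative only.

```python
import os
import tempfile

# Georgian letter 'ზ' (U+10D6), the character named in the traceback.
word = "\u10d6"

# cp1252 has no mapping for this code point, so encoding it fails.
# This is the same failure the default open() hits on this Windows setup.
try:
    word.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# With an explicit UTF-8 encoding the write no longer depends on the
# platform's default code page.
path = os.path.join(tempfile.mkdtemp(), "generated.txt")  # illustrative path
with open(path, "w", encoding="utf-8") as outf:
    outf.write(word + "\n")

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as inf:
    round_trip = inf.read().strip()

print(cp1252_ok, round_trip == word)
```

Passing encoding="utf-8" explicitly keeps the output identical across platforms, rather than relying on whatever default the locale provides.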
Steps to Reproduce
cd .\word_language_model
python main.py --cuda --epochs 6
python generate.py
Failure Logs [if any]
| Generated 0/1000 words
| Generated 100/1000 words
| Generated 200/1000 words
| Generated 300/1000 words
| Generated 400/1000 words
| Generated 500/1000 words
Traceback (most recent call last):
File "REDACTED\word_language_model\generate.py", line 83, in <module>
outf.write(word + ('\n' if i % 20 == 19 else ' '))
File "REDACTED\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u10d6' in position 0: character maps to <undefined>