word_language_model example throws UnicodeEncodeError

Question

miebster opened this issue a year ago · 0 comments

miebster commented a year ago

Context

Python 3.11
PyTorch version: 2.1.1
Operating System and version: Windows 10 Pro, 22H2, 19045.3693

Your Environment

torch installed via "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121"
pytorch_examples cloned from github
Using CUDA/GPU
Which example are you using: word_language_model
Link to code or data to repro [if any]: https://github.com/pytorch/examples/tree/main/word_language_model

Expected Behavior

Following the commands in the word_language_model README should finish without error.

Current Behavior

While running generate.py, a UnicodeEncodeError is thrown when the script tries to write 'ზ' (U+10D6) to the output file.
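The failure can be shown in isolation, without training a model. The snippet below is only a minimal sketch: `can_encode` is a hypothetical helper written for this illustration, not part of generate.py. It checks whether the character from the traceback can be represented in cp1252 (the codec named in the failure log) and in UTF-8.

```python
# Minimal sketch of the underlying problem; can_encode is a hypothetical helper.
char = "\u10d6"  # Georgian letter 'ზ', the character named in the traceback


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be encoded with `encoding`, else False."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp1252 is a single-byte Western European code page with no slot for U+10D6.
cp1252_ok = can_encode(char, "cp1252")
# UTF-8 can represent every Unicode code point.
utf8_ok = can_encode(char, "utf-8")

print(f"cp1252: {cp1252_ok}, utf-8: {utf8_ok}")
```

When `open()` is called in text mode without an `encoding` argument, Python uses the locale's preferred encoding, which is presumably cp1252 on this Windows machine given the traceback.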

Possible Solution

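I resolved the issue by passing an explicit encoding to the open() call on line 66 of generate.py, so the output file no longer depends on the Windows default codec (cp1252, per the failure log):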

from:
with open(args.outf, 'w') as outf:

to:
with open(args.outf, 'w', encoding="utf-8") as outf:
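As a rough check that the change behaves as intended, here is a self-contained sketch of the same write pattern with the explicit encoding. The function name `write_words`, the sample word list, and the temporary output path are assumptions made for this illustration; only the `open(..., encoding="utf-8")` call and the `write` expression come from generate.py.

```python
import os
import tempfile


def write_words(path, words):
    """Write words separated by spaces, 20 per line, always as UTF-8."""
    # Explicit encoding: the result no longer depends on the OS locale.
    with open(path, "w", encoding="utf-8") as outf:
        for i, word in enumerate(words):
            outf.write(word + ("\n" if i % 20 == 19 else " "))


# Hypothetical sample output that includes the character from the traceback.
words = ["the", "\u10d6", "end"]
path = os.path.join(tempfile.mkdtemp(), "generated.txt")
write_words(path, words)

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as f:
    text = f.read()

print(text == "the \u10d6 end ")
```

A possible alternative that leaves the script untouched is Python's UTF-8 mode (`python -X utf8 generate.py`, or setting the `PYTHONUTF8=1` environment variable), though I have not tested that against this example.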

Steps to Reproduce

cd .\word_language_model
python main.py --cuda --epochs 6
python generate.py

Failure Logs [if any]

| Generated 0/1000 words
| Generated 100/1000 words
| Generated 200/1000 words
| Generated 300/1000 words
| Generated 400/1000 words
| Generated 500/1000 words
Traceback (most recent call last):
File "REDACTED\word_language_model\generate.py", line 83, in <module>
outf.write(word + ('\n' if i % 20 == 19 else ' '))
File "REDACTED\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\u10d6' in position 0: character maps to <undefined>