chiphuyen/stanford-tensorflow-tutorials

Lecture 11: 11_char_rnn ... 'charmap' codec can't decode byte 0x81 in position 170 ...

terminsen opened this issue · 2 comments

Hi -

I get the traceback below ... can you help with this one, please?

Kind regards, Jesper.


UnicodeDecodeError Traceback (most recent call last)
in ()
148
149 if __name__ == '__main__':
--> 150 main()

in main()
145 lm = CharRNN(model)
146 lm.create_model()
--> 147 lm.train()
148
149 if __name__ == '__main__':

in train(self)
106 data = read_batch(stream, self.batch_size)
107 while True:
--> 108 batch = next(data)
109
110 # for batch in read_batch(read_data(DATA_PATH, vocab)):

in read_batch(stream, batch_size)
38 def read_batch(stream, batch_size):
39 batch = []
---> 40 for element in stream:
41 batch.append(element)
42 if len(batch) == batch_size:

in read_data(filename, vocab, window, overlap)
25
26 def read_data(filename, vocab, window, overlap):
---> 27 lines = [line.strip() for line in open(filename, 'r').readlines()]
28 while True:
29 random.shuffle(lines)

~\Anaconda3\lib\encodings\cp1252.py in decode(self, input, final)
21 class IncrementalDecoder(codecs.IncrementalDecoder):
22 def decode(self, input, final=False):
---> 23 return codecs.charmap_decode(input,self.errors,decoding_table)[0]
24
25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 170: character maps to <undefined>

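This is a file encoding issue. By default, open() without an explicit encoding uses the locale's preferred encoding, which is cp1252 in this traceback, and byte 0x81 is undefined in cp1252. Assuming the data file is UTF-8, change line 27 to: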
lines = [line.strip() for line in open(filename, 'r', encoding="utf-8").readlines()]
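If it helps to see the difference in isolation, here is a minimal sketch of the same fix outside the tutorial code. It assumes the data file is UTF-8; `read_lines` is a hypothetical helper name, not something from the repository, and the temporary file is only there to reproduce the 0x81 byte.

```python
import os
import tempfile


def read_lines(filename, encoding="utf-8"):
    """Return the stripped lines of a text file, decoded with an explicit encoding.

    Hypothetical helper for illustration; not part of the tutorial code.
    """
    # Passing encoding explicitly avoids depending on the platform default
    # (cp1252 in the traceback above). The with-block also closes the file.
    with open(filename, "r", encoding=encoding) as f:
        return [line.strip() for line in f]


if __name__ == "__main__":
    # U+00C1 is encoded in UTF-8 as the bytes 0xC3 0x81, so this file
    # contains the 0x81 byte that cp1252 cannot decode.
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write("\u00c1nimo\nsecond line\n".encode("utf-8"))
        path = tmp.name
    try:
        print(read_lines(path))  # decodes cleanly as UTF-8
        try:
            read_lines(path, encoding="cp1252")
        except UnicodeDecodeError as exc:
            print("cp1252 fails:", exc)
    finally:
        os.remove(path)
```

If the file turns out not to be UTF-8, the same call will raise a different UnicodeDecodeError, and the encoding argument would need to match whatever the file was actually saved as.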

Thank you