dennybritz/chatbot-retrieval

UnicodeDecodeError: 'gbk' codec can't decode byte 0xbf in position 2: illegal multibyte sequence

Kiteflyingee opened this issue · 1 comment

Using the Ubuntu data, helpers.py throws a UnicodeDecodeError when loading vocabulary.txt.

  File "D:\CDisk\Documents\GitHub\chatbot-retrieval\udc_model.py", line 40, in model_fn
    targets)
  File "D:\CDisk\Documents\GitHub\chatbot-retrieval\models\dual_encoder.py", line 33, in dual_encoder_model
    embeddings_W = get_embeddings(hparams)
  File "D:\CDisk\Documents\GitHub\chatbot-retrieval\models\dual_encoder.py", line 10, in get_embeddings
    vocab_array, vocab_dict = helpers.load_vocab(hparams.vocab_path)
  File "D:\CDisk\Documents\GitHub\chatbot-retrieval\models\helpers.py", line 9, in load_vocab
    vocab = f.read().splitlines()
UnicodeDecodeError: 'gbk' codec can't decode byte 0xbf in position 2: illegal multibyte sequence

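On Windows, adding encoding="utf-8" when opening the vocabulary file fixes this problem. The 'gbk' in the error is the system's default codec, which Python falls back to when no encoding is given.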
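
For anyone hitting the same error, here is a minimal sketch of what the fix might look like. It assumes load_vocab opens the file with Python's built-in open() and returns a (list, dict) pair, as the traceback suggests; the function body is an illustrative reconstruction, not the repository's actual code.

```python
def load_vocab(filename):
    """Read one token per line, decoding the file as UTF-8.

    Hypothetical reconstruction: only the explicit encoding argument
    is the fix described above; the rest is assumed for illustration.
    """
    # Without encoding=..., open() uses the locale's default codec
    # (for example 'gbk' on a Chinese-locale Windows install).
    with open(filename, encoding="utf-8") as f:
        vocab = f.read().splitlines()
    # Map each token to its line index.
    vocab_dict = {word: idx for idx, word in enumerate(vocab)}
    return vocab, vocab_dict
```

Passing the encoding explicitly makes the result independent of the machine's locale, so the same file should load the same way on Windows and Linux.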