keras-team/keras-preprocessing

BUG: Byte like object is expected not a dict error

andreamoro opened this issue · 0 comments

I have been trying to read a set of text lines from a txt file and pass each one to text_to_word_sequence, without success, in a Colab notebook (which runs version 1.1.0 of the text preprocessing package).

import tensorflow as tf
import keras
from keras.preprocessing.text import Tokenizer

lines_dataset = tf.data.TextLineDataset(CSV_PATH)
k_vocabulary_set = set()

for text_tensor in lines_dataset:
  print(text_tensor)
  print(type(text_tensor.numpy()))
  print(keras.preprocessing.text.text_to_word_sequence(text_tensor.numpy()))
  print()
  break

Output

tf.Tensor(b'free game', shape=(), dtype=string)
<class 'bytes'>
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-71-04ae0d1bc4ad> in <module>()
      6   print(text_tensor)
      7   print(type(text_tensor.numpy()))
----> 8   print(keras.preprocessing.text.text_to_word_sequence(text_tensor.numpy()))
      9   print()
     10 

/usr/local/lib/python3.6/dist-packages/keras_preprocessing/text.py in text_to_word_sequence(text, filters, lower, split)
     56                 text = text.replace(c, split)
     57     else:
---> 58         translate_dict = {c: split for c in filters}
     59         translate_map = maketrans(translate_dict)
     60         text = text.translate(translate_map)

TypeError: a bytes-like object is required, not 'dict'
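
Judging from the traceback, a likely cause is that text_tensor.numpy() returns bytes, while the translate step builds its table from a dict, which only str.translate accepts. The sketch below reproduces that mismatch without TensorFlow or Keras. Note that to_word_sequence and FILTERS are hypothetical stand-ins modelled on the lines shown in the traceback, not the library's actual code, and the filter characters are an assumption.

```python
# Hypothetical stand-in for the translate step seen in the traceback.
# FILTERS is an assumed set of punctuation characters to strip.
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'


def to_word_sequence(text, filters=FILTERS, lower=True, split=" "):
    """Lowercase, replace filter characters with `split`, then split."""
    if lower:
        text = text.lower()
    # A dict-based table is valid for str.translate but not bytes.translate.
    translate_map = str.maketrans({c: split for c in filters})
    text = text.translate(translate_map)
    return [word for word in text.split(split) if word]


raw = b"free game"  # same type as text_tensor.numpy() in the report

# Passing bytes reproduces the TypeError from the report.
try:
    to_word_sequence(raw)
    bytes_failed = False
except TypeError:
    bytes_failed = True

# Decoding to str first lets the same function run normally.
words = to_word_sequence(raw.decode("utf-8"))
print(bytes_failed, words)
```

If the same reasoning holds for the real function, calling text_to_word_sequence(text_tensor.numpy().decode("utf-8")) in the loop above may work around the error, assuming the file is UTF-8 encoded.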