ServerSideHannes/las

The token vector should be one-hot encoded.

Closed this issue · 5 comments

Is it necessary to use one-hot encoding, or can we use tf.keras.preprocessing.text.Tokenizer for encoding?

It might work, but you would need to specify how you intend to use it for me to make a qualified guess on that :)
In the paper they let the decoder emit a single character per time step, that's why I implemented it that way.
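For anyone reading this later, here is a rough sketch of what that means in practice. It is not the repo's actual preprocessing; the transcript, alphabet, and special tokens below are made up for illustration. It shows character-level tokens being one-hot encoded, and how Tokenizer with char_level=True produces comparable integer ids that you would still one-hot (or embed) before feeding the decoder:

```python
import tensorflow as tf

# Hypothetical transcript and character vocabulary, just for illustration.
transcript = "hello world"
vocab = sorted(set(transcript)) + ["<sos>", "<eos>"]  # assumed special tokens
char_to_id = {c: i for i, c in enumerate(vocab)}

# Integer-encode each character, then one-hot encode the ids.
ids = [char_to_id[c] for c in transcript]
one_hot = tf.keras.utils.to_categorical(ids, num_classes=len(vocab))
print(one_hot.shape)  # (number_of_characters, vocab_size)

# Tokenizer with char_level=True yields similar integer ids
# (note: its ids start at 1); the one-hot step is the same afterwards.
tok = tf.keras.preprocessing.text.Tokenizer(char_level=True, lower=False)
tok.fit_on_texts([transcript])
ids_from_tokenizer = tok.texts_to_sequences([transcript])[0]
```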

@hgstudent Can you please share the preprocessing code too?

I don't have any general preprocessing code at the moment. However, depending on what you mean by preprocessing, I could point you towards a repository if you would like :)

Yes, please.
I am new to this field and it's very hard for me to learn all of this.
If you know of any repository with complete speech recognition code, from preprocessing to prediction, please share the link.
I have been waiting for your reply for many days.

I don't know of any complete speech recognition repository, but https://github.com/DemisEom/SpecAugment is good for preprocessing along with augmentation :)
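For reference, the feature extraction that typically comes before that kind of augmentation looks roughly like this. This is not the linked repo's API, just a sketch using librosa (assumed installed); the file path, sample rate, and number of mel bands are placeholder values:

```python
import librosa

def log_mel_features(wav_path, sr=16000, n_mels=80):
    """Load an audio file and return a (time, n_mels) log-mel spectrogram."""
    y, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel)
    return log_mel.T  # transpose so time is the first axis

# features = log_mel_features("example.wav")  # "example.wav" is a placeholder path
```

SpecAugment-style augmentation (time/frequency masking) would then be applied to a spectrogram like this one before it is fed to the model.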