Hi! LibriSpeech char training!!

Question

scj0709 opened this issue 4 months ago · 1 comment

Describe the bug

Hello!
I'm really impressed with your code. It's well-structured and a great GitHub repository.
I'd like to train the LibriSpeech Transformer model following your procedure. However, when I tried to train it with character tokens by setting the token type to 'char', I ran into the following error, which seems to be related to padding. [image]

Expected behaviour

Could you provide any solutions for this issue?

To Reproduce

No response

Environment Details

No response

Relevant Log Output

No response

Additional Context

No response

Answer 1 · 2024-04-10T14:42:15.000Z

Hello @scj0709,

Thanks for opening this issue.

Could you please tell me which YAML file you were using when you ran into this error?

The issue is that the Transformer YAMLs in the LibriSpeech folder use "transformerlm", a language model that was trained with a SentencePiece BPE tokenizer. The ASR recipe uses exactly that same tokenizer, so you cannot change the granularity of the tokenizer.
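
To illustrate the constraint, here is a minimal sketch of a compatibility check between the ASR tokenizer settings and the ones the language model was trained with. Everything in it is an assumption made for illustration: the function name, the dictionary keys, and the vocabulary sizes are hypothetical and are not taken from the recipe or from the toolkit's API.

```python
# Hypothetical settings, for illustration only; not copied from any recipe.
LM_TOKENIZER = {"token_type": "bpe", "vocab_size": 5000}


def check_tokenizer_matches_lm(asr_tokenizer, lm_tokenizer=LM_TOKENIZER):
    """Return a list of mismatches between the ASR and LM tokenizer settings."""
    problems = []
    for key in ("token_type", "vocab_size"):
        if asr_tokenizer.get(key) != lm_tokenizer.get(key):
            problems.append(
                f"{key}: ASR uses {asr_tokenizer.get(key)!r}, "
                f"LM expects {lm_tokenizer.get(key)!r}"
            )
    return problems


# A character-level setup has a different granularity and, typically,
# a much smaller vocabulary than a BPE setup (the sizes here are made up).
char_setup = {"token_type": "char", "vocab_size": 30}
for problem in check_tokenizer_matches_lm(char_setup):
    print("Mismatch ->", problem)
```

An empty list would mean the two sides agree on these two settings. With a character-level setup both checks fail, since the token IDs the ASR model produces no longer refer to the entries the language model was trained on.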

This is why I'm surprised that you ran into this issue. Do you mind sharing the YAML with me, please?

Thanks and have a great day.