fishaudio/fish-speech

Which is better as LLM input: codebook indices or hidden vectors?

Closed · 1 comment

Hi, are there any experiments on training an LLM with speech input? There are two kinds of input: the codec's codebook indices, each a single integer value, or the indexed cluster centers of the codebook, each a vector. Is there any study showing which one is a better fit for training an autoregressive LLM?
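
To make the two options concrete, here is a minimal sketch in plain Python, assuming a single frozen codebook and a linear projection into the LLM width. All names and sizes (`K`, `D_CODE`, `D_MODEL`, `codebook`, `proj`, `embedding_table`) are hypothetical and not taken from fish-speech. It only illustrates one structural point: looking up a cluster center and projecting it linearly gives the same input as looking up the index in an embedding table initialised to codebook times projection, so under these assumptions the index route is the more general parameterisation and the vector route is a constrained version of it. It says nothing about which one trains better in practice.

```python
import random

random.seed(0)

# Hypothetical sizes: codebook entries, codec vector dim, LLM hidden dim.
K, D_CODE, D_MODEL = 8, 4, 6

# Stand-in for a frozen codec codebook: K cluster centers of dimension D_CODE.
codebook = [[random.gauss(0, 1) for _ in range(D_CODE)] for _ in range(K)]

# Stand-in for a linear projection from codec dim to LLM dim.
proj = [[random.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(D_CODE)]


def project(vec, weight):
    """Multiply a row vector (len n) by a matrix (n x m), giving a row vector (len m)."""
    return [
        sum(v * weight[i][j] for i, v in enumerate(vec))
        for j in range(len(weight[0]))
    ]


def embed_from_vectors(indices):
    """Vector input: fetch each cluster center, then project it to the LLM width."""
    return [project(codebook[i], proj) for i in indices]


# Index input: each index is a token id with its own row in an embedding table.
# The table is initialised to codebook @ proj here (an assumption of this sketch);
# in training, every row would be free to move independently.
embedding_table = [project(center, proj) for center in codebook]


def embed_from_indices(indices):
    """Index input: plain embedding lookup by token id."""
    return [embedding_table[i] for i in indices]


frames = [3, 0, 7, 3]  # a made-up sequence of codec indices
from_indices = embed_from_indices(frames)
from_vectors = embed_from_vectors(frames)

# Largest element-wise gap between the two input sequences at initialisation.
max_diff = max(
    abs(x - y)
    for row_a, row_b in zip(from_indices, from_vectors)
    for x, y in zip(row_a, row_b)
)
print(max_diff)
```

The difference only appears once training starts: the index table has `K * D_MODEL` free parameters, while the vector route keeps the codec geometry fixed and learns just the `D_CODE * D_MODEL` projection.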

Move to Discussion.