modelscope/FunCodec

Errors when training LauraGPT with a codec model configured with num_quantizers=16, codebook_size=512


Hello, and thank you for sharing this amazing work.
I have trained a FunCodec model with num_quantizers=16 and codebook_size=512, and I want to use this codec model for subsequent LauraGPT training. I updated the codec settings to these values in the relevant LauraGPT configuration file, but still encountered the following error when training LauraGPT (stage 5 in egs/LibriTTS/text2speech_laura/run.sh):

 File "FunCodec/funcodec/models/audio_generation/laura_model.py", line 438, in forward
    target_emb = self.calc_dense_vector(codec, codec_lengths)
  File "FunCodec/funcodec/models/audio_generation/laura_model.py", line 350, in calc_dense_vector
    return self.quantizer_codebook(codec, codec_lengths)
  File "anaconda3/envs/pytorch2.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "FunCodec/funcodec/models/audio_generation/laura_model.py", line 49, in forward
    codec_emb = F.embedding(codec, emb)  # (BT, Nq, D)
  File "anaconda3/envs/pytorch2.0/lib/python3.9/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

After checking, I found that emb has size [8192, 128] (consistent with num_quantizers=16, codebook_size=512), but codec contains indices greater than 8192 (they appear to lie in [0, 16384)), which triggers the error. It looks as though the indices were produced with num_quantizers=16 and codebook_size=1024. I re-checked the codec indices extracted in stage 4 and confirmed that every index lies within [0, 511]. Could you tell me which configuration I may have missed? My guess is that some preprocessing during dataset loading is responsible, but I can't find the exact location. Any suggestions would be appreciated.
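For what it's worth, the observed range is consistent with that hypothesis (a hypothetical back-of-the-envelope check, not FunCodec's actual code):

```python
# Hypothetical check of the observed index range: 16 quantizers flattened
# with a codebook size of 1024 would yield indices in [0, 16384), while
# the embedding table built for codebook_size=512 has only 8192 rows.
num_quantizers, codebook_size = 16, 512
table_rows = num_quantizers * codebook_size   # 8192 rows in emb
apparent_range = num_quantizers * 1024        # 16384, matching the observed indices
print(table_rows, apparent_range)             # any index >= 8192 trips the CUDA assert
```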

I think I found it: at line 29 of funcodec/models/audio_generation/laura_model.py, the shift added to the codec indices is computed with a hardcoded 1024 instead of the codebook size. After changing it to use the configured codebook size, training runs normally. A minimal sketch of the corrected lookup is below.
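Sketch of the fix, assuming the shift is applied per quantizer before the flattened-table lookup as in calc_dense_vector (variable names and shapes are illustrative, not the exact FunCodec code):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes, matching the reported configuration.
num_quantizers, codebook_size, dim = 16, 512, 128
emb = torch.randn(num_quantizers * codebook_size, dim)             # (Nq * K, D)

codec = torch.randint(0, codebook_size, (4, 100, num_quantizers))  # (B, T, Nq), each index in [0, K)
shift = torch.arange(num_quantizers) * codebook_size               # was: * 1024 (the bug)
codec_emb = F.embedding(codec + shift, emb)                        # (B, T, Nq, D), all indices < Nq * K
```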