Encountering errors when training LauraGPT with a codec model configured with num_quantizers=16, codebook_size=512
Closed this issue · 1 comment
Hello and thank you for sharing the amazing work.
I have trained a FunCodec model with num_quantizers=16, codebook_size=512, and want to use this codec model for the subsequent LauraGPT training. I modified the codec settings accordingly in the relevant LauraGPT configuration file, but still encountered the following error when training LauraGPT (stage 5 in egs/LibriTTS/text2speech_laura/run.sh):
File "FunCodec/funcodec/models/audio_generation/laura_model.py", line 438, in forward
target_emb = self.calc_dense_vector(codec, codec_lengths)
File "FunCodec/funcodec/models/audio_generation/laura_model.py", line 350, in calc_dense_vector
return self.quantizer_codebook(codec, codec_lengths)
File "anaconda3/envs/pytorch2.0/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "FunCodec/funcodec/models/audio_generation/laura_model.py", line 49, in forward
codec_emb = F.embedding(codec, emb) # (BT, Nq, D)
File "anaconda3/envs/pytorch2.0/lib/python3.9/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
After checking, I found that emb has size [8192, 128] (matching num_quantizers × codebook_size = 16 × 512 = 8192), but codec contains indices greater than 8192 (they appear to span 0-16384), which triggers the assert. It looks as if the indices had been flattened assuming num_quantizers=16 and codebook_size=1024. I rechecked the codec indices extracted in stage 4 and confirmed that every index lies in [0, 511]. Could you tell me which configuration I may have failed to change? I suspect some preprocessing during dataset loading is responsible, but I can't find the specific location. Any suggestions would be appreciated.
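For reference, here is a minimal sketch of the failure mode; the shapes and variable names are illustrative assumptions, not the actual laura_model.py internals:

```python
import torch
import torch.nn.functional as F

num_quantizers, codebook_size, dim = 16, 512, 128

# Embedding table sized for the configured codec: 16 * 512 = 8192 rows.
emb = torch.randn(num_quantizers * codebook_size, dim)

# Raw per-quantizer indices, each in [0, 511] as verified in stage 4.
codec = torch.randint(0, codebook_size, (4, 100, num_quantizers))

# If the per-quantizer shift assumes a codebook size of 1024, flattened
# indices reach up to 15 * 1024 + 511 = 15871, past the 8192-row table.
shift = torch.arange(num_quantizers) * 1024
bad = codec + shift
print(bad.max().item())  # up to 15871

try:
    F.embedding(bad, emb)
except IndexError as e:  # on CUDA this surfaces as a device-side assert
    print("embedding lookup failed:", e)
```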
I think I found it: at line 29 of funcodec/models/audio_generation/laura_model.py, the codec index shift is computed with a hardcoded 1024 instead of the configured codebook_size. After changing it, training runs normally.
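Under the same illustrative assumptions as the sketch above, the corrected shift keeps every flattened index inside the table:

```python
import torch
import torch.nn.functional as F

num_quantizers, codebook_size, dim = 16, 512, 128
emb = torch.randn(num_quantizers * codebook_size, dim)  # 8192 x 128 table
codec = torch.randint(0, codebook_size, (4, 100, num_quantizers))

# Deriving the shift from the configured codebook_size keeps the flattened
# index q * codebook_size + c at most 15 * 512 + 511 = 8191, always in range.
shift = torch.arange(num_quantizers) * codebook_size
out = F.embedding(codec + shift, emb)  # (4, 100, 16, 128), no assert
print(out.shape)
```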