heatz123/naturalspeech

about cuda of dtw

Opened this issue · 3 comments

hdmjdp commented

sdtw = SoftDTW(use_cuda=False, gamma=0.01, warp=134.4)

Why is use_cuda=False?

heatz123 commented

Hi @hdmjdp,
The current implementation of softdtw without CUDA uses parallel loops with numba, and I think this can be efficient enough. Additionally, due to the high memory usage of the matrices in softdtw, you may need to use a smaller batch size, which can reduce the potential advantage of using CUDA. There also seems to be some numerical instability in the backward computation of the CUDA sdtw implementation, particularly with long input sequences (see https://github.com/Maghoumi/pytorch-softdtw-cuda), which is our case.
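
For reference, the non-CUDA forward pass is roughly of this shape (a minimal sketch, not the repository's actual code; applying the warp penalty to the vertical and horizontal transitions is an assumption based on the warp argument shown above):

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def softdtw_forward(D, gamma, warp):
    # D: (B, N, M) pairwise distance matrices for a batch
    B, N, M = D.shape
    R = np.full((B, N + 1, M + 1), np.inf)
    R[:, 0, 0] = 0.0
    for b in prange(B):  # each batch item runs on its own thread
        for i in range(1, N + 1):
            for j in range(1, M + 1):
                # soft-min over the three predecessors; the vertical and
                # horizontal moves pay the warp penalty
                r0 = -R[b, i - 1, j - 1] / gamma
                r1 = -(R[b, i - 1, j] + warp) / gamma
                r2 = -(R[b, i, j - 1] + warp) / gamma
                rmax = max(r0, r1, r2)
                rsum = np.exp(r0 - rmax) + np.exp(r1 - rmax) + np.exp(r2 - rmax)
                R[b, i, j] = D[b, i - 1, j - 1] - gamma * (np.log(rsum) + rmax)
    return R[:, N, M]
```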

However, if you want the potential speedup from CUDA, you can modify models/sdtw_cuda.py to handle the warp penalty appropriately in the forward and backward functions, as in the non-CUDA implementation.
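
If you do modify it, a quick sanity check along these lines (a hypothetical sketch; the SoftDTW constructor arguments follow the usage quoted above, while the import path, call signature, and input shapes are assumptions) can confirm the CUDA forward/backward matches the numba path before training with it:

```python
import torch
from models.sdtw_cuda import SoftDTW  # class name/path assumed

torch.manual_seed(0)
x = torch.randn(2, 120, 80, requires_grad=True)  # arbitrary (batch, time, feat) shapes
y = torch.randn(2, 140, 80)

# numba/CPU reference path
sdtw_cpu = SoftDTW(use_cuda=False, gamma=0.01, warp=134.4)
loss_cpu = sdtw_cpu(x, y).sum()
grad_cpu, = torch.autograd.grad(loss_cpu, x)

# modified CUDA path
x_gpu = x.detach().cuda().requires_grad_(True)
sdtw_gpu = SoftDTW(use_cuda=True, gamma=0.01, warp=134.4)
loss_gpu = sdtw_gpu(x_gpu, y.cuda()).sum()
grad_gpu, = torch.autograd.grad(loss_gpu, x_gpu)

print("loss diff:", (loss_cpu - loss_gpu.cpu()).abs().item())
print("max grad diff:", (grad_cpu - grad_gpu.cpu()).abs().max().item())
```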

Thanks.

hdmjdp commented

@heatz123 Thanks. I just trained it to 145k steps without the dtw loss, then fine-tuned it from the 145k checkpoint. Now I get this error:

torch.arange(0, max_len)
RuntimeError: upper bound and larger bound inconsistent with step sign

This error is caused by max_len being a negative value.

heatz123 commented

Hi @hdmjdp,
I've checked the implementation, but it is not clear how max_len could be negative, since max_len is set to the maximum value of the durations (> 0) in a batch. Could you please provide more details, such as the config settings you are using and the steps to reproduce the issue? It would also be helpful if you could print the value of max_len to figure out the problem.
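
For example, something along these lines just before the failing torch.arange call (the variable name durations is only a guess based on the error message, not the actual code) would show where the value goes wrong:

```python
import torch

# Hypothetical debug guard; `durations` stands in for whatever tensor
# max_len is actually derived from in the failing code path.
max_len = int(durations.max())
if max_len <= 0:
    print("max_len:", max_len)
    print("durations min/max:", durations.min().item(), durations.max().item())
ids = torch.arange(0, max_len)
```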

Thank you.