isletennos/MMVC_Trainer

ONNX standalone


Hi! Thanks for the amazing open source work!

I was looking through onnx_export.py and onnx_bench.py and was wondering how to run them end to end in a standalone Colab notebook.

Specifically, how do we replace dummy_specs = torch.rand(1, 257, 60) with an mp3/wav audio clip of variable length, converted to a torch Tensor (by the rmvpe model? I'm really new to speech model architectures, so I'm not sure), and feed it to the ONNX-converted checkpoint?
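A minimal sketch of the audio-to-tensor step, under stated assumptions. In VITS-style models, specs is usually a linear STFT magnitude spectrogram computed directly from the waveform rather than an rmvpe output, and 257 frequency bins correspond to n_fft = 512. The hop_length and win_length values below are placeholders, not values taken from this repo; use the ones in the training config. The audio would first be loaded (for example with torchaudio or soundfile), downmixed to mono and resampled to the model's sampling rate. The helper name wav_to_linear_spec is hypothetical.

```python
import torch
import torch.nn.functional as F


def wav_to_linear_spec(wave: torch.Tensor, n_fft: int = 512, hop_length: int = 128,
                       win_length: int = 512) -> torch.Tensor:
    """Return a linear magnitude spectrogram shaped (1, n_fft // 2 + 1, frames).

    Assumptions (hypothetical, check against the training config):
    n_fft=512 yields the 257 bins seen in dummy_specs; hop/win are placeholders.
    """
    if wave.dim() == 1:
        wave = wave.unsqueeze(0)  # (1, samples)
    # Reflect-pad both ends, similar to VITS-style preprocessing (assumption)
    pad = (n_fft - hop_length) // 2
    wave = F.pad(wave.unsqueeze(1), (pad, pad), mode="reflect").squeeze(1)
    window = torch.hann_window(win_length, dtype=wave.dtype)
    stft = torch.stft(wave, n_fft=n_fft, hop_length=hop_length, win_length=win_length,
                      window=window, center=False, return_complex=True)
    return stft.abs()  # magnitude, shape (1, 257, frames)
```

With these placeholder settings, one second of 16 kHz audio becomes a (1, 257, 125) tensor, so the time dimension grows with the clip length instead of staying at 60.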

Thanks

I believe you are referring to the pitch estimation (rmvpe) and might be looking at the MMVC1.5 branch (v1.5.0.0_SiFiGAN).
In this context, the size of the input tensors is intentionally fixed rather than dynamic. This is because the inputs specs, sin, d0, d1, d2, and d3 each have a different size along the time dimension. ONNX cannot handle such inputs dynamically, which is why the size is fixed.
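Given the fixed input size, one possible way to handle variable-length audio (an assumption, not something this repo is confirmed to provide) is to cut the spectrogram into 60-frame windows, zero-pad the last window, run each window through the model, and trim the concatenated output back to the original length. split_fixed_frames is a hypothetical helper. The other inputs (sin, d0, d1, d2, d3) would also need to be cut at matching positions at their own time resolutions, which this sketch does not cover.

```python
import torch
import torch.nn.functional as F


def split_fixed_frames(spec: torch.Tensor, frames: int = 60):
    """Split a (1, bins, T) spectrogram into (1, bins, frames) chunks.

    Hypothetical helper: the last chunk is zero-padded, and the original
    length T is returned so concatenated outputs can be trimmed later.
    """
    total = spec.shape[-1]
    n_chunks = max(1, -(-total // frames))  # ceiling division, at least one chunk
    padded = F.pad(spec, (0, n_chunks * frames - total))  # zero-pad the time axis
    chunks = list(padded.split(frames, dim=-1))
    return chunks, total
```

Hard cuts at window boundaries can produce audible seams; overlapping the windows and cross-fading the outputs is a common refinement.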