Inquiry about Future Plans for FunCodec with Fewer nq Options
hertz-pj opened this issue · 2 comments
I hope this message finds you well. I am reaching out to commend the exceptional work on FunCodec; it has proven to be a remarkable asset to the community. Currently, I notice that all the available checkpoints use 32 nq. I am curious whether there are any plans to release versions with fewer nq, such as 8 or 12, in the future.
Additionally, I would be interested to learn whether there have been any experiments or considerations regarding the impact of a larger number of nq (such as 32) on models similar to VALL-E, and whether it affects their performance or efficiency. Your insights on these matters would be greatly appreciated.
Thank you for your dedication to advancing this field. I look forward to your response.
Best regards,
By adjusting the bit_width parameter, I am able to achieve different nq sizes without a noticeable decline in performance.
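In case it helps others reading this thread, here is a minimal sketch of the arithmetic behind that: in a residual-VQ codec the bit width is roughly nq × bits-per-codebook × token rate, so choosing a bit_width implicitly chooses how many quantizers stay active. The 1024-entry codebook and 50 Hz token rate below are assumptions for illustration, not the exact values of any released checkpoint.

```python
# Sketch of how a target bit width maps to the number of active quantizers (nq)
# in a residual-VQ codec. Codebook size and token rate are assumed values;
# check the checkpoint config for the actual FunCodec settings.
import math

def nq_for_bit_width(bit_width_bps: float,
                     token_rate_hz: float = 50.0,   # assumed token rate
                     codebook_size: int = 1024) -> int:
    """Number of quantizers needed to reach a target bit rate."""
    bits_per_quantizer = math.log2(codebook_size)    # 10 bits for 1024 entries
    bits_per_frame = bit_width_bps / token_rate_hz   # bits available per frame
    return max(1, int(bits_per_frame // bits_per_quantizer))

if __name__ == "__main__":
    for bw in (2000, 4000, 6000, 16000):
        print(f"bit_width={bw} bps -> nq={nq_for_bit_width(bw)}")
```

Under these assumptions, 4 kbps corresponds to 8 quantizers and 16 kbps to 32, which is why lowering bit_width behaves like training with a smaller nq.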
You are right: you can adjust bit_width to obtain different nq sizes. Our released checkpoints are trained with structured dropout [1] over 32 quantizers, so at the inference stage a single model can work at different bit widths, i.e., different nq sizes. We have not evaluated more than 32 quantizers; I believe 32 quantizers are enough for audio signals, and additional quantizers would not bring a significant performance gain.
[1] Zeghidour N, Luebs A, Omran A, et al. SoundStream: An end-to-end neural audio codec. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 30: 495-507.
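For anyone unfamiliar with the structured dropout mentioned above, here is a minimal sketch of the idea from [1]: during training only a random prefix of the residual quantizers is applied per batch, so every prefix length yields a usable reconstruction and the bit width can be chosen freely at inference. The class name, tensor shapes, and quantizer details below are assumptions for illustration, not FunCodec's actual implementation.

```python
# Sketch of structured quantizer dropout (SoundStream [1]): only the first
# `nq` residual quantizers are applied, with `nq` sampled per batch during
# training, so one model can later run at any prefix length (any bit width).
import random
from typing import Optional

import torch
import torch.nn as nn

class ResidualVQWithDropout(nn.Module):
    def __init__(self, num_quantizers: int = 32, dim: int = 128, codebook_size: int = 1024):
        super().__init__()
        # One codebook per residual stage (plain nn.Embedding as a stand-in).
        self.codebooks = nn.ModuleList(
            nn.Embedding(codebook_size, dim) for _ in range(num_quantizers)
        )

    def forward(self, x: torch.Tensor, nq: Optional[int] = None) -> torch.Tensor:
        # x: (batch, frames, dim). During training, sample how many quantizers
        # to use for this batch; at inference, use all unless nq is given.
        if nq is None:
            nq = random.randint(1, len(self.codebooks)) if self.training else len(self.codebooks)
        residual, quantized = x, torch.zeros_like(x)
        for codebook in self.codebooks[:nq]:
            # Nearest-neighbour lookup of each residual frame in the codebook.
            dists = (residual.unsqueeze(-2) - codebook.weight).pow(2).sum(-1)  # (batch, frames, codebook_size)
            codes = dists.argmin(dim=-1)                                       # (batch, frames)
            q = codebook(codes)                                                # (batch, frames, dim)
            quantized = quantized + q
            residual = residual - q
        return quantized
```

Because the quantizers are always applied in the same order, dropping the later ones at inference simply truncates the residual refinement rather than breaking the reconstruction, which is what lets one checkpoint cover several bit widths.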