speechbrain/speechbrain

Issues regarding discrete WavLM and discrete HuBERT

anupsingh15 opened this issue · 4 comments

Describe the bug

Hi,
I am trying to extract tokens using the modules: speechbrain.lobes.models.huggingface_transformers.discrete_wavlm module and speechbrain.lobes.models.huggingface_transformers.discrete_hubert module, but neither seems to work due to missing model checkpoints in the SpeechBrain repo on HF. Could you please let me know of any workaround to get discrete tokens using WavLM/HuBERT?

Expected behaviour

Successful load of pre-trained models

To Reproduce

No response

Environment Details

No response

Relevant Log Output

No response

Additional Context

No response

Hello @anupsingh15, you're right, the k-means HF repository is currently inaccessible due to an ongoing refactoring of the interface. A workaround could be to train your own k-means model (see SpeechBrain's LibriSpeech quantization recipe). Alternatively, I have just uploaded a set of pre-trained k-means models to my HF account (repository) that you can use until the new interface is merged.
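For reference, the discrete-token extraction these modules perform boils down to assigning each SSL frame embedding to its nearest k-means centroid. Here is a minimal sketch of that step, assuming an sklearn-style k-means model; random features stand in for the WavLM/HuBERT layer outputs that the real pipeline would produce:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for SSL features: in practice these come from a chosen
# WavLM/HuBERT layer, with shape (num_frames, feature_dim).
rng = np.random.default_rng(0)
features = rng.standard_normal((500, 768)).astype(np.float32)

# Fit a small codebook (the recipe trains e.g. 1000 clusters on a
# full corpus; 8 clusters here just for illustration).
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(features)

# "Discrete tokens" are simply the nearest-centroid indices per frame.
tokens = kmeans.predict(features)
print(tokens.shape)
```

A k-means model trained this way can then be saved and reloaded to tokenize new utterances with `predict`.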

Thanks @Chaanks. Do you plan to upload k-means models with 1024 cluster centroids for HuBERT and WavLM? I am training the k-means models for these as you suggested; however, I run out of GPU memory due to limited resources.
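One possible way around the memory limit (an assumption on my part, not part of the official recipe) is to train the codebook incrementally on CPU with scikit-learn's `MiniBatchKMeans`, streaming feature chunks through `partial_fit` so the full corpus never has to be in memory at once:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)

# 64 clusters for illustration; the discussion above targets 1024.
kmeans = MiniBatchKMeans(n_clusters=64, batch_size=256, random_state=0)

# Stream feature chunks (stand-ins for per-utterance SSL features),
# updating the centroids a batch at a time.
for _ in range(50):
    chunk = rng.standard_normal((256, 768)).astype(np.float32)
    kmeans.partial_fit(chunk)

tokens = kmeans.predict(rng.standard_normal((100, 768)).astype(np.float32))
print(tokens.shape)
```

The trade-off is that mini-batch updates give a slightly noisier codebook than full-batch k-means, which is usually acceptable for quantization codebooks of this size.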

Any news @Chaanks on that?

We have uploaded models with 1000/2000 clusters for different layers in our own repo. We plan to move all the trained k-means models to the SpeechBrain repo once the refactoring is done. You can find the various trained k-means models here 👍🏻 https://huggingface.co/poonehmousavi/SSL_Quantization/