jishengpeng/WavTokenizer

SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling

Python · MIT

Pinned issues

Questions about more detailed experimental results

#2 opened 4 months ago by hbwu-ntu

Open · 2

WavTokenizer-medium was released on 2024.09.09

#23 opened 4 months ago by jishengpeng

Open · 4

About ASR

#6 opened 4 months ago by wntg

Open · 4

Issues

Training time
#63 opened 21 days ago by Ming-er
1
Question about the validation set
#62 opened a month ago by alberthli
6
The loss value when the model converges
#20 opened 4 months ago by yangyyt
10
Does the reconstruction quality meet expectations?
#61 opened a month ago by zzchust
3
Performance in LLM-based-TTS
#40 opened 4 months ago by Liujingxiu23
4
# audio_tokens error
#60 opened a month ago by EdisonZhu33
1
What is the expected behaviour when changing the bandwidth parameter?
#51 opened 2 months ago by tanmaylaud
1
We updated the WavTokenizer paper on arXiv and released the WavTokenizer-Large checkpoint on Hugging Face on 2024.10.22
#45 opened 3 months ago by jishengpeng
5
How does the quality compare with other audio tokenizers?
#59 opened 2 months ago by MonolithFoundation
3
Duplicate discriminator in dac?
#58 opened 2 months ago by npuichigo
4
Question about streaming inference
#56 opened 2 months ago by VJJJJJJ1
2
Usage for speech separation and temporal audio features
#29 opened 4 months ago by saveriyo
7
Some questions about the model
#49 opened 2 months ago by VJJJJJJ1
1
Training on WenetSpeech couldn't converge
#28 opened 4 months ago by dyyoungg
3
Worse performance of the large model compared to the small model?
#54 opened 2 months ago by XiaoshanHsj
4
Semantic Representation
#55 opened 2 months ago by Uneasy-Z
2
Model cannot converge
#52 opened 2 months ago by VJJJJJJ1
2
Training for a wav-to-MIDI transcriber
#53 opened 2 months ago by mito0o852
1
Alignment language vocabulary and speech space
#22 opened 4 months ago by varfolomeeff
3
Questions for Creating a Better Model
#50 opened 2 months ago by ootsuka-repos
2
Streaming inference
#44 opened 3 months ago by wntg
3
Installable Package
#48 opened 2 months ago by poonehmousavi
2
Config/Model Checkpoint Pairing
#47 opened 3 months ago by MorenoLaQuatra
5
Question about Audio Preprocessing
#46 opened 3 months ago by xjf-303
1
How to train the model at about 23 tokens/s, that is, hopsize=1024
#35 opened 4 months ago by Liujingxiu23
5
Probability density for each index in the codebook
#41 opened 3 months ago by goforher
1
How many training steps to train WavTokenizer?
#43 opened 3 months ago by sphmel
1
When will the large unified model (speech, music, audio) be released?
#42 opened 3 months ago by MrPig
1
Files Missing?
#39 opened 4 months ago by goforher
6
CER Performance of Reconstructed Audio
#34 opened 4 months ago by howitry
6
Why is the grad norm so high?
#38 opened 4 months ago by necrophagists
1
Why is the commit loss weight so large?
#36 opened 4 months ago by Ming-er
4
Speech medium v2
#37 opened 4 months ago by theodorblackbird
1
Question about training
#32 opened 4 months ago by handsomelys
2
Using EMA on the generator markedly improves the validation loss
#33 opened 4 months ago by erogol
2
Maximum duration supported during inference?
#31 opened 4 months ago by LiuShixing
1
Shape inconsistency encountered when training at 16kHz
#19 opened 4 months ago by dyyoungg
3
How many hours of Chinese data are there?
#30 opened 4 months ago by LiuShixing
1
Comparison with Whisper
#27 opened 4 months ago by isruihu
1
What is the difference between the config for training WavTokenizer-small and WavTokenizer-large?
#25 opened 4 months ago by handsomelys
2
WavTokenizer-medium was released on 2024.09.09
#23 opened 4 months ago by jishengpeng
4
Future 48kHz model
#21 opened 4 months ago by Ronsor
1
Purpose of os.environ['CUDA_LAUNCH_BLOCKING'] = '1' in train.py
#17 opened 4 months ago by seastar105
1
MRD vs MS-STFTD
#10 opened 4 months ago by Yagelmx
4
Mel or wav?
#18 opened 4 months ago by howitry
1
Failed to install
#12 opened 4 months ago by JoyceMind
2
Weights of the model
#16 opened 4 months ago by JoyceMind
1
Please consider a 16K model
#15 opened 4 months ago by ywh-my
1
About inference on GPU
#13 opened 4 months ago by JohnFengNeumann
1
Encode and decode for "16k sample"
#8 opened 4 months ago by sunnnnnnnny
1