jishengpeng/WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
Python · MIT License
Pinned issues
Issues
Training time
#63 opened by Ming-er - 6
Question about the validation set
#62 opened by alberthli - 10
The loss value when the model converges
#20 opened by yangyyt - 3
Does the reconstruction quality meet expectations?
#61 opened by zzchust - 4
Performance in LLM-based-TTS
#40 opened by Liujingxiu23 - 1
Problem with audio_tokens errors
#60 opened by EdisonZhu33 - 1
We updated the WavTokenizer paper on arXiv and released the WavTokenizer-Large checkpoint on Hugging Face on 2024.10.22
#45 opened by jishengpeng - 3
Duplicate discriminator in dac?
#58 opened by npuichigo - 2
Question about streaming inference
#56 opened by VJJJJJJ1 - 7
Some questions about the model
#49 opened by VJJJJJJ1 - 3
Training on WenetSpeech couldn't converge
#28 opened by dyyoungg - 4
Semantic Representation
#55 opened by Uneasy-Z - 2
Model cannot converge
#52 opened by VJJJJJJ1 - 1
Training a wav-to-MIDI transcriber
#53 opened by mito0o852 - 3
Questions for Creating a Better Model
#50 opened by ootsuka-repos - 3
Streaming inference
#44 opened by wntg - 2
Installable Package
#48 opened by poonehmousavi - 5
Config/Model Checkpoint Pairing
#47 opened by MorenoLaQuatra - 1
Question about Audio Preprocessing
#46 opened by xjf-303 - 5
How many training steps to train wavtokenizer?
#43 opened by sphmel - 1
Files Missing?
#39 opened by goforher - 6
CER Performance of Reconstructed Audio
#34 opened by howitry - 1
Why is the grad norm so high?
#38 opened by necrophagists - 4
Why is the commit loss weight so large?
#36 opened by Ming-er - 1
speech medium v2
#37 opened by theodorblackbird - 2
Question about training
#32 opened by handsomelys - 2
Maximum duration supported during inference?
#31 opened by LiuShixing - 3
Shape inconsistency encountered when training at 16kHz
#19 opened by dyyoungg - 1
How many hours of Chinese data are there?
#30 opened by LiuShixing - 1
Comparison with Whisper
#27 opened by isruihu - 2
What is the difference between the config for training WavTokenizer-small and WavTokenizer-large?
#25 opened by handsomelys - 4
WavTokenizer-medium is released on 2024.09.09
#23 opened by jishengpeng - 1
Future 48kHz model
#21 opened by Ronsor - 1
MRD vs MS-STFTD
#10 opened by Yagelmx - 1
Mel or wav?
#18 opened by howitry - 2
Failed to install
#12 opened by JoyceMind - 1
Model weights
#16 opened by JoyceMind - 1
Please consider a 16kHz model?
#15 opened by ywh-my - 1
About inference on GPU
#13 opened by JohnFengNeumann - 1
encode and decode for "16k sample"
#8 opened by sunnnnnnnny