donstang

wuhan ai research · Wuhan, Hubei, China

Pinned Repositories

asv-subtools
Open-source tools for speaker recognition
Language: Python
AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Language: Python
cube-studio
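Cloud-native one-stop machine learning platform: multi-tenancy, data assets, online notebook development, drag-and-drop task pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference serving, multi-cluster scheduling, multi-project resource groups, edge computing, real-time large-model training, and an AI app store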
Language: Jupyter Notebook
fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Language: Python
kaldi_org
This is now the official location of the Kaldi project.
Language: Shell
LeetcodeTop
A collection of high-frequency LeetCode problems commonly asked by major internet companies 🔥 Recommended practice site: https://www.lintcode.com/?utm_source=tf-github-codetop
wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Language: C++
wetts
Production First and Production Ready End-to-End Text-to-Speech Toolkit
Language: Python
whisper-jax
Faster inference for Whisper
Language: Jupyter Notebook
zh-google-styleguide
Google open-source project style guides (Chinese edition)
Language: Makefile

donstang's Repositories

donstang/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Language: Python
donstang/AIR-Bench
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
donstang/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Language: Python
donstang/audio-diffusion-pytorch
Audio generation using diffusion models, in PyTorch.
Language: Python
donstang/audiocraft_meta
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Language: Python
donstang/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Language: Python
donstang/dynamic-superb
The official repository of Dynamic-SUPERB.
donstang/fish-speech
Brand new TTS solution
donstang/g2pW
Chinese Mandarin Grapheme-to-Phoneme Converter. Converts Chinese text to Zhuyin (Bopomofo) or Pinyin (INTERSPEECH 2022)
Language: Python
donstang/gradio
Create UIs for your machine learning model in Python in 3 minutes
Language: HTML
donstang/KAN-TTS
KAN-TTS is a speech-synthesis training framework. Please try the demos posted at https://modelscope.cn/models?page=1&tasks=text-to-speech
Language: Python
donstang/LLaMA-Factory
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
donstang/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
donstang/mini-omni
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
donstang/mini-omni2
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
donstang/moshi
donstang/nanoGPT
The simplest, fastest repository for training/finetuning medium-sized GPTs.
donstang/PaddleSpeech
Easy-to-use Speech Toolkit including SOTA/Streaming ASR with punctuation, influential TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Language: Python
donstang/so-vits-svc
SoftVC VITS Singing Voice Conversion
Language: Python
donstang/speechmetrics_tts_eval
A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR
Language: Python
donstang/SpeechT5
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
Language: Python
donstang/SpokenNLP
NLP processing for meetings
Language: Python
donstang/tango
Code and model for the paper "Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model"
Language: Python
donstang/tortoise-tts
A multi-voice TTS system trained with an emphasis on quality
donstang/ultravox
A fast multimodal LLM for real-time voice
donstang/VALL-E-X
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io
donstang/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
donstang/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
donstang/webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
Language: Python
donstang/Whisper-Finetune
Fine-tune the Whisper speech recognition model and accelerate inference
Language: Python