Pinned Repositories
asv-subtools
Open-source tools for speaker recognition
AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
cube-studio
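Cloud-native one-stop machine learning platform: multi-tenancy, data assets, online notebook development, drag-and-drop task pipeline orchestration, multi-node multi-GPU distributed training, hyperparameter search, inference serving, multi-cluster scheduling, multiple project groups and resource groups, edge computing, real-time large-model training, AI app store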
fairseq
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
kaldi_org
This is now the official location of the Kaldi project.
LeetcodeTop
A collection of high-frequency LeetCode problems commonly asked by major internet companies 🔥 Recommended practice site: https://www.lintcode.com/?utm_source=tf-github-codetop
wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
wetts
Production First and Production Ready End-to-End Text-to-Speech Toolkit
whisper-jax
Faster inference for Whisper
zh-google-styleguide
Google open-source project style guides (Chinese edition)
donstang's Repositories
donstang/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
donstang/whisper-jax
Faster inference for Whisper
donstang/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
donstang/audio-diffusion-pytorch
Audio generation using diffusion models, in PyTorch.
donstang/audiocraft_meta
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
donstang/CTranslate2
Fast inference engine for Transformer models
donstang/dcgm-exporter
NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
donstang/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
donstang/DiffSinger
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism (SVS & TTS); AAAI 2022; Official code
donstang/fish-speech
Brand new TTS solution
donstang/g2pW
Chinese Mandarin Grapheme-to-Phoneme Converter. Converts Chinese text to Zhuyin (Bopomofo) or Pinyin (INTERSPEECH 2022)
donstang/gradio
Create UIs for your machine learning model in Python in 3 minutes
donstang/KAN-TTS
KAN-TTS is a speech-synthesis training framework. Please try the demos posted at https://modelscope.cn/models?page=1&tasks=text-to-speech
donstang/LLaMA-Factory
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
donstang/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
donstang/mini-omni
An open-source multimodal large language model that can hear and talk while thinking. It features real-time end-to-end speech input and streaming audio output for conversation.
donstang/mini-omni2
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
donstang/moshi
donstang/PaddleSpeech
Easy-to-use Speech Toolkit including SOTA/Streaming ASR with punctuation, influential TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
donstang/so-vits-svc
SoftVC VITS Singing Voice Conversion
donstang/speechmetrics_tts_eval
A wrapper around speech quality metrics MOSNet, BSSEval, STOI, PESQ, SRMR, SISDR
donstang/SpeechT5
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
donstang/SpokenNLP
NLP processing for meetings
donstang/tango
Code and model for the paper "Text-to-Audio Generation using Instruction Tuned LLM and Latent Diffusion Model"
donstang/tortoise-tts
A multi-voice TTS system trained with an emphasis on quality
donstang/ultravox
A fast multimodal LLM for real-time voice
donstang/VALL-E-X
An open-source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io
donstang/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
donstang/webdataset
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
donstang/Whisper-Finetune
Fine-tune the Whisper speech recognition model and accelerate inference