aceport's Stars
RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
neonbjb/tortoise-tts
A multi-voice TTS system trained with an emphasis on quality
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
speechbrain/speechbrain
A PyTorch-based Speech Toolkit
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. 接近GPT-4o表现的开源多模态对话模型
huggingface/parler-tts
Inference and training library for high-quality TTS models.
MahmoudAshraf97/whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
huggingface/speech-to-speech
Speech To Speech: an effort for an open-sourced and modular GPT4-o
gpt-omni/mini-omni
open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
0nutation/SpeechGPT
SpeechGPT Series: Speech Large Language Models
descriptinc/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
ictnlp/StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
OpenMOSS/AnyGPT
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
fighting41love/zhvoice
Chinese voice corpus. 中文语音语料,语音更加清晰自然,包含8个开源数据集,3200个说话人,900小时语音,1300万字。
yangdongchao/AcademiCodec
AcademiCodec: An Open Source Audio Codec Model for Academic Research
ZhangXInFD/SpeechTokenizer
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
hubertsiuzdak/snac
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
facebookresearch/speech-resynthesis
An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.
modelscope/FunCodec
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation et.al.
KdaiP/StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
bshall/hubert
HuBERT content encoders for: A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
liutaocode/TTS-arxiv-daily
Automatically Update Text-to-speech (TTS) Papers Daily using Github Actions (Update Every 12th hours)
ZhangXInFD/soundstorm-speechtokenizer
Implementation of SoundStorm built upon SpeechTokenizer.
bshall/hifigan
An 16kHz implementation of HiFi-GAN for soft-vc.
3loi/NaturalVoices
MediaBox-AUIKits/AUIAICall
阿里云 · AUI Kits AI通话场景
Tencent-RTC/trtc-conversation-ai-demo
trtc conversation ai demo
RaheesAhmed/ADB-Input-Field-Extractor
This Python script uses the Android Debug Bridge (ADB) to extract information about input fields from the currently displayed screen of a connected Android device.