aceport

aceport's Stars

RVC-Boss/GPT-SoVITS
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
Language:Python37.4k 218 1.4k4.2k
coqui-ai/TTS
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Language:Python36.4k 297 1.1k4.5k
neonbjb/tortoise-tts
A multi-voice TTS system trained with an emphasis on quality
Language:Jupyter Notebook13.4k 174 5221.9k
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Language:Python12.9k 106 612905
speechbrain/speechbrain
A PyTorch-based Speech Toolkit
Language:Python9.1k 134 1.1k1.4k
OpenGVLab/InternVL
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Language:Python6.6k 56 690509
huggingface/parler-tts
Inference and training library for high-quality TTS models.
Language:Python4.8k 54 124494
MahmoudAshraf97/whisper-diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
Language:Jupyter Notebook3.9k 48 212350
huggingface/speech-to-speech
Speech To Speech: an effort for an open-sourced and modular GPT4-o
Language:Python3.6k 44 91389
gpt-omni/mini-omni
open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
Language:Python3.2k 101 123289
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Language:Python2.7k 29 52185
0nutation/SpeechGPT
SpeechGPT Series: Speech Large Language Models
Language:Python1.3k 46 5587
descriptinc/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Language:Python1.2k 28 80117
ictnlp/StreamSpeech
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Language:Python985 12 1575
OpenMOSS/AnyGPT
Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"
Language:Python801 20 4164
fighting41love/zhvoice
Chinese voice corpus with clearer, more natural speech, comprising 8 open-source datasets, 3,200 speakers, 900 hours of speech, and 13 million characters.
607 9 0115
yangdongchao/AcademiCodec
AcademiCodec: An Open Source Audio Codec Model for Academic Research
Language:Python599 31 4080
ZhangXInFD/SpeechTokenizer
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
Language:Python507 16 2245
hubertsiuzdak/snac
Multi-Scale Neural Audio Codec (SNAC) compresses audio into discrete codes at a low bitrate
Language:Python460 7 2426
facebookresearch/speech-resynthesis
An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.
Language:Python394 19 2056
modelscope/FunCodec
FunCodec is a research-oriented toolkit for audio quantization and downstream applications such as text-to-speech synthesis and music generation.
Language:Python376 15 5231
KdaiP/StableTTS
Next-generation TTS model using flow-matching and DiT, inspired by Stable Diffusion 3
Language:Python371 23 2142
bshall/hubert
HuBERT content encoders for: A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
Language:Python337 4 1753
liutaocode/TTS-arxiv-daily
Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (updated every 12 hours)
Language:Python322 43 022
ZhangXInFD/soundstorm-speechtokenizer
Implementation of SoundStorm built upon SpeechTokenizer.
Language:Python106 6 1013
bshall/hifigan
A 16 kHz implementation of HiFi-GAN for soft-vc.
Language:Python95 5 625
3loi/NaturalVoices
Language:Jupyter Notebook47 4 03
MediaBox-AUIKits/AUIAICall
Alibaba Cloud · AUI Kits AI call scenario
Language:Java111
Tencent-RTC/trtc-conversation-ai-demo
TRTC conversational AI demo
Language:HTML2
RaheesAhmed/ADB-Input-Field-Extractor
This Python script uses the Android Debug Bridge (ADB) to extract information about input fields from the currently displayed screen of a connected Android device.
Language:Python1 1 0