zc1616

zc1616's Stars

LC044/WeChatMsg
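Extract WeChat chat history and export it to HTML, Word, or Excel documents for permanent storage; analyze the chat history to generate an annual chat report; and use the chat data to train a personal AI chat assistant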
Language: Python 38.1k 185 4553.9k
microsoft/OmniParser
A simple screen parsing tool towards a pure vision-based GUI agent
Language: Jupyter Notebook 20.7k 168 1801.7k
fishaudio/fish-speech
SOTA Open Source TTS
Language: Python 20.1k 117 5321.6k
QwenLM/Qwen
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Language: Python 17.5k 132 1.1k 1.4k
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Language: Python 12.1k 90 8781.2k
Tencent/HunyuanVideo
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Language: Python 9.3k 110 214774
kyutai-labs/moshi
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Language: Python 7.8k 87 112639
gpt-omni/mini-omni
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Language: Python 3.2k 78 127278
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Language: Python 2.9k 30 58196
THUDM/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English speech dialogue model
Language: Python 2.8k 30 143226
PlayVoice/whisper-vits-svc
Core Engine of Singing Voice Conversion & Singing Voice Clone
Language: Python 2.7k 30 169921
luban-agi/Awesome-Domain-LLM
Collects and organizes open-source models, datasets, and evaluation benchmarks for vertical domains.
2.4k 37 3199
feizc/FluxMusic
Text-to-Music Generation with Rectified Flow Transformers
Language: Python 1.7k 19 24134
Tencent/Tencent-Hunyuan-Large
Language: Python 1.5k 26 1793
0nutation/SpeechGPT
SpeechGPT Series: Speech Large Language Models
Language: Python 1.4k 46 5690
BytedanceSpeech/seed-tts-eval
Language: Python 1.2k 13 16112
jishengpeng/WavTokenizer
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
Language: Python 1.1k 23 6577
gemelo-ai/vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Language: Python 894 31 57107
ContextualAI/HALOs
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
Language: Python 816 7 2449
ZhangXInFD/SpeechTokenizer
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
Language: Python 534 16 2349
fgnt/nara_wpe
Different implementations of "Weighted Prediction Error" for speech dereverberation
Language: Python 508 19 37166
RoyJames/room-impulse-responses
A list of publicly available room impulse response datasets and scripts to download them.
Language: Shell 444 6 039
liutaocode/TTS-arxiv-daily
Automatically update text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)
Language: Python 384 43 123
X-LANCE/VoiceFlow-TTS
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
Language: Python 336 15 1821
etzinis/sudo_rm_rf
Code for SuDoRm-Rf networks for efficient audio source separation. SuDoRm-Rf stands for SUccessive DOwnsampling and Resampling of Multi-Resolution Features which enables a more efficient way of separating sources from mixtures.
Language: Jupyter Notebook 313 7 2434
JusperLee/TDANet
An efficient speech separation method
Language: Python 271 23 3633
tencent-ailab/FRA-RIR
Language: Python 189 8 830
wenet-e2e/wesep
Target Speaker Extraction Toolkit
Language: Python 149 6 1616
cszheng-ioa/Sixty-years-of-frequency-domain-monaural-speech-enhancement
Language: Python 139 4 227
tencent-ailab/UltraDualPathCompression
A PyTorch-based implementation of the compression and decompression module in "Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression".
Language: Jupyter Notebook 46 3 15