zc1616's Stars
LC044/WeChatMsg
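Extract WeChat chat history and export it to HTML, Word, or Excel documents for permanent storage; analyze the chat history to generate an annual chat report; use the chat data to train a personal AI chat assistant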
microsoft/OmniParser
A simple screen parsing tool towards pure vision based GUI agent
fishaudio/fish-speech
SOTA Open Source TTS
QwenLM/Qwen
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Tencent/HunyuanVideo
HunyuanVideo: A Systematic Framework For Large Video Generation Model
kyutai-labs/moshi
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
gpt-omni/mini-omni
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
THUDM/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
PlayVoice/whisper-vits-svc
Core Engine of Singing Voice Conversion & Singing Voice Clone
luban-agi/Awesome-Domain-LLM
A curated collection of open-source models, datasets, and evaluation benchmarks for vertical domains.
feizc/FluxMusic
Text-to-Music Generation with Rectified Flow Transformers
Tencent/Tencent-Hunyuan-Large
0nutation/SpeechGPT
SpeechGPT Series: Speech Large Language Models
BytedanceSpeech/seed-tts-eval
jishengpeng/WavTokenizer
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
gemelo-ai/vocos
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
ContextualAI/HALOs
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
ZhangXInFD/SpeechTokenizer
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
fgnt/nara_wpe
Different implementations of "Weighted Prediction Error" for speech dereverberation
RoyJames/room-impulse-responses
A list of publicly available room impulse response datasets and scripts to download them.
liutaocode/TTS-arxiv-daily
Automatically Update Text-to-speech (TTS) Papers Daily using GitHub Actions (Update Every 12 Hours)
X-LANCE/VoiceFlow-TTS
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
etzinis/sudo_rm_rf
Code for SuDoRm-Rf networks for efficient audio source separation. SuDoRm-Rf stands for SUccessive DOwnsampling and Resampling of Multi-Resolution Features which enables a more efficient way of separating sources from mixtures.
JusperLee/TDANet
An efficient speech separation method
tencent-ailab/FRA-RIR
wenet-e2e/wesep
Target Speaker Extraction Toolkit
cszheng-ioa/Sixty-years-of-frequency-domain-monaural-speech-enhancement
tencent-ailab/UltraDualPathCompression
A PyTorch-based implementation of the compression and decompression module in "Ultra Dual-Path Compression For Joint Echo Cancellation And Noise Suppression".