lunar333's Stars
QwenLM/Qwen2-Audio
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
FunAudioLLM/CosyVoice
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
2noise/ChatTTS
A generative speech model for daily dialogue.
VITA-MLLM/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
modelscope/ms-swift
Use PEFT or full-parameter training to fine-tune 350+ LLMs or 90+ MLLMs. (Qwen2.5, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
LinkSoul-AI/LLaSM
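The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors they may introduce.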
echonoshy/cgft-llm
Hands-on practice with LLMs.
hiyouga/LLaMA-Factory
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
QwenLM/Qwen2.5
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
ddlBoJack/emotion2vec
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
UnicomAI/Unichat-llama3-Chinese
traceless/alist-encrypt
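This project proxies the alist service and provides encryption and decryption for WebDAV. It supports playing encrypted videos and viewing encrypted images online in the alist web page, and operations under WebDAV are transparent: file resources are encrypted and decrypted automatically.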
Kedreamix/Linly-Talker
Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction method. 🤝🤖 It integrates various technologies like Whisper, Linly, Microsoft Speech Services, and SadTalker talking head generation system. 🌟🔬
RVC-Boss/GPT-SoVITS
Just 1 minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
jik876/hifi-gan
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
f/awesome-chatgpt-prompts
A curated collection of ChatGPT prompts for using ChatGPT more effectively.
JushBJJ/Mr.-Ranedeer-AI-Tutor
A GPT-4 AI Tutor Prompt for customizable personalized learning experiences.
neulab/BARTScore
BARTScore: Evaluating Generated Text as Text Generation
fishaudio/Bert-VITS2
vits2 backbone with multilingual-bert
Nekomoekissaten-SUB/Nekomoekissaten-Subs
Subtitle source files from Nekomoe Kissaten. Should there be any issues, please create them in this main repository first.
RVC-Project/Retrieval-based-Voice-Conversion-WebUI
Easily train a good VC model with voice data <= 10 mins!
PierreColombo/nlg_eval_via_simi_measures
NLG evaluation via Statistical Measures of Similarity: BaryScore, DepthScore, InfoLM
THUDM/ChatGLM3
ChatGLM3 series: Open Bilingual Chat LLMs
0nutation/SpeechGPT
SpeechGPT Series: Speech Large Language Models
lunar333/vits-japanese-finetune
AlexandaJerry/vits-mandarin-biaobei
Application of VITS to Mandarin TTS.
snakers4/silero-vad
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
bmaltais/kohya_ss
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Mastering-Python-GT/Transcription-diarization-whisper-pyannote
Transcription and diarization (speaker identification)