XxSuper's Stars
meta-llama/llama-models
Utilities intended for use with Llama models.
QwenLM/Qwen2.5
Qwen2.5 is the large language model series developed by the Qwen team, Alibaba Cloud.
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
xorbitsai/inference
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
OpenBMB/MiniCPM
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
gpt-omni/mini-omni
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Zeyi-Lin/HivisionIDPhotos
⚡️HivisionIDPhotos: a lightweight and efficient AI tool and algorithm for producing ID photos.
netease-youdao/QAnything
Question and Answer based on Anything.
kleinlee/DH_live
A digital human that everyone can use.
QuivrHQ/quivr
Open-source RAG framework for building GenAI Second Brains 🧠. Build a productivity assistant (RAG) ⚡️🤖 and chat with your docs (PDF, CSV, ...) and apps using Langchain, GPT 3.5 / 4 turbo, Private, Anthropic, VertexAI, Ollama, LLMs, and Groq, which you can share with users. An efficient retrieval-augmented generation framework.
langgenius/dify
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
huggingface/speech-to-speech
Speech To Speech: an effort for an open-sourced and modular GPT4-o
Kedreamix/Linly-Dubbing
Intelligent multilingual AI video dubbing/translation tool - Linly-Dubbing: "AI-empowered, language without borders"
infiniflow/ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Zheng-Chong/CatVTON
CatVTON is a simple and efficient virtual try-on diffusion model with 1) Lightweight Network (899.06M parameters in total), 2) Parameter-Efficient Training (49.57M trainable parameters) and 3) Simplified Inference (< 8G VRAM for 1024X768 resolution).
aiola-lab/whisper-medusa
Whisper with Medusa heads
yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
CerebriumAI/examples
Ikaros-521/RealtimeSTT_LLM_TTS
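Real-time STT connected to the OpenAI API / Zhipu AI (streaming LLM) and GPT-SOVITS / Edge-TTS, with cross-network service calls made through a web page to achieve real-time conversation.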
hiyouga/LLaMA-Factory
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
mem0ai/mem0
The Memory layer for your AI apps
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
BadToBest/EchoMimic
Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
netease-youdao/EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
6drf21e/ChatTTS_colab
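🚀 One-click deployment (offline bundle included)! Based on ChatTTS, with support for streaming output, random voice sampling, long-audio generation, and multi-role narration. Simple to use, with no complicated installation.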
fishaudio/fish-speech
Brand new TTS solution
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system