XxSuper

XxSuper's Stars

meta-llama/llama-models
Utilities intended for use with Llama models.
Language:Python4.3k763
QwenLM/Qwen2.5
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
Language:Shell8.6k539
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Language:Python2.1k123
xorbitsai/inference
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
Language:Python5k395
OpenBMB/MiniCPM
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
Language:Jupyter Notebook7k439
gpt-omni/mini-omni
open-source multimodal large language model that can hear, talk while thinking. Featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
Language:Python2.7k252
Zeyi-Lin/HivisionIDPhotos
⚡️HivisionIDPhotos: a lightweight and efficient AI ID photos tools. 一个轻量级的AI证件照制作算法。
Language:Python10.5k1k
netease-youdao/QAnything
Question and Answer based on Anything.
Language:Python11.5k1.1k
kleinlee/DH_live
每个人都能用的数字人
Language:Python46298
QuivrHQ/quivr
Open-source RAG Framework for building GenAI Second Brains 🧠 Build productivity assistant (RAG) ⚡️🤖 Chat with your docs (PDF, CSV, ...) & apps using Langchain, GPT 3.5 / 4 turbo, Private, Anthropic, VertexAI, Ollama, LLMs, Groq that you can share with users ! Efficient retrieval augmented generation framework
Language:Python36.2k3.5k
langgenius/dify
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
Language:TypeScript47k6.7k
huggingface/speech-to-speech
Speech To Speech: an effort for an open-sourced and modular GPT4-o
Language:Python3.2k331
Kedreamix/Linly-Dubbing
智能视频多语言AI配音/翻译工具 - Linly-Dubbing — “AI赋能，语言无界”
Language:Jupyter Notebook1.6k128
infiniflow/ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Language:Python18.4k1.9k
Zheng-Chong/CatVTON
CatVTON is a simple and efficient virtual try-on diffusion model with 1) Lightweight Network (899.06M parameters totally), 2) Parameter-Efficient Training (49.57M parameters trainable) and 3) Simplified Inference (< 8G VRAM for 1024X768 resolution).
Language:Python78990
aiola-lab/whisper-medusa
Whisper with Medusa heads
Language:Python78448
yl4579/StyleTTS2
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Language:Python4.8k391
CerebriumAI/examples
Language:Python42955
Ikaros-521/RealtimeSTT_LLM_TTS
实时STT，连接OpenAI接口/智谱AI（流式LLM）和GPT-SOVITS/Edge-TTS，通过网页的方式，进行跨网络的服务调用，实现实时对话的效果
Language:Python21233
hiyouga/LLaMA-Factory
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
Language:Python31.8k3.9k
mem0ai/mem0
The Memory layer for your AI apps
Language:Python22.1k2k
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language:Python27.7k4.1k
BadToBest/EchoMimic
Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
Language:Python2.5k301
modelscope/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Language:Python6.2k657
netease-youdao/EmotiVoice
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Language:Python7.3k623
6drf21e/ChatTTS_colab
🚀 一键部署（含离线整合包）！基于 ChatTTS ，支持流式输出、音色抽卡、长音频生成和分角色朗读。简单易用，无需复杂安装。
Language:Python2k245
fishaudio/fish-speech
Brand new TTS solution
Language:Python12.8k958
FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Language:Python5.2k535
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
Language:Python2.8k265
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
Language:Python17.7k1.7k