Kakaluote1234's Stars
DLR-RM/stable-baselines3
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
hijkzzz/Awesome-LLM-Strawberry
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 350+ LLMs or 90+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
nocodb/nocodb
🔥 🔥 🔥 Open Source Airtable Alternative
geekxh/hello-algorithm
🌍 Algorithm training for beginners | Four parts: ① interview write-ups from major tech companies, ② illustrated LeetCode solutions, ③ a thousand open-source e-books, ④ a hundred technical mind maps. (The project took over a hundred hours of work; a star in support is appreciated, 🌹 thanks~) Also recommends a free-to-use ChatGPT website.
hiyouga/LLaMA-Factory
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
pgvector/pgvector
Open-source vector similarity search for Postgres
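pgvector adds vector columns and distance operators (e.g. `<->` for Euclidean distance) to Postgres. As a conceptual sketch only, in plain Python with no Postgres involved, the nearest-neighbour query it accelerates boils down to:

```python
import math

def l2_distance(a, b):
    # Euclidean distance, the quantity behind pgvector's <-> operator
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, rows, k=2):
    # Brute-force k-nearest-neighbour scan over (id, vector) rows,
    # conceptually like: SELECT id FROM items ORDER BY embedding <-> $1 LIMIT k;
    # (pgvector's HNSW/IVFFlat indexes avoid the full scan)
    return sorted(rows, key=lambda r: l2_distance(query, r[1]))[:k]

rows = [(1, [0.0, 0.0]), (2, [1.0, 1.0]), (3, [0.9, 1.1])]
top = nearest([1.0, 1.0], rows, k=2)  # rows 2 and 3 are closest
```

The table and column names above are hypothetical; real usage goes through SQL with a `vector` column type.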
wtlow003/modal-llm-serving
Examples of serving LLMs on Modal.
owenliang/qwen-vllm
Qwen (Tongyi Qianwen) vLLM inference and deployment demo.
Tlntin/Qwen-TensorRT-LLM
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
triton-inference-server/tensorrtllm_backend
The Triton TensorRT-LLM Backend
alibaba/Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
Azure-Samples/graphrag-accelerator
One-click deploy of a Knowledge Graph powered RAG (GraphRAG) in Azure
chatchat-space/Langchain-Chatchat
Langchain-Chatchat (formerly Langchain-ChatGLM): a local-knowledge-base RAG and Agent application built with Langchain and LLMs such as ChatGLM, Qwen, and Llama.
langchain-ai/langchain
🦜🔗 Build context-aware reasoning applications
Sinaptik-AI/pandas-ai
Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.
casper-hansen/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
Tanuki/tanuki.py
Prompt engineering for developers
ggerganov/llama.cpp
LLM inference in C/C++
rakyll/hey
HTTP load generator, ApacheBench (ab) replacement
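The core of an ab/hey-style tool is firing a fixed number of requests across a pool of concurrent workers while collecting status codes and latencies. A stdlib-only sketch (the throwaway local server exists only to keep the example self-contained):

```python
import http.server
import threading
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def run_load(url, requests=20, concurrency=5):
    # Fire `requests` GETs over `concurrency` workers, recording
    # (status, latency) per request: the core loop of a hey-style tool.
    def one(_):
        t0 = time.perf_counter()
        with urllib.request.urlopen(url) as resp:
            resp.read()
            return resp.status, time.perf_counter() - t0
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(one, range(requests)))

# Throwaway local target; port 0 lets the OS pick a free port.
server = http.server.ThreadingHTTPServer(
    ("127.0.0.1", 0), http.server.SimpleHTTPRequestHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
results = run_load(f"http://127.0.0.1:{server.server_address[1]}/", requests=10)
server.shutdown()
ok = sum(1 for status, _ in results if status == 200)
```

hey itself adds what a real benchmark needs on top: rate limiting, latency histograms, and percentile summaries.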
DRSY/EMO
[ICLR 2024] EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling (https://arxiv.org/abs/2310.04691)
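The quantity EMO optimizes is an earth mover's (Wasserstein) distance between distributions. As a sketch of the underlying idea only, using the classic 1-D case rather than the paper's actual objective over vocabulary distributions: on an ordered support, EMD reduces to the L1 distance between the two CDFs.

```python
from itertools import accumulate

def emd_1d(p, q):
    # 1-D earth mover's distance between two distributions on the same
    # ordered support: the L1 distance between their CDFs.
    assert abs(sum(p) - 1) < 1e-9 and abs(sum(q) - 1) < 1e-9
    return sum(abs(cp - cq) for cp, cq in zip(accumulate(p), accumulate(q)))

# Moving all probability mass one bin to the right costs 1 unit of work.
cost = emd_1d([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Unlike cross-entropy, this penalty grows with how far mass must travel, which is the intuition EMO carries over to language-model training.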
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
flexflow/FlexFlow
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
karpathy/minGPT
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
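Much of vLLM's throughput comes from PagedAttention, which stores each sequence's KV cache in fixed-size blocks drawn from a shared pool instead of one contiguous up-front reservation. A toy allocator sketching the block-table idea (names are illustrative, not vLLM's API):

```python
class PagedKVCache:
    # Toy block allocator in the spirit of PagedAttention: a sequence's
    # KV cache grows block by block from a shared free pool, so memory is
    # claimed on demand and returned as soon as the sequence finishes.
    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))
        self.block_table = {}  # seq_id -> list of physical block ids
        self.lengths = {}      # seq_id -> tokens written so far

    def append_token(self, seq_id):
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:  # current block full, or first token
            if not self.free:
                raise MemoryError("out of KV-cache blocks")
            self.block_table.setdefault(seq_id, []).append(self.free.pop())
        self.lengths[seq_id] = n + 1

    def release(self, seq_id):
        # Finished sequences hand their blocks straight back to the pool.
        self.free.extend(self.block_table.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=4, block_size=16)
for _ in range(33):          # 33 tokens span 3 blocks of 16
    cache.append_token("seq0")
```

The internal fragmentation is at most one partially filled block per sequence, which is what lets vLLM batch far more sequences than contiguous pre-allocation would.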
ray-project/ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
run-llama/llama_index
LlamaIndex is a data framework for your LLM applications
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
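FlashAttention computes exact attention without materializing the full score matrix by streaming keys and values through an online softmax with running-max rescaling. A pure-Python sketch of that rescaling trick for a single query (the real kernel processes tiles in GPU SRAM, not one element at a time):

```python
import math

def streamed_attention(q, keys, values):
    # Exact softmax(q.k / sqrt(d))-weighted sum over values, computed one
    # key/value at a time with the running-max rescaling FlashAttention uses.
    m = float("-inf")          # running max of scores seen so far
    denom = 0.0                # running softmax denominator
    acc = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        s = sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q))
        m_new = max(m, s)
        scale = math.exp(m - m_new)   # rescale old partial sums to new max
        w = math.exp(s - m_new)
        denom = denom * scale + w
        acc = [a * scale + w * vi for a, vi in zip(acc, v)]
        m = m_new
    return [a / denom for a in acc]
```

Because partial sums are rescaled whenever a new maximum appears, the result matches the naive two-pass softmax exactly while never storing all scores at once.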