jacklee0575's Stars
Significant-Gravitas/AutoGPT
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
cpacker/MemGPT
Letta (fka MemGPT) is a framework for creating stateful LLM services.
LLM-Red-Team/kimi-free-api
🚀 Reverse-engineered API for the KIMI AI long-context LLM, free to test (specialty: interpreting and organizing long texts). Supports high-speed streaming output, agent conversations, web search, long-document interpretation, image OCR, and multi-turn dialogue, with zero-config deployment, multi-token support, and automatic cleanup of session traces.
kserve/kserve
Standardized Serverless ML Inference Platform on Kubernetes
showlab/Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
dvmazur/mixtral-offloading
Run Mixtral-8x7B models in Colab or consumer desktops
predibase/lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
basicmi/AI-Chip
A list of ICs and IPs for AI, Machine Learning and Deep Learning.
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
VITA-MLLM/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
efeslab/Nanoflow
A throughput-oriented high-performance serving framework for LLMs
Vchitect/VBench
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
microsoft/T-MAC
Low-bit LLM inference on CPU with lookup table
UbiquitousLearning/mllm
Fast Multimodal LLM on Mobile Devices
mit-han-lab/duo-attention
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
LLMServe/DistServe
Disaggregated serving system for Large Language Models (LLMs).
microsoft/sarathi-serve
A low-latency & high-throughput serving engine for LLMs
AlibabaPAI/llumnix
Efficient and easy multi-instance LLM serving
FasterDecoding/SnapKV
PrincetonUniversity/LLMCompass
LLMServe/SwiftTransformer
High performance Transformer implementation in C++.
Mutinifni/splitwise-sim
LLM serving cluster simulator
siyan-zhao/prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models"
Intsights/PySubstringSearch
Python library for fast substring/pattern search written in C++ leveraging Suffix Array Algorithm
dvlab-research/Q-LLM
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
lzhxmu/VTW