jacklee0575's Stars
Significant-Gravitas/AutoGPT
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
BradyFU/Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
cpacker/MemGPT
Letta (fka MemGPT) is a framework for creating stateful LLM services.
LLM-Red-Team/kimi-free-api
🚀 Reverse-engineered API for the KIMI AI long-context LLM, free to test (specialty: interpreting and organizing long texts). Supports high-speed streaming output, agent conversations, web search, long-document interpretation, image OCR, and multi-turn dialogue, with zero-config deployment, multi-token support, and automatic cleanup of session traces.
kserve/kserve
Standardized Serverless ML Inference Platform on Kubernetes
showlab/Awesome-Video-Diffusion
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
dvmazur/mixtral-offloading
Run Mixtral-8x7B models in Colab or consumer desktops
predibase/lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
basicmi/AI-Chip
A list of ICs and IPs for AI, Machine Learning and Deep Learning.
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
VITA-MLLM/VITA
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
efeslab/Nanoflow
A throughput-oriented high-performance serving framework for LLMs
Vchitect/VBench
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
microsoft/T-MAC
Low-bit LLM inference on CPU with lookup table
UbiquitousLearning/mllm
Fast Multimodal LLM on Mobile Devices
mit-han-lab/duo-attention
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
LLMServe/DistServe
Disaggregated serving system for Large Language Models (LLMs).
microsoft/sarathi-serve
A low-latency & high-throughput serving engine for LLMs
AlibabaPAI/llumnix
Efficient and easy multi-instance LLM serving
FasterDecoding/SnapKV
PrincetonUniversity/LLMCompass
LLMServe/SwiftTransformer
High performance Transformer implementation in C++.
Mutinifni/splitwise-sim
LLM serving cluster simulator
siyan-zhao/prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models"
Intsights/PySubstringSearch
Python library for fast substring/pattern search written in C++ leveraging Suffix Array Algorithm
dvlab-research/Q-LLM
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
lzhxmu/VTW