SqueezeAILab
SqueezeAILab is part of the Berkeley AI Research (BAIR) Lab at UC Berkeley, focused on AI systems research.
Pinned Repositories
KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
LLM2LLM
[ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
LLMCompiler
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
open_source_projects
Open Source Projects from Pallas Lab
SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
TinyAgent
[EMNLP 2024 Demo] TinyAgent: Function Calling at the Edge!
Tool2Vec
Efficient and Scalable Estimation of Tool Representations in Vector Space