Oliver-ss's Stars
mit-han-lab/qserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
DefTruth/Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
pytorch-labs/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 lines of Python.
Codium-ai/pr-agent
🚀CodiumAI PR-Agent: An AI-Powered 🤖 Tool for Automated Pull Request Analysis, Feedback, Suggestions and More! 💻🔍
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
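As a rough illustration of the high-level Python API mentioned above, a minimal sketch using TensorRT-LLM's LLM API as it appears in recent releases; the model id is an illustrative placeholder, and engine building happens under the hood on first use:

```python
# Hedged sketch of TensorRT-LLM's high-level LLM API (recent releases).
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model id
params = SamplingParams(temperature=0.8, max_tokens=32)

# generate() builds a TensorRT engine on first use, then runs inference on it.
for output in llm.generate(["TensorRT engines are"], params):
    print(output.outputs[0].text)
```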
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
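For reference, a minimal sketch of the library's functional entry point, flash_attn_func; tensors must be fp16/bf16 and resident on a CUDA device, and the shapes below are illustrative:

```python
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

# Exact attention computed blockwise, never materializing the full
# (seqlen x seqlen) score matrix in GPU memory.
out = flash_attn_func(q, k, v, causal=True)  # -> (batch, seqlen, nheads, headdim)
```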
NVIDIA/cuda-samples
Samples for CUDA developers demonstrating features in the CUDA Toolkit
AutoGPTQ/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
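A hedged sketch of loading a pre-quantized GPTQ checkpoint with those APIs; the model id is an illustrative placeholder for any GPTQ checkpoint on the Hub:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/Llama-2-7B-GPTQ"  # placeholder: any GPTQ checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0")

# The wrapper exposes the usual transformers generate() interface.
inputs = tokenizer("Quantization lets us", return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```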
DIYgod/RSSHub
🧡 Everything is RSSible
feeddd/feeds
Free RSS feeds for WeChat official accounts; supports extending to any app
flexflow/FlexFlow
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
ray-project/ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
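The core of that distributed runtime is the task/actor API; a minimal sketch, assuming a local machine (ray.init() starts a single-node cluster if none is running):

```python
import ray

ray.init()  # starts a local cluster when no address is given

@ray.remote
def square(x):
    return x * x

# Tasks run in parallel across workers; ray.get blocks on the futures.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # [0, 1, 4, 9]
```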
vzhd1701/evernote-backup
Backup & export all Evernote notes and notebooks
krahets/hello-algo
Hello Algo (《Hello 算法》): a data structures and algorithms tutorial with animated illustrations and one-click runnable code. Supports Python, Java, C++, C, C#, JS, Go, Swift, Rust, Ruby, Kotlin, TS, and Dart. The Simplified and Traditional Chinese editions are updated in sync; an English version is in progress.
anyscale/llm-continuous-batching-benchmarks
ray-project/ray-llm
RayLLM - LLMs on Ray
bytedance/effective_transformer
Running BERT without Padding
mit-han-lab/llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
tlc-pack/cutlass_fpA_intB_gemm
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
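A minimal offline-inference sketch with vLLM's LLM/SamplingParams API; the model name is an illustrative placeholder:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # placeholder: any HF causal LM
params = SamplingParams(temperature=0.8, max_tokens=64)

# Requests are continuously batched and the KV cache is paged internally.
for output in llm.generate(["The capital of France is"], params):
    print(output.outputs[0].text)
```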
guidance-ai/guidance
A guidance language for controlling large language models.
hkust-nlp/ceval
Official github repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]
LazyVim/LazyVim
Neovim config for the lazy
run-llama/llama_index
LlamaIndex is a data framework for your LLM applications
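A minimal retrieval-augmented sketch of that data framework; recent versions import from llama_index.core (older ones from llama_index), and it assumes documents in ./data plus an OPENAI_API_KEY for the default LLM and embeddings:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load local files, embed them, and build an in-memory vector index.
documents = SimpleDirectoryReader("data").load_data()  # assumes ./data exists
index = VectorStoreIndex.from_documents(documents)

# Queries retrieve relevant chunks and hand them to the LLM as context.
query_engine = index.as_query_engine()
print(query_engine.query("What do these documents cover?"))
```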
godweiyang/NN-CUDA-Example
Several simple examples of calling custom CUDA operators from popular neural network toolkits.
tpoisonooo/llama.onnx
LLaMa/RWKV onnx models, quantization and testcase
ymcui/Chinese-LLaMA-Alpaca
Chinese LLaMA & Alpaca large language models with local CPU/GPU training and deployment
oobabooga/text-generation-webui
A Gradio web UI for Large Language Models.
triton-lang/triton
Development repository for the Triton language and compiler
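A minimal Triton kernel sketch, the usual elementwise-add starter: each program instance handles one BLOCK-sized slice of the vectors, with a mask guarding the tail:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n  # guard the final partial block
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK=1024)
assert torch.allclose(out, x + y)
```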
huggingface/text-generation-inference
Large Language Model Text Generation Inference
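A hedged client-side sketch: once a TGI server is running, it can be queried through huggingface_hub's InferenceClient; the localhost URL is an assumption about where the server is deployed:

```python
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")  # assumption: local TGI server
print(client.text_generation("What is continuous batching?", max_new_tokens=64))
```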