lantel-wm
Graduate student @ NJU majoring in meteorology, interested in LLM inference.
Nanjing University
lantel-wm's Stars
NVIDIA/nccl-tests
NCCL Tests
NVIDIA/nccl
Optimized primitives for collective multi-GPU communication
DefTruth/CUDA-Learn-Notes
🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
triton-lang/triton
Development repository for the Triton language and compiler
ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
terrastruct/d2
D2 is a modern diagram scripting language that turns text to diagrams.
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
gpu-mode/lectures
Material for gpu-mode lectures
microsoft/vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
casper-hansen/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
mit-han-lab/qserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
deepseek-ai/DeepSeek-Coder
DeepSeek Coder: Let the Code Write Itself
karpathy/llm.c
LLM training in simple, raw C/CUDA
gpu-mode/resource-stream
GPU programming related news and material links
KMnO4-zx/extract-dialogue
Extract dialogue datasets from novels
hiyouga/LLaMA-Factory
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
01-ai/Yi
A series of large language models trained from scratch by developers @01-ai
wukan1986/alpha_examples
Examples of alpha research for quantitative investment
HqWu-HITCS/Awesome-Chinese-LLM
A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed at low training cost, covering base models, domain-specific fine-tuning and applications, datasets, and tutorials.
InternLM/xtuner
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
InternLM/InternLM
Official release of InternLM2.5 base and chat models. 1M context support
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
lantel-wm/llm-bench
Static benchmark for vLLM, plus serving benchmarks for vLLM and PPL.
openmlsys/openmlsys-zh
《Machine Learning Systems: Design and Implementation》- Chinese Version
Bohr1005/xcrypto
quant,trading system,crypto,async
frankhart2018/sargparse
A sane argument parser for Rust
ninehills/llm-inference-benchmark
LLM Inference benchmark
PKUFlyingPig/CMU10-714
Learning material for CMU10-714: Deep Learning System
THU-MIG/yolov10
YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]