lantel-wm
Graduate student @ NJU majoring in meteorology, interested in LLM inference.
Nanjing University
lantel-wm's Stars
NVIDIA/nccl-tests
NCCL Tests
NVIDIA/nccl
Optimized primitives for collective multi-GPU communication
DefTruth/CUDA-Learn-Notes
🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
triton-lang/triton
Development repository for the Triton language and compiler
ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
terrastruct/d2
D2 is a modern diagram scripting language that turns text to diagrams.
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
gpu-mode/lectures
Material for gpu-mode lectures
microsoft/vattention
Dynamic Memory Management for Serving LLMs without PagedAttention
casper-hansen/AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
mit-han-lab/qserve
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
deepseek-ai/DeepSeek-Coder
DeepSeek Coder: Let the Code Write Itself
karpathy/llm.c
LLM training in simple, raw C/CUDA
gpu-mode/resource-stream
GPU programming related news and material links
KMnO4-zx/extract-dialogue
Extract dialogue datasets from novels
hiyouga/LLaMA-Factory
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
01-ai/Yi
A series of large language models trained from scratch by developers @01-ai
wukan1986/alpha_examples
Examples of alpha research for quantitative investment
HqWu-HITCS/Awesome-Chinese-LLM
A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed at low training cost, covering base models, domain-specific fine-tuning and applications, datasets, and tutorials.
InternLM/xtuner
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
InternLM/InternLM
Official release of InternLM2.5 base and chat models. 1M context support
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
lantel-wm/llm-bench
Static benchmark for vLLM, plus serving benchmarks for vLLM and PPL.
openmlsys/openmlsys-zh
《Machine Learning Systems: Design and Implementation》- Chinese Version
Bohr1005/xcrypto
quant,trading system,crypto,async
frankhart2018/sargparse
A sane argument parser for Rust
ninehills/llm-inference-benchmark
LLM Inference benchmark
PKUFlyingPig/CMU10-714
Learning material for CMU10-714: Deep Learning System
THU-MIG/yolov10
YOLOv10: Real-Time End-to-End Object Detection [NeurIPS 2024]