MonadKai's Stars
LoongServe/LoongServe
AlibabaPAI/FLASHNN
gpu-mode/triton-index
Cataloging released Triton kernels.
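For orientation, below is a minimal vector-add kernel of the kind such an index catalogs; the kernel name and block size are illustrative choices, not taken from the index itself.

```python
# Minimal Triton vector-add kernel (illustrative; requires triton and a CUDA GPU).
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                         # each program instance handles one block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                         # guard the tail block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```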
timudk/flux_triton
Writing FLUX in Triton
microsoft/BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
October2001/Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
open-mmlab/mmdeploy
OpenMMLab Model Deployment Framework
amusi/AI-Job-Notes
A job-hunting guide for AI algorithm roles (covering preparation strategies, coding-interview problem guides, referrals, a list of AI companies, and more).
sgl-project/sgl-learning-materials
Materials for learning SGLang
zkkli/I-ViT
[ICCV 2023] I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference
Aleph-Alpha/scaling
Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for training large language models.
wdndev/llm_interview_note
Notes on the knowledge and interview questions relevant to large language model (LLM) algorithm/application engineers.
kyutai-labs/moshi
66RING/CritiPrefill
Code repo for "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs".
sustcsonglin/flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
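As a reference point, here is a naive PyTorch loop over the causal linear-attention recurrence that such fused kernels accelerate; this is purely illustrative and is not the library's API, and any feature map on q and k is assumed to have been applied already.

```python
# Naive reference for causal linear attention: S_t = S_{t-1} + k_t^T v_t, y_t = q_t S_t.
import torch

def linear_attention_reference(q, k, v):
    # q, k, v: (batch, seq_len, dim)
    b, n, d = q.shape
    state = torch.zeros(b, d, d, dtype=q.dtype, device=q.device)
    outputs = []
    for t in range(n):
        # rank-1 update of the running key-value state
        state = state + k[:, t, :, None] * v[:, t, None, :]
        outputs.append(torch.einsum("bd,bde->be", q[:, t], state))
    return torch.stack(outputs, dim=1)

y = linear_attention_reference(torch.randn(2, 16, 8), torch.randn(2, 16, 8), torch.randn(2, 16, 8))
```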
bklieger-groq/g1
g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
BobMcDear/attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
NetEase-Media/grps_vllm
[grps + vLLM] An LLM service built on the vLLM LLMEngine API.
NetEase-Media/grps_trtllm
[grps + TensorRT-LLM] A high-performance, pure C++ OpenAI-compatible LLM service built with grps + TensorRT-LLM + Tokenizers.cpp; supports chat and function-call modes, AI agents, distributed multi-GPU inference, multimodal inputs, and a Gradio chat UI.
NetEase-Media/grps
[Deep learning model deployment framework] Supports TensorFlow/PyTorch/TensorRT/TensorRT-LLM/vLLM and other NN frameworks; supports dynamic batching and streaming; offers both Python and C++ APIs; rate-limitable, extensible, and high-performance. Helps users quickly deploy models to production and serve them over HTTP/RPC interfaces.
NVIDIA/dcgm-exporter
NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
NVIDIA/nvidia-container-toolkit
Build and run containers leveraging NVIDIA GPUs
qhjqhj00/MemoRAG
Empowering RAG with a memory-based data interface for all-purpose applications!
deepjavalibrary/djl-serving
A universal scalable machine learning model deployment solution
ColfaxResearch/cutlass-kernels
BestAnHongjun/LMDeploy-Jetson
Deploy LLMs offline on the NVIDIA Jetson platform, enabling embodied-intelligence devices to run without a continuous internet connection.
RSSNext/Follow
🧡 Follow your favorites in one inbox
google-deepmind/optax
Optax is a gradient processing and optimization library for JAX.
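A minimal sketch of the Optax training loop on a hypothetical toy objective (the loss and parameter names are illustrative, not from the library):

```python
# Minimal Optax usage: fit a scalar parameter with Adam on a toy quadratic loss.
import jax
import jax.numpy as jnp
import optax

def loss_fn(params):
    return jnp.sum((params["w"] - 3.0) ** 2)   # toy loss with minimum at w = 3

params = {"w": jnp.zeros(())}
optimizer = optax.adam(learning_rate=1e-1)
opt_state = optimizer.init(params)

@jax.jit
def step(params, opt_state):
    grads = jax.grad(loss_fn)(params)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    return params, opt_state

for _ in range(100):
    params, opt_state = step(params, opt_state)
```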
google/orbax
Orbax provides common checkpointing and persistence utilities for JAX users
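A sketch of saving and restoring a JAX pytree with Orbax; newer releases steer users toward CheckpointManager, so treat the exact class names and path handling here as a version-dependent assumption.

```python
# Save/restore a pytree with Orbax (assumes orbax-checkpoint is installed).
import jax.numpy as jnp
import orbax.checkpoint as ocp

state = {"step": 100, "params": {"w": jnp.ones((2, 2))}}

checkpointer = ocp.PyTreeCheckpointer()
checkpointer.save("/tmp/orbax_demo_ckpt", state)       # target directory must not already exist
restored = checkpointer.restore("/tmp/orbax_demo_ckpt")
```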
ratatui/ratatui
A Rust crate for cooking up terminal user interfaces (TUIs) 👨‍🍳🐀 https://ratatui.rs