MonadKai's Stars
LoongServe/LoongServe
AlibabaPAI/FLASHNN
gpu-mode/triton-index
Cataloging released Triton kernels.
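For orientation, below is a minimal vector-add kernel of the kind such an index catalogs; the kernel name and block size are illustrative choices, not taken from the index itself.

```python
# Minimal Triton vector-add kernel (illustrative; requires triton and a CUDA GPU).
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                         # each program instance handles one block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                         # guard the tail block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```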
timudk/flux_triton
Writing FLUX in Triton
microsoft/BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
October2001/Awesome-KV-Cache-Compression
📰 Must-read papers on KV Cache Compression (constantly updating 🤗).
open-mmlab/mmdeploy
OpenMMLab Model Deployment Framework
amusi/AI-Job-Notes
A job-hunting guide for AI algorithm roles (covering preparation strategies, coding-interview problem guides, referrals, a list of AI companies, and more).
sgl-project/sgl-learning-materials
Materials for learning SGLang
zkkli/I-ViT
[ICCV 2023] I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference
Aleph-Alpha/scaling
Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for training large language models.
wdndev/llm_interview_note
Notes on the knowledge and interview questions relevant to large language model (LLM) algorithm/application engineers.
kyutai-labs/moshi
66RING/CritiPrefill
Code repo for "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs".
sustcsonglin/flash-linear-attention
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
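As a reference point, here is a naive PyTorch loop over the causal linear-attention recurrence that such fused kernels accelerate; this is purely illustrative and is not the library's API, and any feature map on q and k is assumed to have been applied already.

```python
# Naive reference for causal linear attention: S_t = S_{t-1} + k_t^T v_t, y_t = q_t S_t.
import torch

def linear_attention_reference(q, k, v):
    # q, k, v: (batch, seq_len, dim)
    b, n, d = q.shape
    state = torch.zeros(b, d, d, dtype=q.dtype, device=q.device)
    outputs = []
    for t in range(n):
        # rank-1 update of the running key-value state
        state = state + k[:, t, :, None] * v[:, t, None, :]
        outputs.append(torch.einsum("bd,bde->be", q[:, t], state))
    return torch.stack(outputs, dim=1)

y = linear_attention_reference(torch.randn(2, 16, 8), torch.randn(2, 16, 8), torch.randn(2, 16, 8))
```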
bklieger-groq/g1
g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
BobMcDear/attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
NetEase-Media/grps_vllm
[grps + vLLM] An LLM service built on the vLLM LLMEngine API.
NetEase-Media/grps_trtllm
[grps + TensorRT-LLM] A high-performance, pure C++ OpenAI-compatible LLM service built with grps + TensorRT-LLM + Tokenizers.cpp; supports chat and function-call modes, AI agents, distributed multi-GPU inference, multimodal inputs, and a Gradio chat UI.
NetEase-Media/grps
[Deep learning model deployment framework] Supports TensorFlow/PyTorch/TensorRT/TensorRT-LLM/vLLM and other NN frameworks; supports dynamic batching and streaming; offers both Python and C++ APIs; rate-limitable, extensible, and high-performance. Helps users quickly deploy models to production and serve them over HTTP/RPC interfaces.
NVIDIA/dcgm-exporter
NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
NVIDIA/nvidia-container-toolkit
Build and run containers leveraging NVIDIA GPUs
qhjqhj00/MemoRAG
Empowering RAG with a memory-based data interface for all-purpose applications!
deepjavalibrary/djl-serving
A universal scalable machine learning model deployment solution
ColfaxResearch/cutlass-kernels
BestAnHongjun/LMDeploy-Jetson
Deploy LLMs offline on the NVIDIA Jetson platform, enabling embodied-intelligence devices to run without a continuous internet connection.
RSSNext/Follow
🧡 Follow your favorites in one inbox
google-deepmind/optax
Optax is a gradient processing and optimization library for JAX.
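A minimal sketch of the Optax training loop on a hypothetical toy objective (the loss and parameter names are illustrative, not from the library):

```python
# Minimal Optax usage: fit a scalar parameter with Adam on a toy quadratic loss.
import jax
import jax.numpy as jnp
import optax

def loss_fn(params):
    return jnp.sum((params["w"] - 3.0) ** 2)   # toy loss with minimum at w = 3

params = {"w": jnp.zeros(())}
optimizer = optax.adam(learning_rate=1e-1)
opt_state = optimizer.init(params)

@jax.jit
def step(params, opt_state):
    grads = jax.grad(loss_fn)(params)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    return params, opt_state

for _ in range(100):
    params, opt_state = step(params, opt_state)
```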
google/orbax
Orbax provides common checkpointing and persistence utilities for JAX users
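A sketch of saving and restoring a JAX pytree with Orbax; newer releases steer users toward CheckpointManager, so treat the exact class names and path handling here as a version-dependent assumption.

```python
# Save/restore a pytree with Orbax (assumes orbax-checkpoint is installed).
import jax.numpy as jnp
import orbax.checkpoint as ocp

state = {"step": 100, "params": {"w": jnp.ones((2, 2))}}

checkpointer = ocp.PyTreeCheckpointer()
checkpointer.save("/tmp/orbax_demo_ckpt", state)       # target directory must not already exist
restored = checkpointer.restore("/tmp/orbax_demo_ckpt")
```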
ratatui/ratatui
A Rust crate for cooking up terminal user interfaces (TUIs) 👨‍🍳🐀 https://ratatui.rs