MARD1NO's Stars
radarFudan/Awesome-state-space-models
Collection of papers on state-space models
hahnyuan/ASVD4LLM
Activation-aware Singular Value Decomposition for Compressing Large Language Models
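ASVD compresses a weight matrix into a low-rank pair, weighting the decomposition by activation statistics so the approximation error falls on dimensions where activations are small. A rough NumPy sketch of one diagonal-scaling variant (a simplification for illustration, not the repo's exact method; the function name is mine):

```python
import numpy as np

def asvd_compress(W, act_scale, rank):
    """Low-rank factorization of W weighted by per-input activation magnitudes.

    W: [out, in] weight matrix; act_scale: [in] typical activation magnitude
    per input dimension (assumed positive). Returns A [out, rank], B [rank, in]
    such that A @ B approximates W, with error concentrated on low-activation dims.
    """
    scaled = W * act_scale                 # scale columns by activation magnitude
    U, sigma, Vt = np.linalg.svd(scaled, full_matrices=False)
    A = U[:, :rank] * sigma[:rank]         # absorb singular values into A
    B = Vt[:rank] / act_scale              # fold the diagonal scaling back out
    return A, B
```

With `rank` equal to the full rank the factorization is exact; smaller ranks trade accuracy for a `(out + in) * rank` parameter count.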
sneaxiy/AAdiffTools
microsoft/superbenchmark
A validation and profiling tool for AI infrastructure
pytorch-labs/gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Dao-AILab/causal-conv1d
Causal depthwise conv1d in CUDA, with a PyTorch interface
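A causal depthwise conv1d applies an independent filter to each channel, left-padded so every output step depends only on current and past inputs. A minimal pure-Python reference of that behavior (correlation form, no kernel flipping; names are illustrative, not the repo's API):

```python
def causal_depthwise_conv1d(x, weight):
    """x: list of per-channel sequences; weight: one kernel per channel.

    Left-pads each sequence with zeros so output[t] only sees inputs <= t.
    """
    out = []
    for seq, w in zip(x, weight):
        k = len(w)
        padded = [0.0] * (k - 1) + list(seq)   # causal left padding
        out.append([sum(w[j] * padded[t + j] for j in range(k))
                    for t in range(len(seq))])
    return out
```

The CUDA kernel computes the same thing per channel, just fused and parallelized; depthwise means no cross-channel mixing.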
bobby-he/simplified_transformers
MooreThreads/MobiMaliangSDK
microsoft/mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
Dao-AILab/fast-hadamard-transform
Fast Hadamard transform in CUDA, with a PyTorch interface
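The Hadamard transform multiplies a length-n vector by the ±1-valued Hadamard matrix, and the "fast" version does it in O(n log n) with FFT-style butterflies. A pure-Python reference of the same transform (unnormalized, natural/Sylvester ordering; n must be a power of two):

```python
def fht(x):
    """Unnormalized fast Hadamard transform via in-place butterflies."""
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):           # each block of size 2h
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b  # butterfly: sum and difference
        h *= 2
    return x
```

Applying it twice recovers the input scaled by n, since H @ H = n * I.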
OscarXZQ/weight-selection
gusye1234/chat-spot
A Spotlight-style app: talk to ChatGPT and snip anything to it, right at your fingertips
AILab-CVC/UniRepLKNet
[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
hao-ai-lab/LookaheadDecoding
reed-lau/cute-gemm
AILab-CVC/GroupMixFormer
GroupMixAttention and GroupMixFormer
excalidraw/excalidraw
Virtual whiteboard for sketching hand-drawn like diagrams
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
S-LoRA/S-LoRA
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
DeepLangAI/LingoWhale-8B
LingoWhale-8B: Open Bilingual LLMs (open-source bilingual pretrained large language models)
deepseek-ai/DeepSeek-Coder
DeepSeek Coder: Let the Code Write Itself
mit-han-lab/spatten-llm
[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
bojone/FSQ
Keras implementation of Finite Scalar Quantization
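Finite Scalar Quantization (FSQ) replaces learned VQ codebooks by rounding each latent dimension to a small fixed grid of values. A minimal NumPy sketch of the core bound-then-round step (odd level counts assumed for simplicity; the function name is mine, and the straight-through gradient trick used in training is omitted):

```python
import numpy as np

def fsq_quantize(z, levels):
    """Quantize each dimension of z to a fixed grid of `levels[i]` values."""
    half = (np.asarray(levels) - 1) / 2   # e.g. 5 levels -> half = 2
    bounded = np.tanh(z) * half           # squash each dim into [-half, half]
    return np.round(bounded)              # snap to the integer grid
```

The implicit codebook is the Cartesian product of the per-dimension grids, so e.g. `levels = [5, 5, 5]` yields 125 codes with no codebook parameters to learn.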
Mq-b/Loser-HomeWork
Homework showcase from the "losers" (community members), with answer walkthroughs and some C++ knowledge
hahnyuan/TorchQuantExtension
PyTorch extension for quantization with highly efficient CUDA kernels
THUDM/ChatGLM3
ChatGLM3 series: Open Bilingual Chat LLMs (open-source bilingual dialogue language models)
google/maxtext
A simple, performant, and scalable JAX LLM!
Delgan/loguru
Python logging made (stupidly) simple
triton-inference-server/tensorrtllm_backend
The Triton TensorRT-LLM Backend
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.