MARD1NO's Stars
excalidraw/excalidraw
Virtual whiteboard for sketching hand-drawn-style diagrams
Delgan/loguru
Python logging made (stupidly) simple
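A minimal sketch of what that tagline means in practice, using loguru calls that exist in the library (the sink path, rotation size, and function are arbitrary examples):

```python
from loguru import logger

# Works with zero configuration: no handlers, formatters, or levels to wire up.
logger.info("Hello, loguru!")

# One add() call attaches a rotating file sink (path and size are arbitrary).
logger.add("app_{time}.log", rotation="500 MB", level="DEBUG")

# catch() logs any uncaught exception with a full traceback.
@logger.catch
def divide(a: int, b: int) -> float:
    return a / b

divide(1, 0)  # logged as an error instead of crashing silently
```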
THUDM/ChatGLM3
ChatGLM3 series: open-source bilingual chat LLMs
NVIDIA/TensorRT-LLM
TensorRT-LLM provides an easy-to-use Python API for defining Large Language Models (LLMs) and building TensorRT engines with state-of-the-art optimizations for efficient inference on NVIDIA GPUs. It also includes components for creating Python and C++ runtimes that execute those engines.
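A hedged sketch of the flow that description implies, assuming a recent TensorRT-LLM release that ships the high-level `LLM` Python API (the model id and prompt below are placeholders, not from the repo):

```python
from tensorrt_llm import LLM, SamplingParams  # assumes a release exposing the LLM API

# Engine building happens under the hood when the model is first loaded.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model id

params = SamplingParams(temperature=0.8, max_tokens=64)
for output in llm.generate(["What does a TensorRT engine contain?"], params):
    print(output.outputs[0].text)
```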
deepseek-ai/DeepSeek-Coder
DeepSeek Coder: Let the Code Write Itself
pytorch-labs/gpt-fast
Simple and efficient PyTorch-native transformer text generation in under 1,000 lines of Python.
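Not the repo's code, but a minimal sketch of the technique it streamlines: a pure-PyTorch greedy decoding loop over any causal LM that returns logits of shape [batch, seq, vocab] (`model` is a stand-in):

```python
import torch

@torch.no_grad()
def greedy_generate(model, input_ids: torch.Tensor, max_new_tokens: int) -> torch.Tensor:
    """input_ids: [B, T] token ids; appends max_new_tokens greedily chosen tokens."""
    for _ in range(max_new_tokens):
        logits = model(input_ids)                                 # [B, T, V]
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)   # most likely next token
        input_ids = torch.cat([input_ids, next_id], dim=1)        # grow the sequence
    return input_ids
```

gpt-fast's speed comes from layering KV caching, compilation, and quantization on top of exactly this loop.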
S-LoRA/S-LoRA
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
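The primitive being served is a low-rank update per request on top of one shared base weight. A naive per-request sketch (all names are illustrative; the repo's actual contribution is batching and paging thousands of these adapters efficiently, which this loop does not capture):

```python
import torch

def lora_linear(x, W, adapters, adapter_ids, scaling=1.0):
    """x: [B, d_in]; W: [d_in, d_out] shared base weight;
    adapters: dict id -> (A [d_in, r], B [r, d_out]); adapter_ids: B ids, one per request."""
    y = x @ W  # one shared base projection for the whole batch
    for i, aid in enumerate(adapter_ids):
        A, B = adapters[aid]                    # low-rank factors for request i
        y[i] = y[i] + scaling * (x[i] @ A @ B)  # per-request low-rank update
    return y
```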
google/maxtext
A simple, performant, and scalable JAX LLM!
hao-ai-lab/LookaheadDecoding
chengzeyi/stable-fast
An inference performance optimization framework for Hugging Face Diffusers on NVIDIA GPUs.
AILab-CVC/UniRepLKNet
[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Mq-b/Loser-HomeWork
Homework showcases from the "losers", with answer explanations and some C++ knowledge
triton-inference-server/tensorrtllm_backend
The Triton TensorRT-LLM Backend
bobby-he/simplified_transformers
Bruce-Lee-LY/cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores, via the WMMA API and MMA PTX instructions.
Dao-AILab/causal-conv1d
Causal depthwise conv1d in CUDA, with a PyTorch interface
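A pure-PyTorch reference for the operation the CUDA kernel accelerates: depthwise (groups = channels) conv1d made causal by padding only on the left, so no output position sees future timesteps:

```python
import torch
import torch.nn.functional as F

def causal_depthwise_conv1d(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    """x: [B, C, T]; weight: [C, 1, K], one filter per channel (groups=C)."""
    K = weight.shape[-1]
    x = F.pad(x, (K - 1, 0))                      # pad the past only, never the future
    return F.conv1d(x, weight, groups=x.shape[1])  # output stays [B, C, T]
```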
OscarXZQ/weight-selection
microsoft/mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
DeepLangAI/LingoWhale-8B
LingoWhale-8B: an open-source bilingual pretrained large language model
MooreThreads/MobiMaliangSDK
AILab-CVC/GroupMixFormer
GroupMixAttention and GroupMixFormer
Bruce-Lee-LY/flash_attention_inference
Performance of the C++ interfaces of FlashAttention, FlashAttention-2, and a self-implemented quantized decoding attention in large language model (LLM) inference scenarios.
Dao-AILab/fast-hadamard-transform
Fast Hadamard transform in CUDA, with a PyTorch interface
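A plain-PyTorch reference of the O(n log n) butterfly recursion that the CUDA kernel fuses (unnormalized; the transform length must be a power of two):

```python
import torch

def hadamard_transform(x: torch.Tensor) -> torch.Tensor:
    """x: [..., n] with n a power of two; unnormalized FWHT along the last dim."""
    batch, n = x.shape[:-1], x.shape[-1]
    h = 1
    while h < n:
        y = x.reshape(*batch, n // (2 * h), 2, h)      # pair up blocks h apart
        a, b = y[..., 0, :], y[..., 1, :]
        x = torch.stack((a + b, a - b), dim=-2).reshape(*batch, n)  # butterfly step
        h *= 2
    return x
```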
mit-han-lab/spatten-llm
[HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning
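The cascade token-pruning idea, sketched in PyTorch: rank keys by accumulated attention probability and keep only the top fraction. This is illustrative only; the paper's contribution is a hardware architecture for this, and `keep_ratio` is a made-up knob:

```python
import torch

def prune_tokens(attn: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """attn: [H, T_q, T_k] attention probs; returns indices of key tokens to keep."""
    importance = attn.sum(dim=(0, 1))                 # cumulative score per key token
    k = max(1, int(attn.shape[-1] * keep_ratio))      # how many tokens survive
    return importance.topk(k).indices.sort().values   # kept indices, in order
```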
bojone/FSQ
Keras implementation of Finite Scalar Quantization (FSQ)
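The repo is Keras, but the core of FSQ fits in a few lines of any framework: bound each latent dimension, round it to a small fixed number of levels, and pass gradients straight through the rounding. A PyTorch sketch with a single shared `levels` (the paper uses a per-dimension list):

```python
import torch

def fsq(z: torch.Tensor, levels: int = 5) -> torch.Tensor:
    """z: [..., d]; quantizes each dimension to `levels` values in [-1, 1]."""
    half = (levels - 1) / 2
    z = torch.tanh(z) * half        # bound each dim to [-half, half]
    z_q = torch.round(z)            # snap to one of `levels` integer codes
    z = z + (z_q - z).detach()      # straight-through estimator for gradients
    return z / half                 # rescale codes back to [-1, 1]
```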
reed-lau/cute-gemm
gusye1234/chat-spot
A Spotlight-style app: talk to ChatGPT or snip anything to it, right at your fingertips
tridao/cutlass_quant
Playing with quantization
hahnyuan/TorchQuantExtension
PyTorch extension for quantization with highly efficient CUDA kernels
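For context, a plain-PyTorch reference of the kind of op such kernels accelerate: symmetric per-tensor int8 quantize/dequantize (generic, not this extension's actual API):

```python
import torch

def quantize_int8(x: torch.Tensor):
    """Returns (int8 tensor, scale) for a symmetric per-tensor scheme."""
    scale = x.abs().max().clamp(min=1e-8) / 127.0            # map max |x| to 127
    q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale                       # approximate reconstruction
```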