Kipsora's Stars
ocornut/imgui
Dear ImGui: Bloat-free Graphical User Interface for C++ with minimal dependencies
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
adam-maj/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
hojonathanho/diffusion
Denoising Diffusion Probabilistic Models
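For context, the DDPM forward (noising) process this repo implements has a well-known closed form: x_t = sqrt(ᾱ_t)·x₀ + sqrt(1−ᾱ_t)·ε. A minimal numpy sketch (not the repo's code; the function name and schedule handling are illustrative assumptions):

```python
import numpy as np

def ddpm_forward(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form (illustrative sketch):
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.
    """
    alphas = 1.0 - betas                 # per-step retention factors
    alpha_bar = np.cumprod(alphas)[t]    # cumulative product up to step t
    eps = rng.standard_normal(x0.shape)  # the injected Gaussian noise
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return xt, eps
```

With all betas at zero, ᾱ_t = 1 and x_t reduces to x₀, which makes the closed form easy to sanity-check.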
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
gpu-mode/resource-stream
GPU programming related news and material links
NVIDIA/gdrcopy
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
bytedance/flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
NVIDIA/multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
tpoisonooo/how-to-optimize-gemm
row-major matmul optimization
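The core idea such GEMM tutorials start from is cache blocking: iterate over square tiles so each tile of A and B is reused while it is hot in cache. A minimal sketch in numpy (not the repo's code; the block size is an arbitrary assumption):

```python
import numpy as np

def blocked_matmul(a, b, block=32):
    """Row-major matmul with simple cache blocking (tiling).

    Each (i0, j0) tile of C accumulates contributions from matching
    tiles of A and B, improving data reuse over the naive triple loop.
    """
    m, k = a.shape
    k2, n = b.shape
    assert k == k2
    c = np.zeros((m, n), dtype=a.dtype)
    for i0 in range(0, m, block):
        for j0 in range(0, n, block):
            for p0 in range(0, k, block):
                c[i0:i0+block, j0:j0+block] += (
                    a[i0:i0+block, p0:p0+block] @ b[p0:p0+block, j0:j0+block]
                )
    return c
```

The CUDA versions in the tutorials apply the same decomposition, with tiles staged through shared memory and registers.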
feifeibear/long-context-attention
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
accel-sim/accel-sim-framework
This is the top-level repository for the Accel-Sim framework.
KnowingNothing/MatmulTutorial
An easy-to-understand TensorOp matmul tutorial
microsoft/mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
HazyResearch/flash-fft-conv
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
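The technique behind FFT-based convolution is the convolution theorem: pad to full output length, multiply spectra, and transform back, turning an O(n²) direct convolution into O(n log n). A minimal sketch (not the repo's kernel, which fuses this onto tensor cores):

```python
import numpy as np

def fft_conv(x, h):
    """Linear convolution via FFT (convolution theorem).

    Zero-pads both signals to the full output length n so the
    circular convolution computed by the FFT equals the linear one.
    """
    n = len(x) + len(h) - 1
    return np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)
```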
66RING/tiny-flash-attention
Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS
appl-team/appl
🍎APPL: A Prompt Programming Language. Seamlessly integrate LLMs with programs.
shawntan/scattermoe
Triton-based implementation of Sparse Mixture of Experts.
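The routing step in a sparse Mixture-of-Experts layer picks the top-k experts per token and renormalizes their gate scores; the scatter/gather of tokens to experts is what the Triton kernels accelerate. A minimal numpy sketch of the routing alone (not the repo's API; names and k are assumptions):

```python
import numpy as np

def topk_route(logits, k=2):
    """Top-k expert routing for a sparse MoE layer.

    For each token, select the k highest-scoring experts and
    softmax-normalize the gate weights over just those k.
    """
    idx = np.argpartition(logits, -k, axis=-1)[:, -k:]   # (tokens, k) expert ids
    picked = np.take_along_axis(logits, idx, axis=-1)    # their raw scores
    w = np.exp(picked - picked.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                   # gate weights sum to 1
    return idx, w
```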
DefTruth/Awesome-Diffusion-Inference
📖A curated list of Awesome Diffusion Inference Papers with codes: Sampling, Caching, Multi-GPUs, etc. 🎉🎉
ColfaxResearch/cutlass-kernels
TiledTensor/TiledCUDA
TiledCUDA is a highly efficient kernel template library designed to raise CUDA C's level of abstraction for processing tiles. Development has moved to the new repository at https://github.com/microsoft/TileFusion.
njuhope/cuda_sgemm
ColfaxResearch/cfx-article-src
tgale96/grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
andylolu2/simpleGEMM
The simplest yet fast implementation of matrix multiplication in CUDA.
mcrl/tccl
Thunder Research Group's Collective Communication Library
DefTruth/Awesome-LLM-Inference
📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉
DefTruth/CUDA-Learn-Notes
📚200+ Tensor/CUDA Cores kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (reaching 98%~100% of cuBLAS/FA2 TFLOPS 🎉🎉).