qelk123

XJTU, Xi'an

qelk123's Stars

microsoft/triton-shared
Shared Middle-Layer for Triton Compilation
Language: MLIR
TiledTensor/TiledCUDA
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
Language: C++
krahets/hello-algo
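Hello Algo (《Hello 算法》): a data structures and algorithms tutorial with animated illustrations and one-click runnable code. Supports Python, Java, C++, C, C#, JS, Go, Swift, Rust, Ruby, Kotlin, TS, and Dart. The Simplified and Traditional Chinese editions are updated in sync; English version ongoing.
Language: Java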
eillsu/iTerm2-Chinese-Tutorial
iTerm2 tutorial in Chinese
KnowingNothing/MatmulTutorial
An easy-to-understand TensorOp matmul tutorial
Language: C++
epfml/dynamic-sparse-flash-attention
Language: Jupyter Notebook
Liu-xiandong/How_to_optimize_in_GPU
A series of GPU optimization topics explaining in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
Language: Cuda
karpathy/llm.c
LLM training in simple, raw C/CUDA
Language: Cuda
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Language: Cuda
sjfeng1999/gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
Language: Cuda
j2kun/mlir-tutorial
MLIR For Beginners tutorial
Language: C++
llvm/torch-mlir
The Torch-MLIR project aims to provide first-class support from the PyTorch ecosystem to the MLIR ecosystem.
Language: C++
bytedance/byteir
A model compilation solution for various hardware
Language: MLIR
meta-llama/llama
Inference code for Llama models
Language: Python
NVlabs/NVBit
zwang4/awesome-machine-learning-in-compilers
Must-read research papers and links to tools and datasets related to using machine learning for compilers and systems optimisation
UniHD-CEG/cuda-flux
CUDA Flux is a profiler for GPU applications that reports the basic block execution frequencies of compute kernels
Language: C++
hpc-ulisboa/gpuPTXModel
GPU Static Modeling using PTX and Deep Structured Learning
Language: Cuda
lanl/PPT
Performance Prediction Toolkit
Language: Python
UniHD-CEG/gpu-mangrove
Machine learning model for execution time and power prediction of CUDA kernels
sderek/CUDAAdvisor
CUDAAdvisor: a GPU profiling tool
Language: Cuda
gpgpu-sim/gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism, as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.
Language: C++
NVIDIA/cccl
CUDA Core Compute Libraries
Language: C++
gem5/gem5
The official repository for the gem5 computer-system architecture simulator.
Language: C++
arrayfire/arrayfire
ArrayFire: a general purpose GPU library.
Language: C++
flexflow/FlexFlow
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
Language: C++
NervanaSystems/maxas
Assembler for NVIDIA Maxwell architecture
Language: Sass
SuperScientificSoftwareLaboratory/DASP
Source code of the SC '23 paper: "DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multiplication" by Yuechen Lu and Weifeng Liu.
Language: C++
kokkos/kokkos
Kokkos C++ Performance Portability Programming Ecosystem: The Programming Model - Parallel Execution and Memory Abstraction
Language: C++
LC044/WeChatMsg
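Extracts WeChat chat history and exports it to HTML, Word, and Excel documents for permanent storage, analyzes the chat records to generate an annual chat report, and uses the chat data to train a personal AI chat assistant
Language: Python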