MegaStone's Stars
Anduin2017/HowToCook
A programmer's guide to cooking at home (Simplified Chinese only).
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
dair-ai/Prompt-Engineering-Guide
🐙 Guides, papers, lectures, notebooks and resources for prompt engineering
nothings/stb
stb single-file public domain libraries for C/C++
facebook/zstd
Zstandard - Fast real-time compression algorithm
mlc-ai/mlc-llm
Universal LLM Deployment Engine with ML Compilation
ImageMagick/ImageMagick
🧙‍♂️ ImageMagick 7
wuye9036/CppTemplateTutorial
A Chinese-language tutorial on C++ templates. Unlike the well-known book C++ Templates, this series teaches C++ templates as a Turing-complete language, aiming to give readers a thorough grasp of metaprogramming. (Work in progress)
FMInference/FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
KhronosGroup/Vulkan-Samples
One-stop solution for all Vulkan samples
andikleen/pmu-tools
Intel PMU profiling tools
KomputeProject/kompute
General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing use cases. Backed by the Linux Foundation.
Themaister/Granite
My personal Vulkan renderer
ELS-RD/kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
mit-han-lab/smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Liu-xiandong/How_to_optimize_in_GPU
A series of GPU optimization topics that explains in detail how to optimize CUDA kernels, covering several basic kernels: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is at or near the theoretical limit.
NVIDIA/nvbench
CUDA Kernel Benchmarking Library
Jokeren/Awesome-GPU
Awesome resources for GPUs
cloudcores/CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
XiaoSong9905/CUDA-Optimization-Guide
Xiao's CUDA Optimization Guide [Actively Adding New Content]
google/uVkCompute
A micro Vulkan compute pipeline and a collection of benchmarking compute shaders
te42kyfo/gpu-benches
collection of benchmarks to measure basic GPU capabilities
NVIDIA/nsight-training
Training material for Nsight developer tools
AyakaGEMM/Hands-on-GEMM
Jokeren/GPA
GPU Performance Advisor
ubc-aamodt-group/vulkan-sim
Vulkan-Sim is a GPU architecture simulator for Vulkan ray tracing based on GPGPU-Sim and Mesa.
utcs-scea/altis
A benchmarking suite for heterogeneous systems. The primary goal of this project is to improve and update aspects of existing benchmarking suites that are insufficient or outdated.