MegaStone's Stars
Anduin2017/HowToCook
A programmer's guide to cooking at home (Simplified Chinese only).
openai/whisper
Robust Speech Recognition via Large-Scale Weak Supervision
dair-ai/Prompt-Engineering-Guide
🐙 Guides, papers, lectures, notebooks and resources for prompt engineering
nothings/stb
stb single-file public domain libraries for C/C++
facebook/zstd
Zstandard - Fast real-time compression algorithm
mlc-ai/mlc-llm
Universal LLM Deployment Engine with ML Compilation
ImageMagick/ImageMagick
🧙‍♂️ ImageMagick 7
wuye9036/CppTemplateTutorial
A Chinese-language tutorial on C++ templates. Unlike the well-known book C++ Templates, this series teaches C++ templates as a Turing-complete language, aiming to give readers a thorough grasp of metaprogramming. (Work in progress)
FMInference/FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
KhronosGroup/Vulkan-Samples
One-stop solution for all Vulkan samples
andikleen/pmu-tools
Intel PMU profiling tools
KomputeProject/kompute
General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). Blazing fast, mobile-enabled, asynchronous and optimized for advanced GPU data processing use cases. Backed by the Linux Foundation.
Themaister/Granite
My personal Vulkan renderer
ELS-RD/kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
mit-han-lab/smoothquant
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Liu-xiandong/How_to_optimize_in_GPU
A series of GPU optimization topics that explains in detail how to optimize CUDA kernels, covering several basic kernels: elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is at or near the theoretical limit.
NVIDIA/nvbench
CUDA Kernel Benchmarking Library
Jokeren/Awesome-GPU
Awesome resources for GPUs
cloudcores/CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
XiaoSong9905/CUDA-Optimization-Guide
Xiao's CUDA Optimization Guide [Actively Adding New Content]
google/uVkCompute
A micro Vulkan compute pipeline and a collection of benchmarking compute shaders
te42kyfo/gpu-benches
collection of benchmarks to measure basic GPU capabilities
NVIDIA/nsight-training
Training material for Nsight developer tools
AyakaGEMM/Hands-on-GEMM
Jokeren/GPA
GPU Performance Advisor
ubc-aamodt-group/vulkan-sim
Vulkan-Sim is a GPU architecture simulator for Vulkan ray tracing based on GPGPU-Sim and Mesa.
utcs-scea/altis
A benchmarking suite for heterogeneous systems. The primary goal of this project is to improve and update aspects of existing benchmarking suites that are insufficient or outdated.