Miroier's Stars
wting/autojump
A cd command that learns - easily navigate directories from the command line
pengsida/learning_research
My research experience
FoundationVision/VAR
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
CaiJimmy/hugo-theme-stack
Card-style Hugo theme designed for bloggers
google/bloaty
Bloaty: a size profiler for binaries
AnswerDotAI/gpu.cpp
A lightweight library for portable low-level GPU computation using WebGPU.
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
ELS-RD/kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
srush/Triton-Puzzles
Puzzles for learning Triton
xdit-project/xDiT
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
kendryte/nncase
Open deep learning compiler stack for Kendryte AI accelerators ✨
feifeibear/LLMSpeculativeSampling
Fast inference from large language models via speculative decoding
Jack47/hack-SysML
The road to hack SysML and become a system expert
KnowingNothing/compiler-and-arch
A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures
harleyszhang/dl_note
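Deep learning systems notes, covering the mathematical foundations of deep learning, detailed explanations of basic neural network building blocks, deep learning training and tuning strategies, and detailed explanations of model compression algorithms.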
FlagOpen/FlagGems
FlagGems is an operator library for large language models implemented in Triton Language.
alangrainger/share-note
Instantly share an Obsidian note with the full theme exactly like you see in your vault. Data is shared encrypted by default, and only you and the person you send it to have the key.
Sergei-Korneev/obsidian-local-images-plus
This repo is a reincarnation of the obsidian-local-images plugin, whose main aim was downloading images in md notes to local storage.
te42kyfo/gpu-benches
collection of benchmarks to measure basic GPU capabilities
bytedance/flux
A fast communication-overlapping library for tensor parallelism on GPUs.
online-judge-tools/verification-helper
a testing framework for snippet libraries used in competitive programming
bytedance/ABQ-LLM
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
FlagOpen/FlagScale
FlagScale is a large model toolkit based on open-sourced projects.
TiledTensor/TiledCUDA
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
ROCm/rocWMMA
rocWMMA
sjfeng1999/gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
fanshiqing/grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
tgale96/grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
weishengying/cutlass_flash_atten_fp8
Implements FP8 flash attention on the Ada architecture using the cutlass repository
ZonePG/cs-notes
my cs notes