zwshan's Stars
NVIDIA/cudnn-frontend
cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples showing how to use it
Hardware-Alchemy/cuDNN-sample
cuDNN sample codes provided by Nvidia
OpenPPL/ppq
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
Pinging-ZJU/Pytorch-Memory-Utils
PyTorch memory tracking code
neural-boost/neural-boost
Neural Boost aims to boost inference performance.
Tiiiger/QPyTorch
Low Precision Arithmetic Simulation in PyTorch
godweiyang/NN-CUDA-Example
Several simple examples for popular neural network toolkits calling custom CUDA operators.
stefbraun/rnn_benchmarks
RNN benchmarks of PyTorch, TensorFlow, and Theano
guanh01/CS692-mlsys
This is the (evolving) reading list for the seminar.
zwshan/grnn
howardlau1999/sysu-thesis-typst
Typst template for Sun Yat-sen University theses and dissertations
exaloop/codon
A high-performance, zero-overhead, extensible Python compiler using LLVM
kaixindelele/ChatPaper
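Use ChatGPT to summarize arXiv papers. Speeds up the whole research workflow: uses ChatGPT for full-text paper summarization, professional translation, polishing, reviewing, and drafting responses to reviewers.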
marcpaga/nanopore_benchmark
ziishaned/learn-regex
Learn regex the easy way
nanoporetech/bonito
A PyTorch Basecaller for Oxford Nanopore Reads
fmfi-compbio/deepnano-blitz
Very fast ONT basecaller
arcsysu/SYsU-lang
A mini, simple, and modular compiler lab for SYsU/SysY (tiny C). Based on Clang/LLVM/ANTLR4/Bison/Flex.
hbrunie/PyFloT
Mixed precision tuning tool
ElegantLaTeX/ElegantPaper
Elegant LaTeX Template for Working Papers
MoZeWei/moTuner
SciCompKL/CoDiPack
Fast gradient evaluation in C++ based on expression templates.
arcsysu/Weekly-Paper-Sharing-OSM
$\LaTeX$ presentation source for a group-meeting paper-sharing talk on “OSM: Off-Chip Shared Memory for GPUs”
minhhn2910/CUDA-mixed-precision
Mixed precision between FP32 and FP16x2 in CUDA programs
raydongpub/GPU-FPtuner
LLNL/adapt-fp
chwan1016/awesome-gnn-systems
A list of awesome GNN systems.
ccfddl/ccf-deadlines
⏰ Collaboratively track deadlines of conferences recommended by CCF (Website, Python CLI, WeChat Mini Program) / If you find it useful, please star this project, thanks~