Pinned Repositories
AMG
Algebraic multigrid benchmark
Awesome-LLM-Inference
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
awesome-model-quantization
A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research, and we are continuously improving the project. Pull requests adding works (papers, repositories) that the repo is missing are welcome.
Batched-SpMM
New batched algorithm for sparse matrix-matrix multiplication (SpMM)
BLASTed
Fine-grain parallel iterative methods
cfs-spmv
Conflict-free symmetric SpMV library
CPP
Lecture notes, projects and other materials for Course 'CS205 C/C++ Program Design' at Southern University of Science and Technology.
cuFoam
cuFoam is a CUDA-based linear equation solver for OpenFOAM.
HPC-Lab-Docs
Documentation for an HPC course
professional-cuda-c-programming
MicroZHY's Repositories
MicroZHY/tensor-cores-numerical-behavior
Test suite for probing the numerical behavior of NVIDIA tensor cores
MicroZHY/wmma_extension
An extension library for the WMMA API (Tensor Core API)
MicroZHY/ted-join-hipc22
MicroZHY/mixed-precision-ir
Mixed Precision Iterative Refinement
MicroZHY/CPP
Lecture notes, projects and other materials for Course 'CS205 C/C++ Program Design' at Southern University of Science and Technology.
MicroZHY/DissectingTensorCores
MicroZHY/Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
MicroZHY/ShflBW_Sparse_NN
MicroZHY/TileSpGEMM
Source code of the PPoPP '22 paper: "TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs" by Yuyao Niu, Zhengyang Lu, Haonan Ji, Shuhui Song, Zhou Jin, and Weifeng Liu.
MicroZHY/CUDA-Optimization-Guide
Xiao's CUDA Optimization Guide [Actively Adding New Content]
MicroZHY/interview
📚 Summary of C/C++ interview knowledge
MicroZHY/cuda-tensorcores-register-mapping
MicroZHY/tsqr-tc
TSQR on TensorCores
MicroZHY/cuda-tensorcore-hgemm
MicroZHY/vectorSparse
MicroZHY/TileSpMV
Source code of the IPDPS '21 paper: "TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs" by Yuyao Niu, Zhengyang Lu, Meichen Dong, Zhou Jin, Weifeng Liu, and Guangming Tan.
MicroZHY/hylo
MicroZHY/HPC-Notes
Personal Notes for Learning HPC & Parallel Computation [Actively Adding New Content]
MicroZHY/WCycleSVD
MicroZHY/moderngpu
Patterns and behaviors for GPU computing
MicroZHY/python_interview_question
Python interview questions
MicroZHY/Tensor-FFT
An implementation of an FFT algorithm for fp16 data that uses tensor cores to accelerate processing
MicroZHY/TCStencil
MicroZHY/TC-enhanced_Cross-correlation_Function
Calculation of the Cross-correlation Function Accelerated by Tensor Cores with TensorFloat-32 Precision on Ampere GPUs
MicroZHY/LATER
Linear Algebra on TEnsoRcore
MicroZHY/recblock-sptrsv
Source code of the ICPP '20 paper: "Efficient Block Algorithms for Parallel Sparse Triangular Solve" by Zhengyang Lu, Yuyao Niu, and Weifeng Liu.
MicroZHY/stencil_GPU
Stencil computation on an NVIDIA GPU (Tesla V100)
MicroZHY/stencil_CPU
Stencil computation on Arm architecture (Kunpeng 920)
MicroZHY/BLASTed
Fine-grain parallel iterative methods
MicroZHY/m-thesis-thomas
Iterative Refinement with Hierarchical Low-Rank Preconditioners using Mixed Precision