Pinned Repositories
2020-Huawei-Code-Craft
My thoughts on 2020 Huawei Code Craft.
c6678code
tms320c6678 test code
ebook-1
A collection of classic computer science books from the Internet
How_to_optimize_in_GPU
A series on GPU optimization that explains in detail how to optimize CUDA kernels. It covers several basic kernels, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
keystone_tms320c6678l
OpenBLAS
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
pprof
pprof is a tool for visualization and analysis of profiling data
qnnpack
Explained QNNPACK Implementation
xiangchunyang's Repositories
xiangchunyang/MNN
MNN is a lightweight deep neural network inference engine.
xiangchunyang/A-Not-So-Short-Guide-for-LinJun-Group
Getting-started guide for Wang Linjun's research group at Zhejiang University
xiangchunyang/batched_gemm
xiangchunyang/csky_crosscompile_x86_64
xiangchunyang/block_matrix_format_performance
xiangchunyang/SpMP
sparse matrix pre-processing library
xiangchunyang/spring2019
CME213 Course Website, Spring 2019
xiangchunyang/qnnpack
Explained QNNPACK Implementation
xiangchunyang/chgemm
symmetric int8 gemm
xiangchunyang/matrix_format_performance
xiangchunyang/NVIDIA-tensor-core-examples
xiangchunyang/multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
xiangchunyang/small_dgemms
Mini-app investigating performance of many small dgemm operations.
xiangchunyang/blislab
BLISlab: A Sandbox for Optimizing GEMM
xiangchunyang/nnom
A higher-level Neural Network library for microcontrollers.
xiangchunyang/DMRG-openMP
Apply the HTarget kernel of DMRG++ using OpenMP / Kokkos / HabaneroC++ and sequential code
xiangchunyang/cutlass
CUDA Templates for Linear Algebra Subroutines
xiangchunyang/LowSynchGMRESAlgorithms
A collection of MATLAB codes for Low Synch GMRES algorithms, as presented in the manuscript [LINK]
xiangchunyang/Batched-SpMM
New batched algorithm for sparse matrix-matrix multiplication (SpMM)
xiangchunyang/Daxpy-CUDA
Example of the Daxpy algorithm implemented using CUDA for execution on NVIDIA GPUs. Example provided from the Multi-core & GPU Programming class.
xiangchunyang/msu.supercomputers.course
Homework for the "High performance computing" course at CMC MSU
xiangchunyang/benchmark
A microbenchmark support library
xiangchunyang/pprof
pprof is a tool for visualization and analysis of profiling data
xiangchunyang/Tengine
Tengine is a lite, high-performance, modular inference engine for embedded devices
xiangchunyang/FeatherCNN
FeatherCNN is a high performance inference engine for convolutional neural networks.
xiangchunyang/batch-matmul-cuda
A simple and understandable CUDA kernel for batch-matmul operation
xiangchunyang/English-level-up-tips-for-Chinese
An advanced English guide that may benefit you greatly
xiangchunyang/zhou_6678
xiangchunyang/nccl-examples
NCCL Examples from Official NVIDIA NCCL Developer Guide.
xiangchunyang/c6678code
tms320c6678 test code