Pinned Repositories
2020-Huawei-Code-Craft
My thoughts on the 2020 Huawei Code Craft.
c6678code
TMS320C6678 test code
ebook-1
A collection of classic computer science books from the Internet
How_to_optimize_in_GPU
A series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernels, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
keystone_tms320c6678l
OpenBLAS
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
pprof
pprof is a tool for visualization and analysis of profiling data
qnnpack
Explained QNNPACK Implementation
xiangchunyang's Repositories
xiangchunyang/All-About-TeX
For LaTeX beginners: almost any question you can think of is answered in this repository.
xiangchunyang/ANT-MOC
xiangchunyang/Book1_Python-For-Beginners
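Book_1_《编程不难》 (Python for Beginners) | Iris Book series (鸢尾花书): From Basic Arithmetic to Machine Learning; criticism and corrections are very welcome!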
xiangchunyang/Book2_Beauty-of-Data-Visualization
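Book_2_《可视之美》 (Beauty of Data Visualization) | Iris Book series (鸢尾花书): From Basic Arithmetic to Machine Learning; criticism and corrections are welcome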
xiangchunyang/Book3_Elements-of-Mathematics
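Book_3_《数学要素》 (Elements of Mathematics) | Iris Book series (鸢尾花书): From Basic Arithmetic to Machine Learning; now published; further corrections are welcome, and readers who report many errors will receive a free copy!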
xiangchunyang/Book4_Power-of-Matrix
Book_4_《矩阵力量》 (Power of Matrix) | Iris Book series (鸢尾花书): From Basic Arithmetic to Machine Learning; now published!
xiangchunyang/CSCore
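Introduces the fundamentals of computer systems and analyzes in depth the principles behind various containers and algorithms, so that you truly understand the underlying techniques and build a solid computer science foundation.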
xiangchunyang/CUDA-Programming
Sample codes for my CUDA programming book
xiangchunyang/cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
xiangchunyang/cudahandbook
Source code that accompanies The CUDA Handbook.
xiangchunyang/cuSZ
A GPU-accelerated error-bounded lossy compressor for scientific data.
xiangchunyang/cute-gemm
xiangchunyang/cutlass-cute-sample
xiangchunyang/cutlass_cute_experiments
xiangchunyang/DeepLearningSystem
An introduction to the core principles of deep learning systems.
xiangchunyang/face2cuda
xiangchunyang/fasten
xiangchunyang/GSTuner
xiangchunyang/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
xiangchunyang/HPC-Learning-Notes
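Study notes on high-performance computing, including the notes themselves and code demos for the related topics; continuously being improved. If you find it helpful, please give it a Star. It helps the author a lot, thank you!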
xiangchunyang/linux-perf-examples
Case studies from the Geek Time (极客时间) course 《Linux 性能优化实战》 (Linux Performance Optimization in Practice)
xiangchunyang/MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial
xiangchunyang/nccl-tests
NCCL Tests
xiangchunyang/numactl
NUMA support for Linux
xiangchunyang/openmp-tutorial-1
OpenMP tutorial series
xiangchunyang/playground-gay4l6to
Tech.io playground
xiangchunyang/Python-for-Tensor-Network-Tutorial
Python for Tensor Network: Tutorial. The lecture videos (in Chinese) can be found at https://space.bilibili.com/401005433
xiangchunyang/SZ3
xiangchunyang/ucasthesis
LaTeX Thesis Template for the University of Chinese Academy of Sciences
xiangchunyang/wmma_extension
An extension library of WMMA API (Tensor Core API)