Pinned Repositories
BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
ByteDance-BuyTicket
ByteDance Zixue Mirror Program (字学镜像计划) [Backend]: What do you do when ten million people try to grab tickets at once?
course
High-Performance Parallel Programming and Optimization: course materials
cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) on Tensor Cores, using the WMMA API and MMA PTX instructions.
deeplabv3plus-keras
DeepLabv3+ (Google's new algorithm for semantic segmentation) in Keras: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
How_to_optimize_in_GPU
A series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernels, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
ptx
sql_node
SQL study notes.
TensorRT_Tutorial
tvm_mlir_learn
A collection of compiler learning resources.
903664689's Repositories
903664689/TensorRT_Tutorial
903664689/cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) on Tensor Cores, using the WMMA API and MMA PTX instructions.
903664689/BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
903664689/How_to_optimize_in_GPU
A series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernels, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
903664689/course
High-Performance Parallel Programming and Optimization: course materials
903664689/tvm_mlir_learn
A collection of compiler learning resources.
903664689/ByteDance-BuyTicket
ByteDance Zixue Mirror Program (字学镜像计划) [Backend]: What do you do when ten million people try to grab tickets at once?
903664689/sql_node
SQL study notes.
903664689/ptx
903664689/deeplabv3plus-keras
DeepLabv3+ (Google's new algorithm for semantic segmentation) in Keras: Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation