carlushuang

AMDshanghai

Pinned Repositories

avx_flops
Benchmark cpu flops using avx instructions
Language:C5 4 00
cpu_gemm_opt
How to design a CPU GEMM on x86 with AVX256 that can beat OpenBLAS.
Language:C++60 7 115
deepcore_source_code
Subpart source code of deepcore v0.7
Language:C1 2 00
FFT_implement
fft/ifft, r2c/c2r, 2d_r2c/2d_c2r, convolve, correlation, tiling fft, srfft, pfa, radix-2/3/5
Language:C++3 2 03
gcnasm
amdgpu example code in hip/asm
Language:C++10 3 011
gemm_implementations
Language:C11
miopen_cudnn_ops
Language:C++5 5 15
ogl_cube
Observe a cube with a basic arcball camera in C++
Language:C10
composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
Language:C++258 25 201103
MISA
Machine Intelligence Shader Autogen. AMDGPU ML shader code generator. (previously iGEMMgen)
Language:Python33 25 1513

carlushuang's Repositories

carlushuang/cpu_gemm_opt
How to design a CPU GEMM on x86 with AVX256 that can beat OpenBLAS.
Language:C++60 7 115
carlushuang/gcnasm
amdgpu example code in hip/asm
Language:C++10 3 011
carlushuang/avx_flops
Benchmark cpu flops using avx instructions
Language:C5 4 00
carlushuang/miopen_cudnn_ops
Language:C++5 5 15
carlushuang/FFT_implement
fft/ifft, r2c/c2r, 2d_r2c/2d_c2r, convolve, correlation, tiling fft, srfft, pfa, radix-2/3/5
Language:C++3 2 03
carlushuang/deepcore_source_code
Subpart source code of deepcore v0.7
Language:C1 2 00
carlushuang/gemm_implementations
Language:C11
carlushuang/mkldnn_test
Language:C++0 3 01
carlushuang/amdgpu-jit
test project for amdgpu codegen
2 0
carlushuang/attn_bench
Language:Python
carlushuang/auto_gen
auto gen
Language:C++2 0
carlushuang/binutils-gdb
Unofficial mirror of sourceware binutils-gdb repository. Updated daily.
Language:C
carlushuang/CWBVH
An implementation of NVIDIA's paper "Efficient Incoherent Ray Traversal on GPUs Through Compressed Wide BVHs"
carlushuang/D3D12nBodyGravity_clang
D3D12nBodyGravity example with clang build
Language:C
carlushuang/HIP
HIP : Convert CUDA to Portable C++ Code
Language:C++2 0
carlushuang/HIP-Examples
Examples for HIP
carlushuang/hipBLAS
ROCm BLAS marshalling library
Language:C++3 0
carlushuang/hsaco-jit
Language:C++
carlushuang/kernel-launcher-amdgpu
Language:C++
carlushuang/LLVM_Note
Language:C++2 0
carlushuang/Mandelbrot-Set
mandelbrot set
Language:Python2 0
carlushuang/miopen-benchmark
benchmarking miopen
Language:C++3 0
carlushuang/mlir
"Multi-Level Intermediate Representation" Compiler Infrastructure
carlushuang/Paddle
PArallel Distributed Deep LEarning
Language:C++2 0
carlushuang/rocBLAS
Next generation BLAS implementation for ROCm platform
Language:C++2 0
carlushuang/rocm-recipes
Recipes for rocm
Language:CMake2 0
carlushuang/Tensile
Stretching GPU performance for GEMMs and tensor contractions.
Language:Python3 0
carlushuang/tsm2x-imp
Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA
Language:Cuda2 0
carlushuang/tvm_playground
Language:Python2 0
carlushuang/xbyak
a JIT assembler for x86(IA-32)/x64(AMD64, x86-64) MMX/SSE/SSE2/SSE3/SSSE3/SSE4/FPU/AVX/AVX2/AVX-512 by C++ header
Language:C++