Pinned Repositories
googletest
GoogleTest - Google Testing and Mocking Framework
how-to-optimize-gemm
Quantum
Microsoft Quantum Development Kit Samples
TorchBench
TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance.
tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
MIOpen
AMD's Machine Intelligence Library
MITuna
rocComposer
AMD composer for High Performance Deep Learning Kernels and Libraries
junliume's Repositories
junliume/googletest
GoogleTest - Google Testing and Mocking Framework
junliume/how-to-optimize-gemm
junliume/Quantum
Microsoft Quantum Development Kit Samples
junliume/TorchBench
TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance.
junliume/tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
junliume/ZLUDA
CUDA on AMD GPUs