keilsmart's Repositories
keilsmart/caffe
Ristretto: Caffe-based approximation of convolutional neural networks.
keilsmart/CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
keilsmart/cutlass
CUDA Templates for Linear Algebra Subroutines
keilsmart/kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
keilsmart/llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
keilsmart/onediff
OneDiff: An out-of-the-box acceleration library for diffusion models.
keilsmart/oneflow
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
keilsmart/openpose
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
keilsmart/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
keilsmart/rocMLIR
keilsmart/stable-fast
Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
keilsmart/Stochastic-Quantization
Training Low-bits DNNs with Stochastic Quantization
keilsmart/tvm
Open deep learning compiler stack for CPU, GPU and specialized accelerators
keilsmart/web-stable-diffusion
Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.
keilsmart/YOLOv3-model-pruning
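Model pruning for YOLOv3: on the Oxford Hand dataset, the parameter count is reduced by 80%, FLOPs drop by 70%, inference speed reaches 200% of the original, and mAP remains essentially unchanged.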