Pinned Repositories
cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
caffe
Caffe: a fast open framework for deep learning.
MatmulTutorial
An easy-to-understand TensorOp Matmul tutorial
gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
feiyuvl's Repositories
feiyuvl/caffe
Caffe: a fast open framework for deep learning.