feiyuvl

Pinned Repositories

cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
Language: Cuda
304 4 1267
caffe
Caffe: a fast open framework for deep learning.
Language: C++
0 1 00
MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial
Language: C++
290 8 1131
gpt-fast
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Language: Python
5.7k 60 104514
