sjfeng1999

Pinned Repositories

turingas
Assembler for NVIDIA Volta and Turing GPUs
Language: Python 189 11 1040
cub
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
Language: Cuda 1.7k 89 281445
cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++ 5k 107 942849
cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++ 0 0 00
gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
Language: Cuda 67 2 221
mlx
MLX: An array framework for Apple silicon
Language: C++ 00
snippets
Language: C++ 1 1 00
TNN
TNN: developed by Tencent Youtu Lab and Guangying Lab, a uniform deep learning inference framework for mobile, desktop, and server. TNN is distinguished by several outstanding features, including its cross-platform capability, high performance, model compression, and code pruning. Based on ncnn and Rapidnet, TNN further strengthens support and performance optimization for mobile devices, and also draws on the good extensibility and high performance of existing open source efforts. TNN has been deployed in multiple apps from Tencent, such as Mobile QQ, Weishi, Pitu, etc. Contributions are welcome: collaborate with us to make TNN a better framework.
Language: C++ 4.3k 92 950762

sjfeng1999's Repositories

sjfeng1999/gpu-arch-microbenchmark
Dissecting NVIDIA GPU Architecture
Language: Cuda 67 2 221
sjfeng1999/snippets
Language: C++ 1 1 00
sjfeng1999/cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++ 0 0 00
sjfeng1999/mlx
MLX: An array framework for Apple silicon
Language: C++ 00