FuncJ's Stars
pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
labmlai/annotated_deep_learning_paper_implementations
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
xx025/carrot
Free ChatGPT Site List: a curated collection of free, easy-to-use ChatGPT mirror sites
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
NVIDIA/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
aladdinpersson/Machine-Learning-Collection
A resource for learning about Machine learning & Deep Learning
harvardnlp/annotated-transformer
An annotated implementation of the Transformer paper.
oneapi-src/oneDNN
oneAPI Deep Neural Network Library (oneDNN)
ARM-software/ComputeLibrary
The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
mpitutorial/mpitutorial
MPI programming lessons in C and executable code examples
Tony-Tan/CUDA_Freshman
google/XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Tencent/TurboTransformers
A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
BrightXiaoHan/CMakeTutorial
A hands-on CMake tutorial in Chinese
godweiyang/NN-CUDA-Example
Several simple examples for popular neural network toolkits calling custom CUDA operators.
HuangOwen/Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
libxsmm/libxsmm
Library for specialized dense and sparse matrix operations, and deep learning primitives.
tpoisonooo/how-to-optimize-gemm
row-major matmul optimization
bytedance/ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
KnowingNothing/MatmulTutorial
An easy-to-understand TensorOp matmul tutorial
galeselee/Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. This is a list of papers on LLM acceleration, currently focused mainly on inference; related work will be added over time. Contributions welcome!
yzhaiustc/Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F
Stepwise optimizations of DGEMM on CPU, eventually exceeding Intel MKL's performance, even under multithreading.
zhirui-gao/Deep-Template-Matching
[CVMJ2024] Learning Accurate Template Matching with Differentiable Coarse-to-fine Correspondence Refinement
SJTU-ReArch-Group/Paper-Reading-List
AnonymousYWL/LibShalom
wanxinhang/Awesome-Semi-supervised-Multi-view-classification
Awesome Semi-supervised Multi-view Classification is a collection of SOTA, novel semi-supervised multi-view classification methods (papers, codes).
nDIRECT/nDIRECT
A direct convolution library targeting ARM multi-core CPUs.
FuncJ/MeAtten
The repository maintains the source code for the article titled "Optimizing Attention by Exploiting Data Reuse on ARM Multi-core CPUs."
xrq-phys/blis
Enhanced Arm support for BLIS: Packing and skinny-GEMM.