FuncJ's Stars
pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
labmlai/annotated_deep_learning_paper_implementations
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
xx025/carrot
Free ChatGPT Site List: a curated collection of free, easy-to-use ChatGPT mirror sites
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
facebookresearch/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
NVIDIA/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
aladdinpersson/Machine-Learning-Collection
A resource for learning about Machine learning & Deep Learning
harvardnlp/annotated-transformer
An annotated implementation of the Transformer paper.
oneapi-src/oneDNN
oneAPI Deep Neural Network Library (oneDNN)
ARM-software/ComputeLibrary
The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologies.
mpitutorial/mpitutorial
MPI programming lessons in C and executable code examples
Tony-Tan/CUDA_Freshman
google/XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
Tencent/TurboTransformers
A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
BrightXiaoHan/CMakeTutorial
A hands-on CMake tutorial in Chinese
godweiyang/NN-CUDA-Example
Several simple examples for popular neural network toolkits calling custom CUDA operators.
HuangOwen/Awesome-LLM-Compression
Awesome LLM compression research papers and tools.
libxsmm/libxsmm
Library for specialized dense and sparse matrix operations, and deep learning primitives.
tpoisonooo/how-to-optimize-gemm
row-major matmul optimization
bytedance/ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
KnowingNothing/MatmulTutorial
An easy-to-understand TensorOp matmul tutorial
galeselee/Awesome_LLM_System-PaperList
Since the emergence of ChatGPT in 2022, accelerating Large Language Models has become increasingly important. This is a list of papers on LLM acceleration, currently focused mainly on inference; related work will be added over time. Contributions welcome!
yzhaiustc/Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F
Stepwise optimizations of DGEMM on CPU, eventually exceeding Intel MKL's performance, even under multithreading.
zhirui-gao/Deep-Template-Matching
[CVMJ2024] Learning Accurate Template Matching with Differentiable Coarse-to-fine Correspondence Refinement
SJTU-ReArch-Group/Paper-Reading-List
AnonymousYWL/LibShalom
wanxinhang/Awesome-Semi-supervised-Multi-view-classification
Awesome Semi-supervised Multi-view Classification is a collection of SOTA, novel semi-supervised multi-view classification methods (papers, codes).
nDIRECT/nDIRECT
A direct convolution library targeting ARM multi-core CPUs.
FuncJ/MeAtten
The repository maintains the source code for the article titled "Optimizing Attention by Exploiting Data Reuse on ARM Multi-core CPUs."
xrq-phys/blis
Enhanced Arm support for BLIS: Packing and skinny-GEMM.