Pinned Repositories
accfft
A Massively Parallel FFT Library for CPU/GPU
AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
AutoTest
awesome-machine-learning-cn
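A Chinese edition of the comprehensive machine learning resource list, covering frameworks, libraries, and software in the machine learning field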
baidu-allreduce
EZLippi.github.io
This is the source code of my personal website; forks are welcome.
fastmoe
A fast MoE impl for PyTorch
how-to-optimize-gemm
taco
The Tensor Algebra Compiler (taco) computes tensor expressions on sparse and dense tensors
limin2021's Repositories
limin2021/fastmoe
A fast MoE impl for PyTorch
limin2021/AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
limin2021/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
limin2021/AutoTest
limin2021/CINN
A Compiler Infrastructure for Neural Networks
limin2021/CLBlast
Tuned OpenCL BLAS
limin2021/composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
limin2021/cuda-graph-test
limin2021/cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
limin2021/CUDALibrarySamples
CUDA Library Samples
limin2021/cuGemmProf
A simple tool to profile performance of multiple combinations of GEMM of cuBLAS
limin2021/dgSPARSE-Library
limin2021/docs
Documentation for PaddlePaddle
limin2021/download_google_drive
Download files from Google Drive using Python 2 or Python 3
limin2021/fcc
Fiuggi Compiler Collection (FCC) is a high-performance compiler based on LLVM.
limin2021/flash-attention
Fast and memory-efficient exact attention
limin2021/gcnLib
limin2021/ge-spmm
limin2021/INFINITY
limin2021/keras-xception
limin2021/models
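Pre-trained and Reproduced Deep Learning Models (the official PaddlePaddle model library, including a variety of deep learning models validated in cutting-edge academic research and industrial scenarios)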
limin2021/Paddle
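PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)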
limin2021/PaddleNLP
An NLP library with awesome pre-trained Transformer models and an easy-to-use interface, supporting a wide range of NLP tasks from research to industrial applications.
limin2021/PaddleScience
PaddleScience is an SDK and library for developing AI-driven scientific computing applications based on PaddlePaddle.
limin2021/PASSL
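PASSL includes image self-supervised learning algorithms such as SimCLR, MoCo v1/v2, BYOL, CLIP, PixPro, SimSiam, SwAV, BEiT, and MAE, as well as foundational vision algorithms such as Vision Transformer, DEiT, Swin Transformer, CvT, T2T-ViT, MLP-Mixer, XCiT, ConvNeXt, and PVTv2.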
limin2021/pytorch_sparse
PyTorch Extension Library of Optimized Autograd Sparse Matrix Operations
limin2021/sparse_transformer_sc21
limin2021/training
Reference implementations of MLPerf™ training benchmarks
limin2021/training_results_v1.0
limin2021/vectorSparse