Arsmart123's Stars
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
codertimo/BERT-pytorch
Google AI 2018 BERT pytorch implementation
NVIDIA/FasterTransformer
Transformer-related optimization, including BERT, GPT
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
MegEngine/MegEngine
MegEngine is a fast, scalable, and easy-to-use deep learning framework with automatic differentiation
Tony-Tan/CUDA_Freshman
brucefan1983/CUDA-Programming
Sample codes for my CUDA programming book
NVIDIA-developer-blog/code-samples
Source code examples from the Parallel Forall Blog
pytorch/extension-cpp
C++ extensions in PyTorch
NervanaSystems/maxas
Assembler for NVIDIA Maxwell architecture
tpoisonooo/how-to-optimize-gemm
row-major matmul optimization
longcw/RoIAlign.pytorch
RoIAlign & crop_and_resize for PyTorch
JeanKossaifi/tensorly-notebooks
Tensor methods in Python with TensorLy
Cjkkkk/CUDA_gemm
A simple high performance CUDA GEMM implementation.
Yinghan-Li/YHs_Sample
Yinghan's Code Sample
oseledets/ttpy
Python implementation of the TT-Toolbox
YouQixiaowu/CUDA-Programming-with-Python
Python versions of the sample code from the book CUDA Programming, using the pycuda module
wangzyon/NVIDIA_SGEMM_PRACTICE
Step-by-step optimization of CUDA SGEMM
sniklaus/pytorch-extension
an example of a CUDA extension for PyTorch using CuPy which computes the Hadamard product of two tensors
lzhengchun/matrix-cuda
matrix multiplication in CUDA
yangyubuaa/cuda_accelerate
Example of accelerating a neural network with C++ and CUDA (implements matrix addition and matrix multiplication)
weifengliu-ssslab/Benchmark_SpGEMM_using_CSR
CSR-based SpGEMM on nVidia and AMD GPUs
LLNL/acrotensor
A C++ library for computing large scale tensor contractions.
Huanghongru/SGEMM-Implementation-and-Optimization
Some source code about matrix multiplication implementation on CUDA
HUI11126/Compute-continuous-moments-de-ned-in-a-rectangular-region-using-CUDA-and-some-applications
zpzim/MSplitGEMM
Large matrix multiplication in CUDA
colehawkins/bayesian-tensor-rank-determination
asrivast28/ParsiMoNe
Parallel Construction of Module Networks
mnrn/optimizing-matrix-multiplication-examples
Examples of optimizing matrix multiplication.
HappyPointer/IT5007_Project_Spark-Tok
This is the repository containing the source code of our IT5007 Project - Spark Tok