Pinned Repositories
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
DeepSpeedExamples
Example models using DeepSpeed
haq
[CVPR 2019, Oral] HAQ: Hardware-Aware Automated Quantization with Mixed Precision
stl1weekend
Build your own STL in one weekend
PDEBench
PDEBench: An Extensive Benchmark for Scientific Machine Learning
aima-python
Python implementation of algorithms from Russell and Norvig's "Artificial Intelligence - A Modern Approach"
annotated_deep_learning_paper_implementations
59 implementations/tutorials of deep learning papers with side-by-side notes, including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, ...), GANs (cyclegan, stylegan2, ...), reinforcement learning (ppo, dqn), capsnet, distillation, ...
cumf_sgd
CUDA Matrix Factorization Library with Stochastic Gradient Descent (SGD)
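For context on the technique, a minimal CUDA sketch of matrix-factorization SGD follows: each thread takes one observed rating (u, v, r) and applies a lock-free (Hogwild-style) update to the user and item factor vectors. Kernel and variable names here are illustrative assumptions, not cumf_sgd's actual API.

```cuda
#include <cstdio>

#define K 16  // latent factor dimension (illustrative choice)

// One Hogwild-style SGD step per observed rating (u, v, r):
//   e = r - p_u . q_v
//   p_u += lr * (e * q_v - lambda * p_u),  q_v updated symmetrically.
// Concurrent threads may race on shared rows; lock-free GPU MF-SGD
// approaches tolerate these benign races.
__global__ void sgd_update(const int* users, const int* items,
                           const float* ratings, int n_ratings,
                           float* P, float* Q, float lr, float lambda) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n_ratings) return;
    float* p = P + users[idx] * K;
    float* q = Q + items[idx] * K;
    float pred = 0.f;
    for (int k = 0; k < K; ++k) pred += p[k] * q[k];
    float e = ratings[idx] - pred;
    for (int k = 0; k < K; ++k) {
        float pk = p[k], qk = q[k];  // use old values for both updates
        p[k] = pk + lr * (e * qk - lambda * pk);
        q[k] = qk + lr * (e * pk - lambda * qk);
    }
}

int main() {
    // Toy problem: 2 users, 2 items, 3 observed ratings.
    int h_u[3] = {0, 0, 1}, h_v[3] = {0, 1, 1};
    float h_r[3] = {5.f, 3.f, 4.f};
    float h_P[2 * K], h_Q[2 * K];
    for (int i = 0; i < 2 * K; ++i) { h_P[i] = 0.1f; h_Q[i] = 0.1f; }
    int *d_u, *d_v; float *d_r, *d_P, *d_Q;
    cudaMalloc(&d_u, sizeof(h_u)); cudaMalloc(&d_v, sizeof(h_v));
    cudaMalloc(&d_r, sizeof(h_r));
    cudaMalloc(&d_P, sizeof(h_P)); cudaMalloc(&d_Q, sizeof(h_Q));
    cudaMemcpy(d_u, h_u, sizeof(h_u), cudaMemcpyHostToDevice);
    cudaMemcpy(d_v, h_v, sizeof(h_v), cudaMemcpyHostToDevice);
    cudaMemcpy(d_r, h_r, sizeof(h_r), cudaMemcpyHostToDevice);
    cudaMemcpy(d_P, h_P, sizeof(h_P), cudaMemcpyHostToDevice);
    cudaMemcpy(d_Q, h_Q, sizeof(h_Q), cudaMemcpyHostToDevice);
    for (int epoch = 0; epoch < 100; ++epoch)
        sgd_update<<<1, 32>>>(d_u, d_v, d_r, 3, d_P, d_Q, 0.05f, 0.01f);
    cudaMemcpy(h_P, d_P, sizeof(h_P), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_Q, d_Q, sizeof(h_Q), cudaMemcpyDeviceToHost);
    float pred = 0.f;
    for (int k = 0; k < K; ++k) pred += h_P[k] * h_Q[k];
    printf("predicted r(0,0) = %.2f (target 5.00)\n", pred);
    cudaFree(d_u); cudaFree(d_v); cudaFree(d_r);
    cudaFree(d_P); cudaFree(d_Q);
    return 0;
}
```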
How_to_optimize_in_GPU
A series of GPU optimization topics that explains, in detail, how to optimize CUDA kernels, covering several basic kernel optimizations (elementwise, reduce, sgemv, sgemm, etc.) whose performance is at or near the theoretical limit.
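As a taste of the "reduce" topic named above, here is a baseline shared-memory tree reduction in CUDA. The repository's tuned variants go well beyond this (warp shuffles, vectorized loads, multiple elements per thread), so treat this as a starting-point sketch rather than its final kernel.

```cuda
#include <cstdio>

// Block-level tree reduction in shared memory: each block sums 256
// elements, then the per-block partial sums are added on the host.
__global__ void reduce_sum(const float* in, float* out, int n) {
    __shared__ float s[256];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    s[tid] = (i < n) ? in[i] : 0.f;
    __syncthreads();
    // Halve the active range each step; stride > warpSize steps need
    // the barrier, and the naive version keeps it throughout.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = s[0];
}

int main() {
    const int n = 1 << 20, threads = 256;
    const int blocks = (n + threads - 1) / threads;
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    // Fill input with ones so the expected sum is exactly n.
    float* h_in = new float[n];
    for (int i = 0; i < n; ++i) h_in[i] = 1.f;
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);
    reduce_sum<<<blocks, threads>>>(d_in, d_out, n);
    float* h_out = new float[blocks];
    cudaMemcpy(h_out, d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    double total = 0;
    for (int b = 0; b < blocks; ++b) total += h_out[b];
    printf("sum = %.0f (expected %d)\n", total, n);
    delete[] h_in; delete[] h_out;
    cudaFree(d_in); cudaFree(d_out);
    return 0;
}
```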
TurboBFS
A highly scalable GPU-based set of top-down and bottom-up BFS algorithms in the language of linear algebra.
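To illustrate what "BFS in the language of linear algebra" means, here is a minimal CUDA sketch of one top-down BFS level expressed as a masked sparse matrix-vector product over the boolean semiring. The kernel and data-structure names are assumptions for illustration, not TurboBFS's code.

```cuda
#include <cstdio>

// One level-synchronous top-down BFS step: the next frontier is
// next = A^T * frontier over the boolean semiring, masked by
// unvisited vertices. The CSR traversal below is that product:
// "multiply" = follow an edge, "add" = logical OR via first-visit CAS.
__global__ void bfs_spmv_step(const int* row_ptr, const int* col_idx,
                              const int* frontier, int* next,
                              int* levels, int n, int level) {
    int u = blockIdx.x * blockDim.x + threadIdx.x;
    if (u >= n || !frontier[u]) return;
    for (int e = row_ptr[u]; e < row_ptr[u + 1]; ++e) {
        int v = col_idx[e];
        // Visit v only if still unvisited (level == -1).
        if (atomicCAS(&levels[v], -1, level) == -1) next[v] = 1;
    }
}

int main() {
    // Undirected path graph 0-1-2-3 in CSR form; source vertex 0.
    int h_rp[5] = {0, 1, 3, 5, 6}, h_ci[6] = {1, 0, 2, 1, 3, 2};
    int h_front[4] = {1, 0, 0, 0}, h_lev[4] = {0, -1, -1, -1};
    int *d_rp, *d_ci, *d_front, *d_next, *d_lev;
    cudaMalloc(&d_rp, sizeof(h_rp)); cudaMalloc(&d_ci, sizeof(h_ci));
    cudaMalloc(&d_front, sizeof(h_front));
    cudaMalloc(&d_next, sizeof(h_front));
    cudaMalloc(&d_lev, sizeof(h_lev));
    cudaMemcpy(d_rp, h_rp, sizeof(h_rp), cudaMemcpyHostToDevice);
    cudaMemcpy(d_ci, h_ci, sizeof(h_ci), cudaMemcpyHostToDevice);
    cudaMemcpy(d_front, h_front, sizeof(h_front), cudaMemcpyHostToDevice);
    cudaMemcpy(d_lev, h_lev, sizeof(h_lev), cudaMemcpyHostToDevice);
    for (int level = 1; level < 4; ++level) {  // at most n-1 levels
        cudaMemset(d_next, 0, sizeof(h_front));
        bfs_spmv_step<<<1, 32>>>(d_rp, d_ci, d_front, d_next,
                                 d_lev, 4, level);
        int* tmp = d_front; d_front = d_next; d_next = tmp;
    }
    cudaMemcpy(h_lev, d_lev, sizeof(h_lev), cudaMemcpyDeviceToHost);
    for (int v = 0; v < 4; ++v) printf("level[%d] = %d\n", v, h_lev[v]);
    cudaFree(d_rp); cudaFree(d_ci); cudaFree(d_front);
    cudaFree(d_next); cudaFree(d_lev);
    return 0;
}
```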
qwerfdsadad's Repositories
qwerfdsadad/aima-python
Python implementation of algorithms from Russell and Norvig's "Artificial Intelligence - A Modern Approach"
qwerfdsadad/annotated_deep_learning_paper_implementations
59 implementations/tutorials of deep learning papers with side-by-side notes, including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, ...), GANs (cyclegan, stylegan2, ...), reinforcement learning (ppo, dqn), capsnet, distillation, ...
qwerfdsadad/cumf_sgd
CUDA Matrix Factorization Library with Stochastic Gradient Descent (SGD)
qwerfdsadad/How_to_optimize_in_GPU
A series of GPU optimization topics that explains, in detail, how to optimize CUDA kernels, covering several basic kernel optimizations (elementwise, reduce, sgemv, sgemm, etc.) whose performance is at or near the theoretical limit.
qwerfdsadad/TurboBFS
A highly scalable GPU-based set of top-down and bottom-up BFS algorithms in the language of linear algebra.