Pinned Repositories
apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
bigscience
Central place for the engineering/scaling working group: documentation, SLURM scripts and logs, compute environment and data.
Deep-Approximate-Shapley-Propagation
A PyTorch implementation of the DASP algorithm from the paper "Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Value Approximation"
examples
A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
flash-attention
Fast and memory-efficient exact attention
google-research
A fork of the Google Research repository
longhorn
Official PyTorch Implementation of the Longhorn Deep State Space Model
longhorn_cuda
CUDA kernels for the Longhorn Architecture
Megatron-LM
Ongoing research training transformer models at scale
OmegaFold
OmegaFold Release Code
RuiWang1998's Repositories
RuiWang1998/Deep-Approximate-Shapley-Propagation
A PyTorch implementation of the DASP algorithm from the paper "Explaining Deep Neural Networks with a Polynomial Time Algorithm for Shapley Value Approximation"
RuiWang1998/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
RuiWang1998/bigscience
Central place for the engineering/scaling working group: documentation, SLURM scripts and logs, compute environment and data.
RuiWang1998/examples
A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
RuiWang1998/flash-attention
Fast and memory-efficient exact attention
RuiWang1998/google-research
A fork of the Google Research repository
RuiWang1998/longhorn
Official PyTorch Implementation of the Longhorn Deep State Space Model
RuiWang1998/longhorn_cuda
CUDA kernels for the Longhorn Architecture
RuiWang1998/Megatron-LM
Ongoing research training transformer models at scale
RuiWang1998/OmegaFold
OmegaFold Release Code
RuiWang1998/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.