Pinned Repositories
GSparsity
xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
DCGM
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
BSCA
MATLAB code for block successive convex approximation algorithms
proxsgd
ProxSGD algorithm in TensorFlow
STELA
STELA algorithm for sparsity-regularized linear regression (LASSO)
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.
optyang's Repositories
optyang/BSCA
MATLAB code for block successive convex approximation algorithms
optyang/STELA
STELA algorithm for sparsity-regularized linear regression (LASSO)
optyang/proxsgd
ProxSGD algorithm in TensorFlow
optyang/AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
optyang/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.
optyang/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.