Pinned Repositories
apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
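A minimal mixed-precision sketch using the apex.amp interface (a legacy API; newer code often uses torch.cuda.amp), assuming apex is installed with its CUDA extensions and a GPU is available; the model and optimizer below are illustrative.

```python
# Hedged sketch: assumes apex is built with CUDA extensions and a GPU is present.
import torch
from apex import amp

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# opt_level "O1" runs eligible ops in FP16 while keeping FP32 master weights
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(32, 1024, device="cuda")
loss = model(x).float().pow(2).mean()

# scale the loss so small FP16 gradients do not underflow
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```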
FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
flash-attention
Fast and memory-efficient exact attention
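A brief sketch of calling the fused attention kernel through flash_attn_func, assuming the flash_attn package is installed with CUDA support; the tensor shapes are illustrative.

```python
# Hedged sketch: assumes flash_attn is installed and a CUDA GPU is available.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
# flash_attn_func expects (batch, seqlen, nheads, headdim) tensors in fp16/bf16 on GPU
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# exact attention computed without materializing the full score matrix
out = flash_attn_func(q, k, v, causal=True)
```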
hcc
HCC is an Open Source, Optimizing C++ Compiler for Heterogeneous Compute, currently targeting the ROCm GPU Computing Platform
HIP
HIP: Convert CUDA to Portable C++ Code
param
PArametrized Recommendation and AI Model (PARAM) benchmark is a repository for the development of numerous microbenchmarks (uBenchmarks) as well as end-to-end networks for the evaluation of training and inference platforms.
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
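A small sketch of the ideas in that description: tensors, dynamically built autograd graphs, and optional GPU acceleration.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(3, 3, device=device, requires_grad=True)
y = (x ** 2).sum()   # the computation graph is built dynamically as ops execute
y.backward()         # reverse-mode autodiff through that graph
print(x.grad)        # equals 2 * x
```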
rocm-timeline-generator
tensorflow
Computation using data flow graphs for scalable machine learning
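A short sketch of the data-flow-graph idea, assuming TensorFlow 2.x, where tf.function traces Python code into a graph.

```python
import tensorflow as tf

@tf.function  # traces the Python function into a TensorFlow dataflow graph
def dense_relu(x, w, b):
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.normal([4, 3])
w = tf.random.normal([3, 2])
b = tf.zeros([2])
y = dense_relu(x, w, b)  # the first call builds the graph; later calls reuse it
```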
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
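A hedged sketch of that high-level Python API, loosely following the project's LLM quick-start pattern; the model id is illustrative, and exact class or argument names may differ across releases.

```python
# Hedged sketch: names follow the quick-start examples and may vary by version.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # builds/loads a TensorRT engine
sampling = SamplingParams(temperature=0.8, top_p=0.95)

outputs = llm.generate(["Explain GEMM in one sentence."], sampling)
for output in outputs:
    print(output.outputs[0].text)
```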
sryap's Repositories
sryap/rocm-timeline-generator
sryap/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
sryap/FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
sryap/flash-attention
Fast and memory-efficient exact attention
sryap/hcc
HCC is an Open Source, Optimizing C++ Compiler for Heterogeneous Compute, currently targeting the ROCm GPU Computing Platform
sryap/HIP
HIP: Convert CUDA to Portable C++ Code
sryap/param
PArametrized Recommendation and AI Model (PARAM) benchmark is a repository for the development of numerous microbenchmarks (uBenchmarks) as well as end-to-end networks for the evaluation of training and inference platforms.
sryap/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
sryap/tensorflow
Computation using data flow graphs for scalable machine learning
sryap/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
sryap/torcheval
A library that contains a rich collection of performant PyTorch model metrics, a simple interface for creating new metrics, a toolkit to facilitate metric computation in distributed training, and tools for PyTorch model evaluation.
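A brief sketch of the metric interface described above, using the built-in MulticlassAccuracy metric; the tensors are illustrative.

```python
import torch
from torcheval.metrics import MulticlassAccuracy

metric = MulticlassAccuracy()
preds = torch.tensor([0, 2, 1, 1])
target = torch.tensor([0, 1, 1, 1])

metric.update(preds, target)  # accumulate state batch by batch
print(metric.compute())       # tensor(0.75)
```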
sryap/torchrec
PyTorch domain library for recommendation systems
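A minimal sketch of a pooled embedding lookup over jagged sparse features, assuming torchrec's EmbeddingBagCollection and KeyedJaggedTensor APIs; the table and feature names are made up for illustration.

```python
# Hedged sketch: table and feature names are hypothetical.
import torch
from torchrec import EmbeddingBagCollection, EmbeddingBagConfig, KeyedJaggedTensor

ebc = EmbeddingBagCollection(
    tables=[
        EmbeddingBagConfig(
            name="t_user", embedding_dim=8, num_embeddings=100, feature_names=["user_id"]
        )
    ],
    device=torch.device("cpu"),
)

# two samples: the first has ids [3, 7], the second has id [42] (jagged lengths 2 and 1)
features = KeyedJaggedTensor(
    keys=["user_id"],
    values=torch.tensor([3, 7, 42]),
    lengths=torch.tensor([2, 1]),
)

pooled = ebc(features)                    # KeyedTensor of pooled embeddings
print(pooled.to_dict()["user_id"].shape)  # torch.Size([2, 8])
```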