Pinned Repositories
apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
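A minimal mixed-precision sketch using the apex.amp interface (a legacy API; newer code often uses torch.cuda.amp), assuming apex is installed with its CUDA extensions and a GPU is available; the model and optimizer below are illustrative.

```python
# Hedged sketch: assumes apex is built with CUDA extensions and a GPU is present.
import torch
from apex import amp

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# opt_level "O1" runs eligible ops in FP16 while keeping FP32 master weights
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(32, 1024, device="cuda")
loss = model(x).float().pow(2).mean()

# scale the loss so small FP16 gradients do not underflow
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```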
FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
flash-attention
Fast and memory-efficient exact attention
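A brief sketch of calling the fused attention kernel through flash_attn_func, assuming the flash_attn package is installed with CUDA support; the tensor shapes are illustrative.

```python
# Hedged sketch: assumes flash_attn is installed and a CUDA GPU is available.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
# flash_attn_func expects (batch, seqlen, nheads, headdim) tensors in fp16/bf16 on GPU
q = torch.randn(batch, seqlen, nheads, headdim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

# exact attention computed without materializing the full score matrix
out = flash_attn_func(q, k, v, causal=True)
```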
hcc
HCC is an Open Source, Optimizing C++ Compiler for Heterogeneous Compute, currently targeting the ROCm GPU Computing Platform
HIP
HIP: Convert CUDA to Portable C++ Code
param
PArametrized Recommendation and AI Model (PARAM) benchmark is a repository for the development of numerous microbenchmarks (uBenchmarks) as well as end-to-end networks for the evaluation of training and inference platforms.
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
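A small sketch of the ideas in that description: tensors, dynamically built autograd graphs, and optional GPU acceleration.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(3, 3, device=device, requires_grad=True)
y = (x ** 2).sum()   # the computation graph is built dynamically as ops execute
y.backward()         # reverse-mode autodiff through that graph
print(x.grad)        # equals 2 * x
```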
rocm-timeline-generator
tensorflow
Computation using data flow graphs for scalable machine learning
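A short sketch of the data-flow-graph idea, assuming TensorFlow 2.x, where tf.function traces Python code into a graph.

```python
import tensorflow as tf

@tf.function  # traces the Python function into a TensorFlow dataflow graph
def dense_relu(x, w, b):
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.normal([4, 3])
w = tf.random.normal([3, 2])
b = tf.zeros([2])
y = dense_relu(x, w, b)  # the first call builds the graph; later calls reuse it
```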
TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
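A hedged sketch of that high-level Python API, loosely following the project's LLM quick-start pattern; the model id is illustrative, and exact class or argument names may differ across releases.

```python
# Hedged sketch: names follow the quick-start examples and may vary by version.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # builds/loads a TensorRT engine
sampling = SamplingParams(temperature=0.8, top_p=0.95)

outputs = llm.generate(["Explain GEMM in one sentence."], sampling)
for output in outputs:
    print(output.outputs[0].text)
```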
sryap's Repositories
sryap/rocm-timeline-generator
sryap/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
sryap/FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
sryap/flash-attention
Fast and memory-efficient exact attention
sryap/hcc
HCC is an Open Source, Optimizing C++ Compiler for Heterogeneous Compute, currently targeting the ROCm GPU Computing Platform
sryap/HIP
HIP: Convert CUDA to Portable C++ Code
sryap/param
PArametrized Recommendation and AI Model (PARAM) benchmark is a repository for the development of numerous microbenchmarks (uBenchmarks) as well as end-to-end networks for the evaluation of training and inference platforms.
sryap/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
sryap/tensorflow
Computation using data flow graphs for scalable machine learning
sryap/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
sryap/torcheval
A library that contains a rich collection of performant PyTorch model metrics, a simple interface for creating new metrics, a toolkit to facilitate metric computation in distributed training, and tools for PyTorch model evaluation.
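A brief sketch of the metric interface described above, using the built-in MulticlassAccuracy metric; the tensors are illustrative.

```python
import torch
from torcheval.metrics import MulticlassAccuracy

metric = MulticlassAccuracy()
preds = torch.tensor([0, 2, 1, 1])
target = torch.tensor([0, 1, 1, 1])

metric.update(preds, target)  # accumulate state batch by batch
print(metric.compute())       # tensor(0.75)
```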
sryap/torchrec
PyTorch domain library for recommendation systems
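A minimal sketch of a pooled embedding lookup over jagged sparse features, assuming torchrec's EmbeddingBagCollection and KeyedJaggedTensor APIs; the table and feature names are made up for illustration.

```python
# Hedged sketch: table and feature names are hypothetical.
import torch
from torchrec import EmbeddingBagCollection, EmbeddingBagConfig, KeyedJaggedTensor

ebc = EmbeddingBagCollection(
    tables=[
        EmbeddingBagConfig(
            name="t_user", embedding_dim=8, num_embeddings=100, feature_names=["user_id"]
        )
    ],
    device=torch.device("cpu"),
)

# two samples: the first has ids [3, 7], the second has id [42] (jagged lengths 2 and 1)
features = KeyedJaggedTensor(
    keys=["user_id"],
    values=torch.tensor([3, 7, 42]),
    lengths=torch.tensor([2, 1]),
)

pooled = ebc(features)                    # KeyedTensor of pooled embeddings
print(pooled.to_dict()["user_id"].shape)  # torch.Size([2, 8])
```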