rajesh-s
Contributions from my personal addresses are my own and not on behalf of my employer.
University of Wisconsin–Madison
Madison, WI
rajesh-s's Stars
theicfire/makefiletutorial
Learn make by example
zwang4/awesome-machine-learning-in-compilers
Must-read research papers and links to tools and datasets related to using machine learning for compilers and systems optimisation
nviennot/core-to-core-latency
Measures the latency between CPU cores
aws/aws-graviton-getting-started
Helping developers to use AWS Graviton2, Graviton3, and Graviton4 processors which power the 6th, 7th, and 8th generation of Amazon EC2 instances (C6g[d], M6g[d], R6g[d], T4g, X2gd, C6gn, I4g, Im4gn, Is4gen, G5g, C7g[d][n], M7g[d], R7g[d], R8g).
pytorch/kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
ROCm/HIPIFY
HIPIFY: Convert CUDA to Portable C++ Code
Xilinx/Vitis_Accel_Examples
Vitis_Accel_Examples
johnmarktaylor91/torchlens
Package for extracting and mapping the results of every single tensor operation in a PyTorch model in one line of code.
microsoft/antares
Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.
jeffhammond/STREAM
STREAM benchmark
MIPT-ILab/mipt-mips
Cycle-accurate pre-silicon simulator of RISC-V and MIPS CPUs
facebookresearch/HolisticTraceAnalysis
A library to analyze PyTorch traces.
intel/systemc-compiler
This tool translates synthesizable SystemC code to synthesizable SystemVerilog.
bespoke-silicon-group/bsg_manycore
Tile based architecture designed for computing efficiency, scalability and generality
gunrock/graphblast
High-Performance Linear Algebra-based Graph Primitives on GPUs
Shenggan/awesome-distributed-ml
A curated list of awesome projects and papers for distributed training or inference
anilshanbhag/gpu-topk
Efficient Top-K implementation on the GPU
mmperf/mmperf
MatMul Performance Benchmarks for a Single CPU Core comparing both hand-engineered and codegen kernels.
facebookresearch/param
PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for evaluation of training and inference platforms.
icl-utk-edu/papi
google/multichase
chipsalliance/fpga-tool-perf
FPGA tool performance profiling
ARM-software/data
Machine-readable data describing Arm architecture and implementations. Includes JSON descriptions of implemented PMU events.
GeorgeBird1/Diagramatic-Neural-Networks
rajesh-s/mlsys-allreduce
All reduce implementation and analysis of BDE vs Ring primitives
rajesh-s/cs639-parallel-throughput-opt-prog
Parallel and Throughput-Optimized Programming by Prof. Sifakis
lauxy/GPGPU-Roofline-Chart-Plotting
An automated plotting tool that can combine any number of GPGPU kernels into a single roofline chart.
rajesh-s/axle-zsim-nvmain
This repository is intended to be a fork of AXLE zsim-nvmain with the modifications necessary to make things work with dependencies as of 2021
rajesh-s/cs757-gpu-tlb-prefetcher
rajesh-s/mlsys-gpu-power-variability
Power Variability project with Prof. Shivaram Venkataraman and Prof. Matthew Sinclair