Pinned Repositories
activity_trace_async
agg
Testbed for CUDA kernel aggregation
amr
Integration of GPU solvers in Charm++ AMR MiniApp
baseenv
A fork of Bill Gropp's baseenv (http://wgropp.cs.illinois.edu/projects/software/baseenv.htm)
charm
The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.
charming
GPU-resident runtime system based on Charm++ principles
starter-academic
apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
minitu's Repositories
minitu/starter-academic
minitu/baseenv
A fork of Bill Gropp's baseenv (http://wgropp.cs.illinois.edu/projects/software/baseenv.htm)
minitu/charm
The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.
minitu/charming
GPU-resident runtime system based on Charm++ principles
minitu/hpm
A Heterogeneous Performance Modeling Framework (GPU + MPI)
minitu/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
minitu/buggy
A buddy allocator for GPU memory
minitu/changa
Mirror of UIUC/PPL version of ChaNGa
minitu/codes
The Co-Design of Exascale Storage Architectures (CODES) simulation framework builds on the ROSS parallel discrete event simulation engine to provide high-performance simulation utilities and models for building scalable distributed systems simulations
minitu/dlrm
An implementation of a deep learning recommendation model (DLRM)
minitu/dumpi-cortex
A fork of https://xgitlab.cels.anl.gov/mdorier/dumpi-cortex
minitu/gerrit2github
minitu/gpu
Contains pieces of GPU-related research that are too small to warrant a separate repository.
minitu/gpuroofperf-toolkit
A GPU performance prediction toolkit for CUDA programs
minitu/jacobi2d
minitu/kokkos-tutorials
Tutorials for the Kokkos C++ Performance Portability Programming EcoSystem
minitu/Megatron-LM
Ongoing research training transformer models at scale
minitu/miniFE
MiniFE Finite Element Mini-Application
minitu/miniMD
MiniMD Molecular Dynamics Mini-App
minitu/mpitest
minitu/multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
minitu/NeMo
NeMo: a toolkit for conversational AI
minitu/ompi
Open MPI main development repository
minitu/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
minitu/sst-dumpi
SST DUMPI Trace Library
minitu/sw4lite
Testing numerical kernels in SW4
minitu/TraceR
Trace Replay and Network Simulation Framework
minitu/training
Reference implementations of MLPerf™ training benchmarks
minitu/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
minitu/triton
Development repository for the Triton language and compiler