mhuguesaws's Stars
stas00/ml-engineering
Machine Learning Engineering Open Book
tanelpoder/0xtools
0x.Tools: X-Ray vision for Linux systems
aws-samples/awsome-distributed-training
Collection of best practices, reference architectures, model training examples and utilities to train large models on AWS.
opencomputeproject/ocp-diag-pcicrawler
pcicrawler is a Python-based command line interface tool which can be used to display, filter and export information about PCI (Peripheral Component Interconnect) or PCIe buses and devices, as well as PCI topology.
aws-samples/ec2-topology-aware-for-slurm
apple/axlearn
An Extensible Deep Learning Library
darothen/ai-models-for-all
Run AI NWP forecasts hassle-free, serverless in the cloud!
awslabs/amazon-ebs-autoscale
Don't run out of disk space on your EC2 instance when generating or working with large files. Automatically add EBS volumes to a filesystem mount point in response to disk utilization.
aws-solutions-library-samples/guidance-for-building-a-high-performance-numerical-weather-prediction-system-on-aws
HPC on AWS removes the long wait times and lost productivity often associated with on-premises HPC clusters. Flexible HPC cluster configurations and virtually unlimited scalability allow you to grow and shrink your infrastructure as your workloads dictate, not the other way around.
mil-ad/stui
A Slurm dashboard for the terminal.
aws-samples/aws-batch-blueprints
dirkpetersen/hpc-containers
You should offer both Podman and Apptainer with namespaces on your HPC systems
aws-samples/aws-batch-operational-dashboards
NVIDIA/GPUPowerTest
A utility for stressing GPUs by driving utilization (and thus power consumption) up and down in user-defined cycle intervals. It will also randomly drop power consumption down to idle and spike it back up
NVIDIA/GPUStressTest
GPU Stress Test is a tool to stress the compute engine of NVIDIA Tesla GPUs by running a BLAS matrix multiply using different data types. It can be compiled and run on both Linux and Windows.
aws-solutions-library-samples/guidance-for-ec2-spot-placement-score-tracker
This Guidance shows how to build an Amazon Elastic Compute Cloud (Amazon EC2) Spot placement score tracker to monitor unused Amazon EC2 Spot Instance capacity.
aws/aws-graviton-getting-started
Helping developers to use AWS Graviton2, Graviton3, and Graviton4 processors which power the 6th, 7th, and 8th generation of Amazon EC2 instances (C6g[d], M6g[d], R6g[d], T4g, X2gd, C6gn, I4g, Im4gn, Is4gen, G5g, C7g[d][n], M7g[d], R7g[d], R8g).
aws-samples/aws-parallelcluster-post-install-scripts
Scripts to customize AWS ParallelCluster
aws-samples/aws-parallelcluster-hpc-quickstart
AdRoll/batchiepatchie
ROCm/aws-ofi-rccl
awslabs/ec2-spot-labs
Collection of tools and code examples to demonstrate best practices in using Amazon EC2 Spot Instances.
aws-samples/aws-decoupled-serverless-scheduler
AWS Decoupled Serverless Scheduler
awslabs/aws-cyclone-solution
aws/karpenter-provider-aws
Karpenter is a Kubernetes Node Autoscaler built for flexibility, performance, and simplicity.
aws-samples/event-driven-weather-forecasts
aws/aws-parallelcluster
AWS ParallelCluster is an AWS-supported open source cluster management tool to deploy and manage HPC clusters in the AWS cloud.
sfiligoi/tutorial-k8s-sdsc2022
Kubernetes Tutorial - Originally presented as an SDSC event in 2022
openshift-psap/ci-artifacts
OpenShift PSAP-team CI Artifacts
kwozyman/ocp-aws-efa-poc
Proof of concept for OpenShift running compute workloads on top of AWS EFA