GoogleCloudPlatform/hpc-toolkit
Cloud HPC Toolkit is open-source software offered by Google Cloud that makes it easy for customers to deploy HPC environments on Google Cloud.
Language: HCL | License: Apache-2.0
Issues
Using a newer version of Terraform can lead to controller replacement on reconfigure for Slurm GCP v6
#2774 opened by nick-stroud (1 comment)
The apptainer example fails to deploy because it is using the `slurm-gcp-6-4-hpc-rocky-linux-8` `source_image_family`
#2802 opened by mr0re1 (5 comments)
Rocky image failing due to 404 on lustre-client
#2733 opened by javierbq (31 comments)
How to use image-builder.yaml to install a docker image to template VM
#1598 opened by noahharrison64 (2 comments)
No CUDA devices visible with A2 instances
#2634 opened by msis (4 comments)
Fail to consume shared reservations
#2548 opened by casassg (16 comments)
PMIx MPI support in Slurm
#2274 opened by tpdownes (6 comments)
Upgrade to Ops Agent fails
#2487 opened by Tristan-Kosciuch (8 comments)
HTCondor tutorial: add cloudresourcemanager.googleapis.com to the list of services to enable
#2496 opened by katilp (4 comments)
IP space of [gcp project subnet] is exhausted when deploying a GCP Slurm cluster
#2389 opened by fdmalone (1 comment)
Broken link
#2261 opened by prashantkul (2 comments)
Example of startup script with cluster without vm-instance?
#2202 opened by vsoch (4 comments)
error when use packer to build image in ml-slurm
#1832 opened by higuhigu-lb (2 comments)
SLURM 1.20 deployed and having node creation error
#1600 opened by sharif-cameco (7 comments)
HPC toolkit no longer works with a2 instances
#1664 opened by cbraynor (12 comments)
Cannot create worker node
#1581 opened by sharif-cameco (3 comments)
Creating router and NAT in pre existing vpc
#1590 opened by sharif-cameco (2 comments)
Give a short summary of changes on ghpc deploy/destroy
#1556 opened by yaroslavvb (3 comments)
Adding partition causes the entire cluster to fail due to failures in `/slurm/scripts/setup.py`
#1554 opened by yaroslavvb (4 comments)
ghpc deploy ends up in bad state when instance creation fails due to transient problem
#1536 opened by yaroslavvb (2 comments)
User management best practices/examples
#1458 opened by jtrmal (4 comments)
NFS server file system bug
#1388 opened by maxveliaminov (2 comments)
Chrome-remote-desktop support
#1405 opened by rgclapp007 (1 comment)
MPI arguments for MPI jobs
#1386 opened by ZLPG23 (8 comments)
VM name needs updated
#1303 opened by jrossthomson (5 comments)
TKFE Deployed Cluster fails to initialize slurm
#1185 opened by matthewc2003 (4 comments)
Creating Filestore via Front End is broken due to missing required setting (network_id)
#1096 opened by saltysoup (7 comments)
Packer custom image is missing storage-location option.
#1121 opened by ek-nag (4 comments)
OFE deployment fails when deploying from macOS.
#978 opened by ek-nag (2 comments)
Missing icons in HPCTKFE
#953 opened by fkc1e100 (2 comments)
SchedMD slurm image does not exist
#912 opened by Tristan-Kosciuch (2 comments)
Failed to download DDN exascaler module
#860 opened by Tristan-Kosciuch