AI-Hypercomputer/xpk
xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool that helps Cloud developers orchestrate training jobs on accelerators such as TPUs and GPUs on GKE.
Python, Apache-2.0
Issues
update version on pypi?
#285 opened by GallagherCommaJack - 0
installation broken after updating main branch
#284 opened by samos123 - 0
Create integration test using Kind
#267 opened by IrvingMg - 3
Consider configuring kueue waitForPodsReady
#191 opened by avrittrohwer - 0
Create cluster from several reservations
#161 opened by DwarKapex - 11
xpk Cluster Queue resource group "cpu" resource quota incorrect for a CPU-only cluster
#158 opened by bernardhan33 - 1
Set cluster versions by default to be based on the recommended rapid channel version
#40 opened by Obliviour - 1
Clear up xpk backward compatibility story
#19 opened by Obliviour - 1
Create Github runner to execute xpk on vm.
#23 opened by Obliviour - 3
Make XPK Handle multiple slice sizes
#17 opened by rwitten - 1
Make sure the user doesn't have the region set using `gcloud config set region` without having a zone set or a zone flag.
#24 opened by Obliviour - 0
Add xpk workload label in jobset api
#43 opened by Obliviour - 1
Wrong output value of TPUVMs when the cluster has the string "tpu" in its name
#14 opened by williampispico - 0
Create github pip release workflow
#42 opened by Obliviour - 0
MaxText/XPK story is kind of intertwined
#10 opened by rwitten - 0
Graceful Error Message on Failing To Apply Kueue
#12 opened by rwitten - 0
Improve CPU Node Pool Defaults
#30 opened by Obliviour - 0
support positional arguments?
#31 opened by GallagherCommaJack - 0
Set reservation to default capacity type and add Reservation-affinity flag to help users set their reservation type.
#22 opened by Obliviour - 0
Improve Jobset error messaging to clearly list solution around Kubernetes Permissions
#20 opened by Obliviour - 1
Make XPK pip3 installable
#16 opened by rwitten - 0
List of Nits From An Early User
#15 opened by rwitten - 0
Marrying User Metadata Into Cluster
#13 opened by rwitten - 1
Error applying Kueue CRDs on MacOS
#9 opened by danielvegamyhre