This repository hosts the Kubernetes Training Operator for Kubeflow training jobs.
The Kubeflow Training Operator provides Kubernetes custom resources to run distributed or non-distributed training jobs, such as TFJobs and PytorchJobs. The Training Operator in this repository is a Python script which wraps the latest released Kubeflow Training Operator manifests, providing lifecycle management and handling events (install, upgrade, integrate, remove). It is one of the Charmed Kubeflow operators.
While it is possible to deploy the Training Operator as a standalone operator, it works best when deployed alongside other components included in the Kubeflow bundle. For installation steps, please refer to the installation guide.