JobSetTemplate API
ahg-g opened this issue · 10 comments
What would you like to be added:
A JobSetTemplate API similar to PodTemplate.
Why is this needed:
APIs built on top of JobSet need to reference a JobSet spec. The common approach is to embed that JobSet spec inside the higher-level API, which makes it hard to validate; the other approach is to reference a template.
/feature
/kind feature
Hello, I want to share some simple ideas; I don't know if they are what we need.
```yaml
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSetTemplate
metadata:
  name: my-jobset-template
spec:
  failurePolicy:
    maxRestarts: 3
  replicatedJobs:
  - name: workers
    replicas: 1
    template:
      spec:
        backoffLimit: 0
        completions: 2
        parallelism: 2
        template:
          spec:
            containers:
            - name: worker
              image: bash:latest
              command:
              - bash
              - -xc
              - |
                sleep 1000
```
```yaml
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: my-jobset
spec:
  templateRef:
    name: my-jobset-template
```
```yaml
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSet
metadata:
  name: paralleljobs
spec:
  replicatedJobs:
  - name: workers
    templateRef: my-jobset-template
  - name: driver
    templateRef: my-jobset-template
---
apiVersion: jobset.x-k8s.io/v1alpha2
kind: JobSetTemplate
metadata:
  name: my-jobset-template
spec:
  replicas: 3
  template:
    spec:
      parallelism: 1
      completions: 1
      backoffLimit: 0
      template:
        spec:
          containers:
          - name: sleep
            image: busybox
            command:
            - sleep
            args:
            - 100s
```
If this approach is correct, we would probably need another CR object and a controller to manage it.
I'm sorry if I misunderstood; please forgive me if I got it wrong.
@ahg-g @danielvegamyhre @kannon92 Could you please check whether I understand this correctly? If so, I will take it when I have time and write a KEP design document.
I’d look at how CronJob uses JobTemplates or even how JobSet uses a JobTemplate.
A user should create a JobSet without using the templates.
TrainJob could specify a template and that template would be used to create a JobSet. I think that’s the flow.
Generally, templates are used when someone wants to compose the object.
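For reference, here is a minimal sketch of the CronJob pattern mentioned above. The object name, schedule, image, and command are illustrative assumptions; the point is only that the Job template is embedded inline under `spec.jobTemplate` rather than referenced by name.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello                  # illustrative name
spec:
  schedule: "*/5 * * * *"      # illustrative schedule
  jobTemplate:                 # embedded Job template (metadata + Job spec)
    spec:
      template:                # embedded Pod template
        spec:
          restartPolicy: OnFailure
          containers:
          - name: hello
            image: busybox
            command: ["sh", "-c", "date"]
```

The CronJob controller creates a Job from `spec.jobTemplate` on each scheduled run. By analogy, the flow described above would have a higher-level object such as TrainJob point at a template from which a JobSet is created, while users can still create a JobSet directly.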
Perhaps we can create a JobSetTemplateController to manage objects like JobSetTemplate. JobSetTemplate holds the template metadata, and JobSet objects can reference it. But I'm not sure if this is a good design.
According to this proposal: kubeflow/training-operator#2171, we are planning to create `TrainingRuntime` and `ClusterTrainingRuntime` to represent blueprints for various ML training or HPC configurations.
For LLM runtimes, we will support a list of different templates to fine-tune open-source foundational models.
Since we are directly using the `JobSet` API in the `TrainingRuntime`, I am wondering whether we still need JobSetTemplates?
As I understand it, @ahg-g mentioned that he wants to try supporting this JobSetTemplate feature regardless of TrainingOperator v2.
Yes, we have another use case where JobSetTemplate would be useful - I can't elaborate much further right now since it isn't public yet, but there are definitely other use cases :)
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
You can:
- Mark this issue as fresh with `/remove-lifecycle stale`
- Close this issue with `/close`
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale