kubeflow/pytorch-operator

Worker template should be configurable.

MartinForReal opened this issue · 1 comment

The QoS class of worker pods created by the operator is Burstable, because the init container template hard-codes resource requests that are lower than its limits:

var initContainerTemplate = `
- name: init-pytorch
  image: {{.InitContainerImage}}
  imagePullPolicy: IfNotPresent
  resources:
    limits:
      cpu: 100m
      memory: 20Mi
    requests:
      cpu: 50m
      memory: 10Mi
  command: ['sh', '-c', 'until nslookup {{.MasterAddr}}; do echo waiting for master; sleep 2; done;']`

This is a serious issue because only pods in the Guaranteed QoS class are eligible for exclusive CPUs from the kubelet's CPU Manager.
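
To illustrate why the init container alone is enough to change the class, here is a rough sketch of the QoS rule. It is a simplification and not the kubelet's actual code: the `Resources` and `Container` types and the `qosClass` function are hypothetical, quantities are compared as plain strings, and requests are assumed to be written out explicitly. The worker values (`2` CPU, `4Gi`) are made up for the example.

```go
package main

import "fmt"

// Resources holds CPU and memory quantities as the raw strings from a pod spec.
// Hypothetical type for this sketch; comparing strings does not normalize
// equivalent quantities such as "1" and "1000m".
type Resources struct {
	CPU    string
	Memory string
}

// Container is a simplified stand-in for a container's resources section.
type Container struct {
	Requests Resources
	Limits   Resources
}

// qosClass approximates the Kubernetes QoS rule over all containers of a pod,
// init containers included:
//   - BestEffort: no container sets any request or limit
//   - Guaranteed: every container sets CPU and memory limits, and its
//     requests equal its limits
//   - Burstable: everything else
func qosClass(containers []Container) string {
	empty := Resources{}
	anySet := false
	guaranteed := true
	for _, c := range containers {
		if c.Requests != empty || c.Limits != empty {
			anySet = true
		}
		if c.Limits.CPU == "" || c.Limits.Memory == "" || c.Requests != c.Limits {
			guaranteed = false
		}
	}
	switch {
	case !anySet:
		return "BestEffort"
	case guaranteed:
		return "Guaranteed"
	default:
		return "Burstable"
	}
}

func main() {
	// Values from the init container template quoted in this issue.
	initC := Container{
		Requests: Resources{CPU: "50m", Memory: "10Mi"},
		Limits:   Resources{CPU: "100m", Memory: "20Mi"},
	}
	// Hypothetical worker container with requests equal to limits.
	worker := Container{
		Requests: Resources{CPU: "2", Memory: "4Gi"},
		Limits:   Resources{CPU: "2", Memory: "4Gi"},
	}

	fmt.Println(qosClass([]Container{initC, worker})) // Burstable

	// Setting the init container's requests equal to its limits fixes the class.
	initC.Requests = initC.Limits
	fmt.Println(qosClass([]Container{initC, worker})) // Guaranteed
}
```

Even when the user's worker container is configured correctly, the hard-coded init container keeps the whole pod Burstable.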

We could either set the requests equal to the limits so the pod is Guaranteed, or accept these values as arguments.