opensearch-project/opensearch-k8s-operator

[FEATURE] Allow overriding image spec for node group


Is your feature request related to a problem?

We’re running an OpenSearch cluster with dedicated ML nodes backed by CUDA. These nodes run with GPU-enabled container runtimes and require a specific image with CUDA built in, while the other nodes should keep using the original images, which are smaller and runtime-agnostic.

What solution would you like?

Add an optional image property to the node pool spec that overrides the image in the Pod spec template of the resulting StatefulSet.
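A minimal sketch of the fallback logic this would need, assuming a hypothetical `Image` field on the node pool and a hypothetical `resolveImage` helper. Neither exists in the operator today; the struct is a trimmed-down stand-in and the image names are placeholders.

```go
package main

import "fmt"

// NodePool is a trimmed-down, hypothetical stand-in for the operator's
// node pool spec. The Image field is the proposed addition.
type NodePool struct {
	Component string
	Image     string // hypothetical override; empty means "use the cluster-wide image"
}

// resolveImage (hypothetical helper) returns the pool-level image when it
// is set and falls back to the cluster-wide image otherwise.
func resolveImage(clusterImage string, pool NodePool) string {
	if pool.Image != "" {
		return pool.Image
	}
	return clusterImage
}

func main() {
	// Placeholder image references, not real published images.
	clusterImage := "registry.example.com/opensearch:default"
	pools := []NodePool{
		{Component: "data"},
		{Component: "ml", Image: "registry.example.com/opensearch-cuda:custom"},
	}
	for _, p := range pools {
		fmt.Printf("%s -> %s\n", p.Component, resolveImage(clusterImage, p))
	}
}
```

With this shape, existing clusters that never set the field keep their current behavior, and only the ML pool's StatefulSet would pick up the CUDA image.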

What alternatives have you considered?

If all ML nodes are CUDA-enabled, we can endure the larger images and just use the CUDA version for all nodes.

Do you have any additional context?

To use CUDA with a specific runtime, we also need the ability to set the runtimeClassName property in the Pod spec. This should be another small feature request.
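A rough sketch of how that could be wired through, assuming a hypothetical `RuntimeClassName` field on the node pool. The `podSpec` type below is a local stand-in for the Kubernetes PodSpec (which models `runtimeClassName` as an optional string pointer), and the runtime class name `nvidia` is only an example value.

```go
package main

import "fmt"

// podSpec is a minimal local stand-in for the Kubernetes PodSpec, which
// represents runtimeClassName as an optional *string.
type podSpec struct {
	RuntimeClassName *string
}

// gpuNodePool is a hypothetical node pool carrying the proposed field.
type gpuNodePool struct {
	Component        string
	RuntimeClassName string // hypothetical; empty leaves the Pod on the default runtime
}

// applyRuntimeClass (hypothetical helper) copies the pool-level runtime
// class into the Pod spec only when one was requested.
func applyRuntimeClass(spec *podSpec, pool gpuNodePool) {
	if pool.RuntimeClassName == "" {
		return
	}
	name := pool.RuntimeClassName
	spec.RuntimeClassName = &name
}

func main() {
	var spec podSpec
	// "nvidia" is an example RuntimeClass name; the actual name depends on the cluster.
	applyRuntimeClass(&spec, gpuNodePool{Component: "ml", RuntimeClassName: "nvidia"})
	if spec.RuntimeClassName != nil {
		fmt.Println("runtimeClassName:", *spec.RuntimeClassName)
	}
}
```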

[Triage]
Hey @stevapple, as of today a custom image at the nodePool level (spec.nodePools[0].image) is not supported in the NodePool struct. Also, just curious: CUDA built-in OpenSearch images are not officially released by the project today (coming from the issue opensearch-project/opensearch-build#4743 you created :) ). Do you have a custom-built image for this purpose?
Also, if you are open to it, could you please contribute the feature to allow overriding the image spec for a node group?

Thank you
@getsaurabh02 @swoehrl-mw @rishabh6788 @peterzhuamazon