apache-spark-on-k8s/spark

[Question] How to add a customized pod to the Spark workers?

leletan opened this issue · 8 comments

In our standalone clusters, we installed the Datadog agent on every worker node for reliable stats collection.
Wondering if we can do something similar with Spark on Kubernetes.

I think you would need to build and use custom driver and executor images that have Datadog installed. See https://github.com/apache-spark-on-k8s/userdocs/blob/master/src/jekyll/running-on-kubernetes.md#docker-images.

Yah, thought about that.
But then we would have multiple processes running in one Docker container, which is not idiomatic.
Not sure if there is any other workaround.

I'm not familiar with Datadog. Can it run as a sidecar container in the same pod?

Yah, running as a sidecar container in the same pod would be ideal.
We are using a slightly customized version of https://github.com/DataDog/docker-dd-agent

Spark on Kubernetes does not currently support sidecar containers. But I think this is a use case that https://github.com/liyinan926/spark-operator can support by injecting the sidecar container into the driver and executor pods through the initializer. Is there any configuration (e.g., environment variables) that needs to be applied to the container?

Yah, we will need to set a couple of them: API_KEY, HOSTNAME, TAG, etc.
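
For reference, here is a rough sketch of what injecting such a sidecar into an executor pod spec could look like. It only manipulates the pod manifest as a plain Python dict and does not call any Kubernetes or spark-operator API. The image names, container names, pod name, and placeholder values are assumptions for illustration. The environment variable names are the ones mentioned in this thread and may not match what a given agent image actually reads.

```python
import copy
import json


def make_datadog_sidecar(api_key, hostname, tag):
    """Build a container entry for a Datadog agent sidecar.

    The image name is an assumption; substitute your own customized image.
    """
    return {
        "name": "dd-agent",
        "image": "datadog/docker-dd-agent:latest",  # assumed image name
        "env": [
            {"name": "API_KEY", "value": api_key},
            {"name": "HOSTNAME", "value": hostname},
            {"name": "TAG", "value": tag},
        ],
    }


def inject_sidecar(pod, sidecar):
    """Return a copy of the pod manifest with the sidecar appended.

    The input manifest is left untouched, and a container with the same
    name is not added twice.
    """
    patched = copy.deepcopy(pod)
    containers = patched.setdefault("spec", {}).setdefault("containers", [])
    if all(c.get("name") != sidecar["name"] for c in containers):
        containers.append(sidecar)
    return patched


# Hypothetical executor pod, reduced to the fields relevant here.
executor_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "spark-executor-example"},
    "spec": {
        "containers": [
            {"name": "executor", "image": "example/spark-executor:latest"}
        ]
    },
}

sidecar = make_datadog_sidecar("<your-api-key>", "spark-executor-example", "env:dev")
patched_pod = inject_sidecar(executor_pod, sidecar)
print(json.dumps(patched_pod, indent=2))
```

Both containers end up in the same pod, so each image still runs a single process, which avoids the multiple-processes-in-one-container concern raised above.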

Sidecar containers should be possible to inject through webhook initializers in K8s 1.9. If you're on an older version and don't have access to K8s alpha features (pod presets or initializers), there's no easy way to accomplish this yet. Agreed with @liyinan926 that this is a good fit for the spark-operator use case.

Cool. Thanks for the answers, guys.