Add CLI support for getting list of `submissionIds`
At the moment, the DCOS Spark CLI (as described here) allows you to:
- get the status of a given Spark driver, via `dcos spark status <submissionId>`
- kill a given Spark driver, via `dcos spark kill <submissionId>`
But in order to know your `submissionId`, you currently have to pay attention when it is returned as the result of the original `dcos spark run` command.
Our use case is long-running Spark drivers: if we update our code for a given driver, we want our CI/CD pipeline to replace the existing Spark driver with the newly built one. It's trivial to automate starting the newly built Spark driver (just run `dcos spark run`), but killing the old one is more complex.
Given we don't want our CI/CD pipeline to maintain state about the last `submissionId` for a given driver, we need some mechanism for inspecting which drivers are currently running in Mesosphere Spark and identifying any existing submissions that match a specified name (i.e. whatever name was provided in the original `dcos spark run` command).
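Purely for illustration, here is a rough sketch of the pipeline step we have in mind, written as if a listing command existed. The `dcos spark list --json` invocation, the shape of its output, and the `name` field are all hypothetical (they are essentially what this issue is asking for); only `dcos spark run` and `dcos spark kill` exist today.

```python
import json
import subprocess

DRIVER_NAME = "my-streaming-driver"  # the name we give the driver when submitting it

def replace_driver(submit_args):
    # Hypothetical command: list running drivers as JSON (the feature requested here).
    output = subprocess.check_output(["dcos", "spark", "list", "--json"])
    drivers = json.loads(output)

    # Kill any existing submission whose name matches ours (field name is an assumption).
    for driver in drivers:
        if driver.get("name") == DRIVER_NAME:
            subprocess.check_call(["dcos", "spark", "kill", driver["submissionId"]])

    # Start the newly built driver; this part is already trivial today.
    subprocess.check_call(["dcos", "spark", "run", "--submit-args=" + submit_args])
```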
I realise there is no existing Spark mechanism to provide this information (I see from browsing the DCOS Spark CLI code that the commands described above just delegate to `spark-submit --status` and `spark-submit --kill`).
However, having inspected the data in Zookeeper (using both Exhibitor and `zkCli.sh` on a master node), it's clear that Spark stores the information we need in `/spark_mesos_dispatcher/launchedDrivers`, with each driver an entry under that path containing a binary `MesosClusterSubmissionState`. To explore this yourself on the Zookeeper CLI:
- `ls /spark_mesos_dispatcher/launchedDrivers` to get a list of submission IDs
- `get /spark_mesos_dispatcher/launchedDrivers/<driverId>` to print a string representation of the `MesosClusterSubmissionState` object for that submission.
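As a proof of concept, here is a minimal sketch of what listing those drivers programmatically could look like, using the kazoo Zookeeper client. The Zookeeper address is an assumption about a typical DCOS setup, and the code simply relies on the observation above that each child node under `launchedDrivers` is named after a submission ID.

```python
from kazoo.client import KazooClient

# Assumption: Zookeeper reachable at this address (adjust for your cluster).
ZK_HOSTS = "master.mesos:2181"
LAUNCHED_DRIVERS_PATH = "/spark_mesos_dispatcher/launchedDrivers"

def list_submission_ids():
    """Return the submission IDs of drivers currently recorded in Zookeeper."""
    zk = KazooClient(hosts=ZK_HOSTS)
    zk.start()
    try:
        # Each child node is named after a submission ID; its data is a
        # serialised MesosClusterSubmissionState, which we don't need here.
        return zk.get_children(LAUNCHED_DRIVERS_PATH)
    finally:
        zk.stop()

if __name__ == "__main__":
    for submission_id in list_submission_ids():
        print(submission_id)
```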
At this point I see us having three options:
1. Work with Apache Spark to add support in `spark-submit` for this, and then work with this project to expose that functionality.
2. Work with this project to directly add support for a `dcos spark list` or similar command to list running drivers, which under the covers hits Zookeeper directly.
3. Build our own mechanism - probably a simple service running in a Docker container inside Mesos - which inspects the information in Zookeeper and serves it via a REST API, which we can then expose outside the cluster (a rough sketch follows after this list).
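For concreteness, here is a minimal sketch of what the service in option (3) might look like: a tiny HTTP service that reads the driver list from Zookeeper and serves it as JSON. The endpoint path, port, and Zookeeper address are all illustrative assumptions, not a settled design.

```python
from flask import Flask, jsonify
from kazoo.client import KazooClient

# Illustrative assumptions: adjust Zookeeper address and port for your cluster.
ZK_HOSTS = "master.mesos:2181"
LAUNCHED_DRIVERS_PATH = "/spark_mesos_dispatcher/launchedDrivers"

app = Flask(__name__)

@app.route("/drivers")
def drivers():
    """List the submission IDs currently recorded under launchedDrivers."""
    zk = KazooClient(hosts=ZK_HOSTS)
    zk.start()
    try:
        submission_ids = zk.get_children(LAUNCHED_DRIVERS_PATH)
    finally:
        zk.stop()
    return jsonify({"drivers": submission_ids})

if __name__ == "__main__":
    # Run inside a Docker container on the cluster and expose it via Marathon.
    app.run(host="0.0.0.0", port=8080)
```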
Given the chain of dependencies, I suspect that option (1) - if it is in fact considered acceptable by all parties - would take forever to become available to us in our Mesosphere cluster; therefore I thought I'd explore the possibility of option (2) with yourselves - hence raising this issue! - before we resort to option (3).
Looking forward to your thoughts.