Slave nodes not started on restart
itsmeccr opened this issue · 2 comments
itsmeccr commented
I launched a cluster with 2 slave nodes. I ran the spark-ec2 stop cluster_name
command, which stopped the master and terminated the spot slave instances.
When I then tried to restart the cluster, I got the following error:
Found 1 master, 0 slaves.
Starting slaves...
Starting master...
Waiting for cluster to enter 'ssh-ready' state..........
Cluster is now in 'ssh-ready' state. Waited 241 seconds.
Traceback (most recent call last):
File "./spark_ec2.py", line 1528, in <module>
main()
File "./spark_ec2.py", line 1520, in main
real_main()
File "./spark_ec2.py", line 1503, in real_main
existing_slave_type = slave_nodes[0].instance_type
IndexError: list index out of range
What is causing this and what is the solution?
shivaram commented
We don't restart slave nodes in the case of spot instances; that is only supported for on-demand instances that have been stopped. You can pass the --use-existing-master flag to launch and give the same cluster name. That will re-bid for spot instance slaves and then connect them to the stopped master.
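A launch invocation along these lines should work; the key pair, identity file, slave count, and spot price below are illustrative placeholders, not values taken from this issue:

./spark-ec2 -k my-keypair -i my-keypair.pem -s 2 --spot-price=0.05 --use-existing-master launch cluster_name

The stopped master is reused as-is, and fresh spot requests are placed for the slaves before they are attached to it.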
itsmeccr commented
Thank you.