amplab/spark-ec2

slave nodes not started on restart

itsmeccr opened this issue · 2 comments

I launched a cluster with 2 slave nodes. I ran the spark-ec2 stop cluster_name command, which stopped the master and terminated the spot slave instances.
When I then tried to restart the cluster, I got the following error.

Found 1 master, 0 slaves.
Starting slaves...
Starting master...
Waiting for cluster to enter 'ssh-ready' state..........
Cluster is now in 'ssh-ready' state. Waited 241 seconds.
Traceback (most recent call last):
  File "./spark_ec2.py", line 1528, in <module>
    main()
  File "./spark_ec2.py", line 1520, in main
    real_main()
  File "./spark_ec2.py", line 1503, in real_main
    existing_slave_type = slave_nodes[0].instance_type
IndexError: list index out of range
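
From the traceback it looks like the restart found "1 master, 0 slaves", so the list of slave instances is empty when that line indexes into it. A minimal sketch of what seems to be happening (slave_nodes here is a stand-in for the list spark_ec2.py builds; only the last line is taken from the traceback):

    slave_nodes = []  # restart found 0 slaves, so the list is empty
    existing_slave_type = slave_nodes[0].instance_type  # IndexError: list index out of range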

What is causing this and what is the solution?

We don't restart slave nodes in the case of spot instances; that is only supported for on-demand instances that have been stopped. You can pass the --use-existing-master flag to launch with the same cluster name. That will re-bid for spot instance slaves and then connect them to the stopped master.
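
For reference, a sketch of what that relaunch could look like (the keypair, identity file, slave count, and spot price below are illustrative placeholders; --use-existing-master and the launch action are the pieces described above):

    # re-bid for spot slaves and attach them to the existing master of cluster_name
    ./spark-ec2 -k my_keypair -i my_key.pem -s 2 \
        --spot-price=0.05 \
        --use-existing-master \
        launch cluster_name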

Thank you.