amplab/spark-ec2

slave nodes not started on restart

itsmeccr opened this issue · 2 comments

I launched a cluster with 2 slave nodes. I ran the spark-ec2 stop cluster_name command, which stopped the master and terminated the spot slave instances.
When I then tried to restart the cluster, I got the following error.

Found 1 master, 0 slaves.
Starting slaves...
Starting master...
Waiting for cluster to enter 'ssh-ready' state..........
Cluster is now in 'ssh-ready' state. Waited 241 seconds.
Traceback (most recent call last):
  File "./spark_ec2.py", line 1528, in <module>
    main()
  File "./spark_ec2.py", line 1520, in main
    real_main()
  File "./spark_ec2.py", line 1503, in real_main
    existing_slave_type = slave_nodes[0].instance_type
IndexError: list index out of range
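
From the traceback it looks like the restart found "1 master, 0 slaves", so the list of slave instances is empty when that line indexes into it. A minimal sketch of what seems to be happening (slave_nodes here is a stand-in for the list spark_ec2.py builds; only the last line is taken from the traceback):

    slave_nodes = []  # restart found 0 slaves, so the list is empty
    existing_slave_type = slave_nodes[0].instance_type  # IndexError: list index out of range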

What is causing this and what is the solution?

We don't restart slave nodes in the case of spot instances; that is only supported for on-demand instances that have been stopped. You can pass the --use-existing-master flag to launch with the same cluster name. That will re-bid for spot instance slaves and then connect them to the stopped master.
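
For reference, a sketch of what that relaunch could look like (the keypair, identity file, slave count, and spot price below are illustrative placeholders; --use-existing-master and the launch action are the pieces described above):

    # re-bid for spot slaves and attach them to the existing master of cluster_name
    ./spark-ec2 -k my_keypair -i my_key.pem -s 2 \
        --spot-price=0.05 \
        --use-existing-master \
        launch cluster_name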

Thank you.