Exception: Spark health check failed
Fuzzy-sh opened this issue · 9 comments
Flintrock version: 0.11.0
Python version: 3.5.5 / anaconda
OS: Amazon Linux 2 AMI 2.0.20181114 x86_64 HVM ebs - ami-0b8d0d6ac70e5750c
The region of the instance is N. Virginia.
Config.yaml
services:
spark:
version: 2.3.3
# git-commit: latest # if not 'latest', provide a full commit SHA; e.g. d6dc12ef0146ae409834c78737c116050961f350
# git-repository: # optional; defaults to https://github.com/apache/spark
# optional; defaults to download from the official Spark S3 bucket
# - must contain a {v} template corresponding to the version
# - Spark must be pre-built
# - must be a tar.gz file
# download-source: "https://www.example.com/files/spark/{v}/spark-{v}.tar.gz"
download-source: "https://www-us.apache.org/dist/spark/spark-{v}/spark-{v}-bin-hadoop2.7.tgz"
# executor-instances: 1
hdfs:
version: 2.8.5
# optional; defaults to download from a dynamically selected Apache mirror
# - must contain a {v} template corresponding to the version
# - must be a .tar.gz file
# download-source: "https://www.example.com/files/hadoop/{v}/hadoop-{v}.tar.gz"
# download-source: "http://www-us.apache.org/dist/hadoop/common/hadoop-{v}/hadoop-{v}.tar.gz"
provider: ec2
providers:
ec2:
key-name: Key-flint
identity-file: /home/ec2-user/.ssh/Key-flint.pem
instance-type: t2.micro
region: us-east-1
# availability-zone: <name>
ami: ami-0b8d0d6ac70e5750c # Amazon Linux 2, us-east-1
user: ec2-user
# ami: ami-61bbf104 # CentOS 7, us-east-1
# user: centos
# spot-price: <price>
# vpc-id: <id>
# subnet-id: <id>
# placement-group: <name>
# security-groups:
# - group-name1
# - group-name2
# instance-profile-name:
# tags:
# - key1,value1
# - key2, value2 # leading/trailing spaces are trimmed
# - key3, # value will be empty
# min-root-ebs-size-gb: <size-gb>
tenancy: default # default | dedicated
ebs-optimized: no # yes | no
instance-initiated-shutdown-behavior: terminate # terminate | stop
# user-data: /path/to/userdata/script
launch:
num-slaves: 1
# install-hdfs: True
install-spark: True
debug: false
Hello, dear @nchammas
I got this error after pressing y:
.............................................................
I saw in config.yaml that the source should be in .tar.gz format:
--> # download-source: "https://www.example.com/files/spark/{v}/spark-{v}.tar.gz"
However, I could not find any spark-2.3.3.tar.gz.
Dear @nchammas
Here is the output when install-spark is set to False in config.yaml.
It works fine.
[ec2-user@ip-172-31-39-241 ~]$ flintrock launch flint-test
2019-06-06 04:17:34,407 - flintrock.ec2 - INFO - Launching 2 instances...
2019-06-06 04:17:46,361 - flintrock.ec2 - DEBUG - 2 instances not in state 'running': 'i-0c1a896d36dac451b', 'i-0f615176da68db3bc', ...
2019-06-06 04:17:49,499 - flintrock.ec2 - DEBUG - 2 instances not in state 'running': 'i-0c1a896d36dac451b', 'i-0f615176da68db3bc', ...
2019-06-06 04:17:52,604 - flintrock.ec2 - DEBUG - 2 instances not in state 'running': 'i-0c1a896d36dac451b', 'i-0f615176da68db3bc', ...
2019-06-06 04:17:55,754 - flintrock.ec2 - DEBUG - 2 instances not in state 'running': 'i-0c1a896d36dac451b', 'i-0f615176da68db3bc', ...
2019-06-06 04:17:58,885 - flintrock.ec2 - DEBUG - 2 instances not in state 'running': 'i-0c1a896d36dac451b', 'i-0f615176da68db3bc', ...
2019-06-06 04:18:02,051 - flintrock.ec2 - DEBUG - 1 instances not in state 'running': 'i-0c1a896d36dac451b', ...
2019-06-06 04:18:05,154 - flintrock.ec2 - DEBUG - 1 instances not in state 'running': 'i-0c1a896d36dac451b', ...
2019-06-06 04:18:11,352 - flintrock.ssh - DEBUG - [54.196.73.100] SSH timeout.
2019-06-06 04:18:11,352 - flintrock.ssh - DEBUG - [54.211.211.220] SSH timeout.
2019-06-06 04:18:16,358 - flintrock.ssh - DEBUG - [54.196.73.100] SSH exception: [Errno None] Unable to connect to port 22 on 54.196.73.100
2019-06-06 04:18:16,639 - flintrock.ssh - INFO - [54.211.211.220] SSH online.
2019-06-06 04:18:16,944 - flintrock.core - INFO - [54.211.211.220] Configuring ephemeral storage...
2019-06-06 04:18:17,384 - flintrock.core - INFO - [54.211.211.220] Installing Java 1.8...
2019-06-06 04:18:21,555 - flintrock.ssh - INFO - [54.196.73.100] SSH online.
2019-06-06 04:18:21,838 - flintrock.core - INFO - [54.196.73.100] Configuring ephemeral storage...
2019-06-06 04:18:22,481 - flintrock.core - INFO - [54.196.73.100] Installing Java 1.8...
2019-06-06 04:18:49,758 - flintrock.ec2 - INFO - launch finished in 0:01:19.
Cluster master: ec2-54-196-73-100.compute-1.amazonaws.com
Login with: flintrock login flint-test
However, when I change it to
install-spark: True
then I get this error (the IP addresses of the master and the slave change):
[ec2-user@ip-172-31-39-241 ~]$ flintrock launch flint-test
2019-06-06 04:20:49,442 - flintrock.ec2 - INFO - Launching 2 instances...
2019-06-06 04:21:01,329 - flintrock.ec2 - DEBUG - 2 instances not in state 'running': 'i-0184d608d100651fe', 'i-06d89fac19734310e', ...
2019-06-06 04:21:04,462 - flintrock.ec2 - DEBUG - 2 instances not in state 'running': 'i-0184d608d100651fe', 'i-06d89fac19734310e', ...
2019-06-06 04:21:07,559 - flintrock.ec2 - DEBUG - 2 instances not in state 'running': 'i-0184d608d100651fe', 'i-06d89fac19734310e', ...
2019-06-06 04:21:10,671 - flintrock.ec2 - DEBUG - 2 instances not in state 'running': 'i-0184d608d100651fe', 'i-06d89fac19734310e', ...
2019-06-06 04:21:13,775 - flintrock.ec2 - DEBUG - 2 instances not in state 'running': 'i-0184d608d100651fe', 'i-06d89fac19734310e', ...
2019-06-06 04:21:19,964 - flintrock.ssh - DEBUG - [52.70.94.72] SSH timeout.
2019-06-06 04:21:19,964 - flintrock.ssh - DEBUG - [3.83.155.41] SSH timeout.
2019-06-06 04:21:24,968 - flintrock.ssh - DEBUG - [52.70.94.72] SSH exception: [Errno None] Unable to connect to port 22 on 52.70.94.72
2019-06-06 04:21:24,969 - flintrock.ssh - DEBUG - [3.83.155.41] SSH exception: [Errno None] Unable to connect to port 22 on 3.83.155.41
2019-06-06 04:21:30,200 - flintrock.ssh - INFO - [52.70.94.72] SSH online.
2019-06-06 04:21:30,256 - flintrock.ssh - INFO - [3.83.155.41] SSH online.
2019-06-06 04:21:30,440 - flintrock.core - INFO - [52.70.94.72] Configuring ephemeral storage...
2019-06-06 04:21:30,609 - flintrock.core - INFO - [3.83.155.41] Configuring ephemeral storage...
2019-06-06 04:21:30,872 - flintrock.core - INFO - [52.70.94.72] Installing Java 1.8...
2019-06-06 04:21:31,046 - flintrock.core - INFO - [3.83.155.41] Installing Java 1.8...
2019-06-06 04:21:50,228 - flintrock.services - INFO - [52.70.94.72] Installing Spark...
2019-06-06 04:22:00,064 - flintrock.services - INFO - [3.83.155.41] Installing Spark...
2019-06-06 04:22:07,991 - flintrock.services - INFO - [52.70.94.72] Configuring Spark master...
2019-06-06 04:23:38,149 - flintrock.services - DEBUG - Timed out waiting for Spark master to come up. Trying again...
2019-06-06 04:25:08,243 - flintrock.services - DEBUG - Timed out waiting for Spark master to come up. Trying again...
Do you want to terminate the 2 instances created by this operation? [Y/n]: y
Terminating instances...
Traceback (most recent call last):
File "/home/ec2-user/anaconda3/lib/python3.5/urllib/request.py", line 1254, in do_open
h.request(req.get_method(), req.selector, req.data, headers)
File "/home/ec2-user/anaconda3/lib/python3.5/http/client.py", line 1106, in request
self._send_request(method, url, body, headers)
File "/home/ec2-user/anaconda3/lib/python3.5/http/client.py", line 1151, in _send_request
self.endheaders(body)
File "/home/ec2-user/anaconda3/lib/python3.5/http/client.py", line 1102, in endheaders
self._send_output(message_body)
File "/home/ec2-user/anaconda3/lib/python3.5/http/client.py", line 934, in _send_output
self.send(msg)
File "/home/ec2-user/anaconda3/lib/python3.5/http/client.py", line 877, in send
self.connect()
File "/home/ec2-user/anaconda3/lib/python3.5/http/client.py", line 849, in connect
(self.host,self.port), self.timeout, self.source_address)
File "/home/ec2-user/anaconda3/lib/python3.5/socket.py", line 711, in create_connection
raise err
File "/home/ec2-user/anaconda3/lib/python3.5/socket.py", line 702, in create_connection
sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ec2-user/anaconda3/lib/python3.5/site-packages/flintrock/services.py", line 415, in health_check
.urlopen(spark_master_ui)
File "/home/ec2-user/anaconda3/lib/python3.5/urllib/request.py", line 163, in urlopen
return opener.open(url, data, timeout)
File "/home/ec2-user/anaconda3/lib/python3.5/urllib/request.py", line 466, in open
response = self._open(req, data)
File "/home/ec2-user/anaconda3/lib/python3.5/urllib/request.py", line 484, in _open
'_open', req)
File "/home/ec2-user/anaconda3/lib/python3.5/urllib/request.py", line 444, in _call_chain
result = func(*args)
File "/home/ec2-user/anaconda3/lib/python3.5/urllib/request.py", line 1282, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/home/ec2-user/anaconda3/lib/python3.5/urllib/request.py", line 1256, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 110] Connection timed out>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ec2-user/anaconda3/bin/flintrock", line 11, in <module>
sys.exit(main())
File "/home/ec2-user/anaconda3/lib/python3.5/site-packages/flintrock/flintrock.py", line 1187, in main
cli(obj={})
File "/home/ec2-user/anaconda3/lib/python3.5/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/home/ec2-user/anaconda3/lib/python3.5/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/home/ec2-user/anaconda3/lib/python3.5/site-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/ec2-user/anaconda3/lib/python3.5/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/ec2-user/anaconda3/lib/python3.5/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/home/ec2-user/anaconda3/lib/python3.5/site-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/ec2-user/anaconda3/lib/python3.5/site-packages/flintrock/flintrock.py", line 456, in launch
tags=ec2_tags)
File "/home/ec2-user/anaconda3/lib/python3.5/site-packages/flintrock/ec2.py", line 53, in wrapper
res = func(*args, **kwargs)
File "/home/ec2-user/anaconda3/lib/python3.5/site-packages/flintrock/ec2.py", line 955, in launch
identity_file=identity_file)
File "/home/ec2-user/anaconda3/lib/python3.5/site-packages/flintrock/core.py", line 654, in provision_cluster
service.health_check(master_host=cluster.master_host)
File "/home/ec2-user/anaconda3/lib/python3.5/site-packages/flintrock/services.py", line 425, in health_check
raise Exception("Spark health check failed.") from e
Exception: Spark health check failed.
Sorry about the delay here! Will take a look at this Monday.
Thanks for your consideration. I look forward to hearing from you.
Have seen in the config.yaml that the source should be in .tar.gz format
.tar.gz and .tgz are interchangeable. Either should work. Just make sure that the URL you configure exists once the version has been substituted in.
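One way to check this is to substitute the version into the template yourself and issue a HEAD request against the result. A minimal sketch in Python; the `url_exists` helper is my own illustration, not part of Flintrock:

```python
# Fill the {v} template the same way Flintrock does, then probe the URL.
# The template below is the download-source from the config above.
import urllib.request

template = "https://www-us.apache.org/dist/spark/spark-{v}/spark-{v}-bin-hadoop2.7.tgz"
version = "2.3.3"
url = template.format(v=version)
print(url)

def url_exists(url, timeout=10):
    """Issue a HEAD request and report whether the server answers with 2xx."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # DNS failure, refused connection, timeout, HTTP error
        return False
```

If `url_exists(url)` comes back False, the configured download-source is the problem rather than anything on the cluster.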
Exception: Spark health check failed.
When you see this error, choose not to terminate the instances and instead log in to the cluster master and take a look at the logs under ~/spark/. They should give you more specific details about why the Spark health check failed.
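Besides reading the logs, you can reproduce the health check by hand. A minimal sketch of what Flintrock is effectively doing, assuming the master's Spark web UI is on the default port 8080 and that port is reachable from where you run the check:

```python
# Manually probe the Spark master's web UI, mirroring the health check
# in flintrock/services.py. Pass your master's public DNS name as host.
import urllib.request

def spark_master_up(host, port=8080, timeout=10):
    """Return True if the Spark master web UI answers on http://host:port."""
    url = "http://{host}:{port}".format(host=host, port=port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError as exc:  # URLError, refused connection, socket timeout
        print("Spark master UI unreachable: {}".format(exc))
        return False

# Example (replace with your own master's address):
# spark_master_up("ec2-54-196-73-100.compute-1.amazonaws.com")
```

A connection timeout here, like the `[Errno 110] Connection timed out` in the traceback above, often points at a security group that does not allow traffic to the master UI port rather than at Spark itself.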
Hello dear @nchammas.
Sorry, I have not been able to yet; I am working on something else. If you want, you can close the issue, and I will reopen it if I run into difficulties.
OK, sounds good to me.
Any updates now?