Spark connection issue on EC2
dorienh opened this issue · 5 comments
I've installed Hadoop/Spark with flintrock on EC2. MapReduce works fine, but when I try Spark (both with spark-submit and with Zeppelin), it gives me a connection error.
Below is the output from Zeppelin.
Did I mess up the IP addresses, or do I need to open a TCP port or something?
Py4JJavaError: An error occurred while calling o120.partitions.
: java.net.ConnectException: Call From ip-172-31-19-18.ec2.internal/172.31.19.18 to ip-172-31-19-18.ec2.internal:9000 failed on connection exception: java.net.ConnectException: Connection refused;
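The "Connection refused" is for port 9000 on the master itself, which in this setup is the HDFS NameNode RPC port, so the first thing worth checking is whether anything is actually listening there. A minimal sketch of that check in Python (the hostname below is just the one from the error message):

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Run this on the master node; if it prints False, the NameNode isn't listening:
# print(port_open("ip-172-31-19-18.ec2.internal", 9000))
```

If this returns False on the master itself, it's a daemon-not-running problem rather than an EC2 security-group problem, since security groups don't block loopback/intra-host traffic.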
flintrock, version 2.0.0
spark:
version: 3.1.2
download-source: "https://archive.apache.org/dist/spark/spark-3.1.2/"
hdfs:
version: 3.2.0
download-source: "https://archive.apache.org/dist/hadoop/common/hadoop-3.2.0/"
OS: ami-0b5eea76982371e91 # Amazon Linux 2 5.10
Let's first make sure your cluster is in working order.
Does spark-shell or pyspark work if you SSH directly into the master?
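For reference, that smoke test can be sketched like this (the cluster name is a placeholder; flintrock's login subcommand opens an SSH session to the master):

```shell
# SSH into the master node of your flintrock cluster:
flintrock login my-cluster

# On the master, run a tiny job that exercises Spark without reading from HDFS:
echo 'print(sc.parallelize(range(100)).sum())' | pyspark
```

If this works but spark-submit against HDFS paths fails, the problem is on the HDFS side rather than Spark itself.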
Magically, it works after logging out and back in. I don't know if it helped, but I did run
$HADOOP_PREFIX/sbin/yarn-daemon.sh start resourcemanager
Running a Spark shell from the master shouldn't require that.
In any case, are you all set then?
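One note on the command above: port 9000 is HDFS (the NameNode), while the resourcemanager is a separate YARN service, so starting it wouldn't by itself fix the refused connection. Also, yarn-daemon.sh and $HADOOP_PREFIX are the deprecated Hadoop 2 forms; on Hadoop 3.2 the equivalent checks and commands would be roughly:

```shell
# See which Hadoop daemons are running on the master (look for NameNode):
jps

# Hadoop 3 syntax for starting individual daemons:
hdfs --daemon start namenode
yarn --daemon start resourcemanager
```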
Glad you found it useful.