nchammas/flintrock

Spark connection issue on EC2

dorienh opened this issue · 5 comments

I've installed Hadoop/Spark with flintrock on EC2. MapReduce works fine, but when I run Spark jobs (both with spark-submit and from Zeppelin), I get a connection error.

Below is the output from Zeppelin.

Did I mess up the IP addresses, or do I need to open a TCP port or something?

Py4JJavaError: An error occurred while calling o120.partitions.
: java.net.ConnectException: Call From ip-172-31-19-18.ec2.internal/172.31.19.18 to ip-172-31-19-18.ec2.internal:9000 failed on connection exception: java.net.ConnectException: Connection refused;
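Port 9000 is the default HDFS NameNode RPC port, and "Connection refused" on it usually means nothing is listening there (the NameNode isn't running). As a quick sanity check, you can probe the port directly; this is just a sketch, and the hostname below is taken from the error message above as a placeholder for your own master's address.

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host and attempts a TCP connect;
        # any failure (refused, timeout, DNS) raises an OSError subclass.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# On the master itself (host/port are placeholders from the traceback):
# port_open("ip-172-31-19-18.ec2.internal", 9000)
# False here would mean the NameNode is not listening, which matches the
# "Connection refused" in the stack trace.
```
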

flintrock, version 2.0.0
spark:
  version: 3.1.2
  download-source: "https://archive.apache.org/dist/spark/spark-3.1.2/"
hdfs:
  version: 3.2.0
  download-source: "https://archive.apache.org/dist/hadoop/common/hadoop-3.2.0/"
OS: ami-0b5eea76982371e91 # Amazon Linux 2 5.10

Let's first make sure your cluster is in working order.

Does spark-shell or pyspark work if you SSH directly into the master?
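While you're SSH'd in, it's also worth confirming which NameNode address the Hadoop client is configured to dial, since that's the endpoint Spark tries to reach in the error above. The address comes from fs.defaultFS in core-site.xml; here's a minimal sketch of reading it (the config path on your cluster is an assumption, adjust it to your install):

```python
import xml.etree.ElementTree as ET

def default_fs(core_site_path):
    """Return the fs.defaultFS value from a Hadoop core-site.xml, or None."""
    root = ET.parse(core_site_path).getroot()
    # core-site.xml is <configuration> containing <property> elements,
    # each with a <name> and a <value>.
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.defaultFS":
            return prop.findtext("value")
    return None

# Example usage on the master (path is an assumption, not from the thread):
# print(default_fs("/home/ec2-user/hadoop/conf/core-site.xml"))
# Should print something like hdfs://ip-172-31-19-18.ec2.internal:9000
```

If the host:port printed there doesn't match a running NameNode, Spark will fail with exactly the ConnectException you pasted.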

Magically, it works after logging out and back in. I don't know if it helped, but I did run

$HADOOP_PREFIX/sbin/yarn-daemon.sh start resourcemanager

Running a Spark shell from the master shouldn't require that.

In any case, are you all set then?

Glad you found it useful.