nchammas/flintrock

Spark connection issue on EC2

dorienh opened this issue · 5 comments

I've installed Hadoop/Spark with flintrock on EC2. MapReduce works fine, but when I run Spark jobs (both with spark-submit and from Zeppelin), I get a connection error.

Below is the output from Zeppelin.

Did I mess up the IP addresses, or do I need to open a TCP port or something?

Py4JJavaError: An error occurred while calling o120.partitions.
: java.net.ConnectException: Call From ip-172-31-19-18.ec2.internal/172.31.19.18 to ip-172-31-19-18.ec2.internal:9000 failed on connection exception: java.net.ConnectException: Connection refused;
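Port 9000 is the default HDFS NameNode RPC port, and "Connection refused" on it usually means nothing is listening there (the NameNode isn't running). As a quick sanity check, you can probe the port directly; this is just a sketch, and the hostname below is taken from the error message above as a placeholder for your own master's address.

```python
import socket

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host and attempts a TCP connect;
        # any failure (refused, timeout, DNS) raises an OSError subclass.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# On the master itself (host/port are placeholders from the traceback):
# port_open("ip-172-31-19-18.ec2.internal", 9000)
# False here would mean the NameNode is not listening, which matches the
# "Connection refused" in the stack trace.
```
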

flintrock, version 2.0.0
spark:
  version: 3.1.2
  download-source: "https://archive.apache.org/dist/spark/spark-3.1.2/"
hdfs:
  version: 3.2.0
  download-source: "https://archive.apache.org/dist/hadoop/common/hadoop-3.2.0/"
OS: ami-0b5eea76982371e91 # Amazon Linux 2 5.10

Let's first make sure your cluster is in working order.

Does spark-shell or pyspark work if you SSH directly into the master?
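While you're SSH'd in, it's also worth confirming which NameNode address the Hadoop client is configured to dial, since that's the endpoint Spark tries to reach in the error above. The address comes from fs.defaultFS in core-site.xml; here's a minimal sketch of reading it (the config path on your cluster is an assumption, adjust it to your install):

```python
import xml.etree.ElementTree as ET

def default_fs(core_site_path):
    """Return the fs.defaultFS value from a Hadoop core-site.xml, or None."""
    root = ET.parse(core_site_path).getroot()
    # core-site.xml is <configuration> containing <property> elements,
    # each with a <name> and a <value>.
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.defaultFS":
            return prop.findtext("value")
    return None

# Example usage on the master (path is an assumption, not from the thread):
# print(default_fs("/home/ec2-user/hadoop/conf/core-site.xml"))
# Should print something like hdfs://ip-172-31-19-18.ec2.internal:9000
```

If the host:port printed there doesn't match a running NameNode, Spark will fail with exactly the ConnectException you pasted.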

Magically, it works after logging out and back in. I don't know if it helped, but I did run

$HADOOP_PREFIX/sbin/yarn-daemon.sh start resourcemanager

Running a Spark shell from the master shouldn't require that.

In any case, are you all set then?

Glad you found it useful.