nchammas/flintrock

Configuring HDFS Master timeout

13k75 opened this issue · 2 comments

13k75 commented
  • Flintrock version: 0.11.0
  • Python version: 3.7.4
  • OS: Linux

Hi Nicholas,

When starting my cluster, the HDFS configuration step times out. Unlike in the previous issue about m5.large instances, though, the Hadoop logs don't show anything amiss: the NameNode and SecondaryNameNode start and stop normally.

Here is my config file:

services:
  spark:
    version: 2.4.4

  hdfs:
    version: 3.1.2

provider: ec2

providers:
  ec2:
    key-name: spark_cluster
    identity-file: /home/kasra/distributed-setup/spark_cluster.pem
    instance-type: t2.micro
    region: us-west-2
    ami: ami-04b762b4289fba92b # amazon linux 2
    user: ec2-user
    tenancy: default  # default | dedicated
    ebs-optimized: no  # yes | no
    instance-initiated-shutdown-behavior: terminate  # terminate | stop

launch:
  num-slaves: 1
  install-hdfs: True
  install-spark: True

debug: true

I'm happy to provide the Hadoop logs as well if you want them, though as I said they don't show any errors or warnings.

I would appreciate any help or insight you might have. Thanks!

nchammas commented

I don't know that Spark will work out of the box with Hadoop 3+. I would stick to Flintrock's default of Hadoop 2.8.5 and see if you still have any issues.
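
For reference, a minimal sketch of that change, assuming the rest of the config stays exactly as posted above: only the hdfs entry in the services section needs to be updated.

services:
  spark:
    version: 2.4.4

  hdfs:
    version: 2.8.5  # Flintrock's default, per the suggestion above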

13k75 commented

Yes, that's exactly it! Hadoop 3+ changes a number of default ports; in particular, the NameNode web UI moved from 50070 to 9870.