- Airflow 2.7.0
- Hadoop 2.8.0
- Hive 2.3.2 (scala 2.11.8, python 2.7.9)
- Spark 2.1.2
- Kafka 7.0.0-css
- Start a cluster: run_cluster.sh
- Stop a cluster: stop_cluster.sh
- Airflow
- Webserver: http://localhost:8080
- Worker: ssh://localhost:10022
- Hadoop
- Namenode: http://localhost:50070
- Datanode: http://localhost:50075
- Hue: http://localhost:8088
- Hive
- Server port: 10000
- Metastore (PostgreSQL) port: 9083
- Spark
- Master port: 7077
- Master webui: http://localhost:18080
- Worker port: not forwarded to the host
- Worker webui: http://localhost:8081
- Worker notebook: http://localhost:9001
- Kafka
- Kafka port: 9092
- Kafdrop: http://localhost:9000
- LISTENER_DOCKER_INTERNAL port: 19092
- LISTENER_DOCKER_EXTERNAL port: 9092
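After starting the cluster, it can be useful to confirm which of the host ports listed above actually accept connections. The following is a minimal sketch using only the Python standard library. The service labels in `ENDPOINTS` and the helper names `is_port_open` and `check_endpoints` are illustrative and not part of this repository. It assumes every port is published on `localhost`, and an open TCP port only shows that something is listening, not that the service behind it is healthy.

```python
import socket

# Host ports taken from the listing above; the labels are illustrative names.
ENDPOINTS = {
    "airflow-webserver": 8080,
    "airflow-worker-ssh": 10022,
    "hadoop-namenode": 50070,
    "hadoop-datanode": 50075,
    "hue": 8088,
    "hive-server": 10000,
    "hive-metastore": 9083,
    "spark-master": 7077,
    "spark-master-webui": 18080,
    "spark-worker-webui": 8081,
    "spark-worker-notebook": 9001,
    "kafka-external": 9092,
    "kafdrop": 9000,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_endpoints(endpoints: dict, host: str = "localhost") -> dict:
    """Map each service label to whether its port accepts TCP connections."""
    return {name: is_port_open(host, port) for name, port in endpoints.items()}


if __name__ == "__main__":
    for name, ok in check_endpoints(ENDPOINTS).items():
        print(f"{name:24s} {'open' if ok else 'closed'}")
```

The Spark worker port and the internal Kafka listener (19092) are left out on purpose, since they are reachable only from inside the Docker network.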
- Hive
- hive_cli_conn
Connection Id: hive_cli_conn
Connection Type: Hive Client Wrapper
Host: localhost (host IP)
Port: 10000
Extra: {"use_beeline": true}
- Spark
- spark_conn
Connection Id: spark_conn
Connection Type: Spark
Host: local[*]
- Kafka
- kafka_default
Connection Id: kafka_default
Connection Type: Apache Kafka
Config Dict: { "bootstrap.servers": "kafka-server:19092", "group.id": "group_1", "security.protocol": "PLAINTEXT", "auto.offset.reset": "beginning" }
- kafka_listener
Connection Id: kafka_listener
Connection Type: Apache Kafka
Config Dict: { "bootstrap.servers": "kafka-server:19092", "group.id": "group_2", "security.protocol": "PLAINTEXT", "auto.offset.reset": "beginning" }
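As an alternative to entering the connections above by hand in the Airflow UI, Airflow can also read connections from `AIRFLOW_CONN_<CONN_ID>` environment variables, and recent 2.x releases accept a JSON value there. The sketch below only generates those variable values. It rests on several assumptions not stated in this repository: the `conn_type` strings (`hive_cli`, `spark`, `kafka`), the idea that the Kafka "Config Dict" is stored in the connection's `extra` field, and the helper name `to_env`, which is hypothetical.

```python
import json

# Kafka settings shared by both connections, copied from the list above.
KAFKA_BASE = {
    "bootstrap.servers": "kafka-server:19092",
    "security.protocol": "PLAINTEXT",
    "auto.offset.reset": "beginning",
}

# Connection fields from the list above. The conn_type values are assumed
# provider identifiers; replace "localhost" with the host IP where needed.
CONNECTIONS = {
    "hive_cli_conn": {
        "conn_type": "hive_cli",
        "host": "localhost",
        "port": 10000,
        "extra": {"use_beeline": True},
    },
    "spark_conn": {"conn_type": "spark", "host": "local[*]"},
    "kafka_default": {
        "conn_type": "kafka",
        "extra": {**KAFKA_BASE, "group.id": "group_1"},
    },
    "kafka_listener": {
        "conn_type": "kafka",
        "extra": {**KAFKA_BASE, "group.id": "group_2"},
    },
}


def to_env(connections: dict) -> dict:
    """Build AIRFLOW_CONN_<ID> variables holding JSON-serialized connections."""
    return {
        f"AIRFLOW_CONN_{conn_id.upper()}": json.dumps(fields)
        for conn_id, fields in connections.items()
    }


if __name__ == "__main__":
    # Print lines suitable for pasting into an env file or a shell export.
    for name, value in to_env(CONNECTIONS).items():
        print(f"{name}='{value}'")
```

Connections defined through environment variables do not appear in the Airflow UI connection list, so the UI remains the better choice when they need to be inspected or edited interactively.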
- Maintainer of the base repository (big-data-europe/docker-hadoop-spark-workbench): Ivan Ermilov @earthquakesan
- https://github.com/ayyoubmaul/hadoop-docker
- https://medium.com/@ayyoubmaulana/developing-multi-nodes-hadoop-spark-cluster-and-airflow-in-docker-compose-part-1-10331e1e71b3
- https://github.com/mjstealey/hadoop
- https://jybaek.tistory.com/922
- https://1mini2.tistory.com/102
- https://airflow.apache.org/docs/apache-airflow/2.7.0/docker-compose.yaml