
BigData-docker

Create a Hadoop cluster or a Spark cluster with Docker.

  • hadoop version: hadoop-3.2.1
  • spark version: spark-3.0.0-preview2-bin-hadoop3.2
  • jdk version: AdoptOpenJDK8U-jdk_x64_linux_hotspot_8u242b08

How to use

Configure the cluster (optional)

You can configure the cluster by editing the configuration files in the hadoop-base and spark-base folders. These files override the corresponding configuration files inside the Docker images; after changing them, the cluster must be recreated for the changes to take effect.
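For example, assuming hadoop-base contains an hdfs-site.xml override (a hypothetical file name; check the hadoop-base and spark-base folders for the files they actually ship), the workflow looks like this:

# edit an override file (hypothetical name, for illustration only)
vi hadoop-base/hdfs-site.xml

# recreate the cluster so the changed file replaces the one baked into the image
docker-compose -f "hadoop-cluster\docker-compose.yml" up -d --build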

Create the cluster

Create only the Hadoop cluster:

docker-compose -f "haddop-cluster\docker-compose.yml" up -d --build

Create both the Hadoop and Spark clusters:

docker-compose -f "spark-cluster\docker-compose.yml" up -d --build

Container structure

hadoop-cluster:

Creating hadoop-master ... done
Creating hadoop-slave1 ... done
Creating hadoop-slave2 ... done
Creating hadoop-slave3 ... done

spark-cluster:

Creating hadoop-slave1 ... done
Creating hadoop-slave2 ... done
Creating hadoop-slave3 ... done
Creating spark-slave2  ... done
Creating spark-slave1  ... done

Creating hadoop-master ... done
Creating spark-master  ... done
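With the cluster running, you can open a shell inside a container (container names as listed above) to run Hadoop or Spark commands directly, e.g.:

docker exec -it hadoop-master bash
docker exec -it spark-master bash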

Default ports
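The published ports are defined in the docker-compose files. To see which host ports a running container actually publishes (if any), a standard Docker command can be used, e.g.:

docker port hadoop-master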

Files and data

  • Spark, Hadoop, and the JDK are all installed under the /opt directory
  • HDFS data files are stored under the /opt/data directory (see the commands below)
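A quick way to confirm this layout from the host (container name taken from the listing above):

docker exec hadoop-master ls /opt
docker exec hadoop-master ls /opt/data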

Other notes

  • The JDK, Spark, and Hadoop are downloaded over the network at build time; the download URLs can be changed in hadoop-base/Dockerfile and spark-base/Dockerfile (the defaults point to the Tsinghua mirror).
  • The apt package sources are switched to the Tsinghua mirror; if you do not want this, remove the COPY sources.list /etc/apt/ line from hadoop-base/Dockerfile and spark-base/Dockerfile.
  • The xcall command is supported: running xcall command on hadoop-master executes command on hadoop-master and on every hadoop-slave node, e.g. xcall jps. A minimal sketch of what such a wrapper typically looks like follows this list.
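The repo does not show how xcall is implemented; as a hypothetical sketch only (not the script actually used here), such a wrapper is usually a small shell script that loops over the node names and runs the command over ssh:

#!/bin/bash
# Hypothetical xcall-style wrapper, for illustration only.
# Runs the given command on the master and on every slave over ssh
# (assumes passwordless ssh between the containers, as a Hadoop cluster normally requires).
for host in hadoop-master hadoop-slave1 hadoop-slave2 hadoop-slave3; do
    echo "---------- $host ----------"
    ssh "$host" "$*"
done

With a wrapper like this, xcall jps would print the Java processes running on every node.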