docker-hadoop-pseudo-distributed-mode



Run Hadoop 3.1.2 (with Hive 2.3.4) on Ubuntu 16.04 inside a Docker container in pseudo-distributed mode.

How to Run

  • Open a terminal.

  • Navigate to the directory containing the Dockerfile and build the image:

     docker build -t <image_name> .
    
  • Or, instead of building, pull the prebuilt image from Docker Hub:

     docker pull macio232/hadoop-pseudo-distributed-mode
    
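  • Start a container from the image:

     docker run -p 9870:9870 -p 8088:8088 -v <host-directory>:/home/hadoop/data -it --name=<container_name> <image_name>
    

    The image's ENTRYPOINT runs the Hadoop startup script and then opens a bash shell. Port 9870 serves the HDFS NameNode web UI and port 8088 the YARN ResourceManager web UI; <host-directory> is mounted at /home/hadoop/data inside the container.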

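  • To use Hive, start HiveServer2 in the background from the container's shell:

     nohup hiveserver2 &
    

    and press ENTER to get the prompt back. The Hive server is now starting.

    To connect with the Beeline client, run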

     beeline -n root -u jdbc:hive2://localhost:10000
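
HiveServer2 can take some time to start accepting connections after it is launched in the background, so the first Beeline attempt may fail. The sketch below is one way to retry until the server is ready. The retry function and its arguments are hypothetical helpers, not part of this image, and the usage line assumes that beeline exits with a non-zero status when the connection fails, which is worth verifying for your Hive version.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND up to ATTEMPTS times, sleeping DELAY
# seconds after each failure. Returns 0 as soon as COMMAND succeeds and
# 1 if every attempt fails.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "Attempt $i of $attempts failed" >&2
        i=$((i + 1))
        # Do not sleep after the final failed attempt.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example usage inside the container (not executed here):
#   nohup hiveserver2 &
#   retry 30 5 beeline -n root -u jdbc:hive2://localhost:10000 -e 'show databases;'
```

Once the retried command succeeds, the interactive beeline command above should connect on the first try.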
    

TODO

  • Run stop-dfs.sh and stop-yarn.sh when the container shuts down, as described here
  • Fix the "mesg: ttyname failed: Inappropriate ioctl for device" error that appears during benchmark execution
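
For the first item, one possible approach (not necessarily the one the linked description uses) is a shell trap in the startup script. The sketch below is untested against this image. The function names and the STOP_YARN and STOP_DFS variables are hypothetical, and stopping YARN before HDFS is a choice made here, not something the image prescribes.

```shell
#!/bin/sh
# Commands used to stop the daemons. They are overridable so the logic
# can be tried without a Hadoop installation (assumption: both scripts
# are on PATH inside the container).
STOP_YARN="${STOP_YARN:-stop-yarn.sh}"
STOP_DFS="${STOP_DFS:-stop-dfs.sh}"

# Stops YARN first, then HDFS.
shutdown_hadoop() {
    echo "Stopping YARN, then HDFS"
    "$STOP_YARN"
    "$STOP_DFS"
}

# Registers shutdown_hadoop to run when the script exits. SIGTERM (sent
# by "docker stop") and SIGINT are turned into a normal exit so that the
# EXIT trap fires for them as well.
install_shutdown_trap() {
    trap shutdown_hadoop EXIT
    trap 'exit 0' TERM INT
}

# Example layout of a startup script (not executed here):
#   install_shutdown_trap
#   ...start HDFS and YARN...
#   bash
```

Two caveats apply. The shell runs a trap only after the current foreground command returns, so with an interactive bash in the foreground the handler fires once that shell exits. Also, docker stop waits only 10 seconds by default before killing the container, so a longer timeout (docker stop -t <seconds>) may be needed for the stop scripts to finish.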

Configuration References

The original repository for this image is at https://github.com/mjaglan/docker-hadoop-pseudo-distributed-mode.