/hadoop-docker

Apache Hadoop's Pseudo Distributed Mode using Docker. 🐳

Primary language: Dockerfile. License: Apache License 2.0 (Apache-2.0).

Apache Hadoop using Docker 🐳

A Docker image to play around with Apache Hadoop in Pseudo Distributed Mode (a single-node cluster).

Below are the steps to play around with this image using Play with Docker.

  1. First of all, create an account on Docker Hub.
  2. Log in to Play with Docker using the Docker Hub account you just created.
  3. You should see a green "Start" button; click it to start a session.
  4. Click "+ Add new instance" in the left pane to create a VM.
  5. A new terminal should show up in the right pane. Here, we need to pull the Docker image from GitHub Container Registry (GHCR). To do so, execute:
     docker pull ghcr.io/kasipavankumar/hadoop-docker:latest
  6. After the image has been pulled into the VM, we need to start a new container and switch into its terminal (usually bash). To do so, execute:
     docker run -it ghcr.io/kasipavankumar/hadoop-docker:latest
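
If you find yourself repeating the pull and run steps, they can be wrapped in a small shell function. This is only a sketch: the function name `start_hadoop` is hypothetical, and the `--rm` flag is an addition that is not part of the original steps (it removes the container once you exit).

```shell
# Image reference taken from the steps above.
IMAGE="ghcr.io/kasipavankumar/hadoop-docker:latest"

# start_hadoop: hypothetical helper, not part of this repository.
# Pulls the image, then opens an interactive terminal in a new container.
start_hadoop() {
  docker pull "$IMAGE" || return 1
  # -i keeps STDIN open and -t allocates a terminal.
  # --rm is an assumption added here: it deletes the container on exit.
  docker run -it --rm "$IMAGE"
}
```

After defining the function in the Play with Docker terminal, run `start_hadoop` to land in the container's shell.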

At this stage, the container boots up and runs all the steps required to start Hadoop.

From now on, you will be inside the container's bash shell (terminal) and can start using Hadoop's filesystem commands. 🚀
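
As a first thing to try inside the container, here is a minimal sketch that exercises a few standard `hdfs dfs` subcommands. The function name `hdfs_smoke_test`, the default directory `/tmp/hdfs-demo`, and the file name `hello.txt` are all hypothetical; it also assumes the `hdfs` command is on the `PATH` inside the container.

```shell
# hdfs_smoke_test: hypothetical helper that runs a few basic HDFS commands.
# Usage: hdfs_smoke_test [hdfs-directory]
hdfs_smoke_test() {
  local dir="${1:-/tmp/hdfs-demo}"   # hypothetical HDFS directory
  local file rc
  file="$(mktemp)" || return 1       # temporary local file to upload
  echo "hello hadoop" > "$file"

  # 1. create the directory in HDFS (-p: no error if it already exists)
  # 2. upload the local file (-f: overwrite an existing copy)
  # 3. list the directory
  # 4. print the uploaded file
  hdfs dfs -mkdir -p "$dir" &&
    hdfs dfs -put -f "$file" "$dir/hello.txt" &&
    hdfs dfs -ls "$dir" &&
    hdfs dfs -cat "$dir/hello.txt"
  rc=$?

  rm -f "$file"                      # clean up the local temporary file
  return "$rc"
}
```

Running `hdfs_smoke_test` should list the new directory and then print the file's contents; the chain stops at the first command that fails.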


A note on the size of the image

The final Docker image weighs around 1.8GB, with Hadoop and Java accounting for most of it. When analyzed using Dive, the image efficiency came out to be around 99% (sweet).

Docker image analysis
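
To check the size of a pulled image yourself, a sketch along these lines may help. The function name `image_size_gb` is hypothetical; it relies on `docker image inspect`, which reports the size in bytes, and converts that to decimal gigabytes (1 GB = 10^9 bytes, an assumption about the unit used above).

```shell
# image_size_gb: hypothetical helper that prints an image's size in GB.
# Usage: image_size_gb <image-reference>
image_size_gb() {
  local bytes
  # .Size is the image size in bytes, as reported by Docker.
  bytes="$(docker image inspect --format '{{.Size}}' "$1")" || return 1
  # Convert bytes to decimal gigabytes with two decimal places.
  awk -v b="$bytes" 'BEGIN { printf "%.2f\n", b / 1e9 }'
}
```

For example, `image_size_gb ghcr.io/kasipavankumar/hadoop-docker:latest` prints the size of the image pulled earlier. For the per-layer breakdown and efficiency score, run `dive` with the same image reference, assuming Dive is installed.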




D. Kasi Pavan Kumar (c) 2021