A big data solution written in Python, built on Hadoop and Spark.

Scale your data management by distributing workload and storage across Hadoop and Spark clusters, and explore and transform your data in Jupyter Notebooks.


About The Project

The purpose of this tutorial is to show how to get started with Hadoop, Spark, and Jupyter for your big data solution, deployed as Docker containers.

Architecture overview

Prerequisites

  • Only confirmed to work on Linux and Windows (Apple Silicon may have issues).
  • Ensure Docker is installed.

Start

Run bash master-build.sh to build the images and start the containers.
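Once the containers are running, you can verify that each service's UI is reachable. A minimal sketch using only Python's standard library, polling the three endpoints listed in this README (the ports are the defaults used below):

```python
import urllib.request

# The three UIs exposed by this project's containers.
SERVICES = {
    "Hadoop NameNode UI": "http://localhost:9870",
    "Spark Master UI": "http://localhost:8080",
    "Jupyter UI": "http://localhost:8888",
}

def check(url, timeout=5):
    """Return True if the service answers with HTTP 200, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, DNS failure, or timeout -> service not up.
        return False

if __name__ == "__main__":
    for name, url in SERVICES.items():
        print(f"{name}: {'up' if check(url) else 'not reachable'}")
```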

Hadoop

Access the Hadoop UI at http://localhost:9870
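Besides the web UI, the NameNode's port 9870 also serves the WebHDFS REST API in Hadoop 3.x, so you can browse HDFS programmatically. A sketch assuming the port mapping above; the `/` listing is just an example:

```python
import json
import urllib.request

def webhdfs_url(path, op, host="localhost", port=9870):
    """Build a WebHDFS REST URL (served by the NameNode in Hadoop 3.x)."""
    return f"http://{host}:{port}/webhdfs/v1{path}?op={op}"

def list_hdfs(path="/"):
    """Return the entry names in an HDFS directory via WebHDFS LISTSTATUS."""
    with urllib.request.urlopen(webhdfs_url(path, "LISTSTATUS")) as resp:
        statuses = json.load(resp)["FileStatuses"]["FileStatus"]
    return [s["pathSuffix"] for s in statuses]

if __name__ == "__main__":
    # Requires the Hadoop container to be running.
    print(list_hdfs("/"))
```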

Spark

Access the Spark Master UI at http://localhost:8080
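To submit work to the cluster from Python, point a SparkSession at the master. The spark://localhost:7077 URL below is an assumption based on Spark's standard standalone master port; check your compose configuration for the actual host name:

```python
def master_url(host="localhost", port=7077):
    """Spark standalone master URL; 7077 is Spark's standard default port."""
    return f"spark://{host}:{port}"

if __name__ == "__main__":
    # Needs pyspark and a running cluster, so the import stays local.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master(master_url())
        .appName("smoke-test")
        .getOrCreate()
    )
    # Trivial job to confirm the executors respond.
    print(spark.range(1000).count())
    spark.stop()
```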

Jupyter

Access the Jupyter UI at http://localhost:8888
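A typical notebook cell reads data from HDFS into a Spark DataFrame for exploration. In the sketch below the "namenode" host, port 9000, and the /data/example.csv path are all illustrative assumptions; match them to your docker-compose service name and fs.defaultFS setting:

```python
def hdfs_uri(path, host="namenode", port=9000):
    """Build an hdfs:// URI for Spark reads; host/port are illustrative."""
    return f"hdfs://{host}:{port}{path}"

if __name__ == "__main__":
    # Run inside the Jupyter container, where pyspark is available.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("explore").getOrCreate()
    df = spark.read.csv(hdfs_uri("/data/example.csv"),
                        header=True, inferSchema=True)
    df.printSchema()
```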

Contributing

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/featureName)
  3. Commit your Changes (git commit -m 'Add some featureName')
  4. Push to the Branch (git push origin feature/featureName)
  5. Open a Pull Request

Contact

Martin Karlsson

LinkedIn : martin-karlsson
Twitter : @HelloKarlsson
Email : hello@martinkarlsson.io
Webpage : www.martinkarlsson.io

Project Link: github.com/martinkarlssonio/big-data-solution