Master Spark optimization techniques with Scala.
- Course: https://rockthejvm.com/p/spark-performance-tuning
- Code repository: https://github.com/rockthejvm/spark-performance-tuning
- Starting point (release tag "start"): https://github.com/rockthejvm/spark-performance-tuning/releases/tag/start
I'm the operator with my cluster calculator
Install IntelliJ IDEA with the Scala plugin.
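Before touching the cluster, you can check that IntelliJ, the Scala plugin and the project's Spark dependency work together by running a tiny job in local mode straight from the IDE. This is a minimal sketch that assumes the course project's build already puts Spark SQL on the classpath. SetupCheck and countEvens are hypothetical names, not part of the course code.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sanity check: the object and method names are illustrative only.
// Assumes Spark SQL is on the classpath (the course project's build provides Spark).
object SetupCheck {

  // Runs a Spark job that counts the even numbers in 0 until n.
  def countEvens(spark: SparkSession, n: Long): Long =
    spark.range(n).filter("id % 2 = 0").count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Setup Check")
      .master("local[*]") // run in-process on all local cores; no cluster needed
      .getOrCreate()

    println(s"Spark ${spark.version}")
    println(s"evens below 100: ${countEvens(spark, 100)}") // evens below 100: 50

    spark.stop()
  }
}
```

Run main from IntelliJ. Because local[*] keeps the driver and executors in one JVM, Docker is not needed for this check.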
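Install Docker (the links below are for Ubuntu; either option provides the docker compose command used below):
- Docker Desktop: https://docs.docker.com/desktop/install/ubuntu/
- Docker Engine (via the apt repository): https://docs.docker.com/engine/install/ubuntu/#set-up-the-repository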
Build the Docker images:
$ cd spark-cluster
$ chmod +x build-images.sh
$ ./build-images.sh
Start the dockerized Spark cluster with three worker containers:
$ docker compose up --scale spark-worker=3
Access the containers:
# List running containers
$ docker ps
# Open a shell in a container (use a name from the NAMES column of docker ps)
$ docker exec -it CONTAINER_NAME bash
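As a quick smoke test of the cluster, you can open a shell in the master container and start spark-shell against the standalone master, for example with spark-shell --master spark://spark-master:7077. The host name spark-master and the location of spark-shell inside the image are assumptions (check docker ps and the images built above); 7077 is Spark's default port for a standalone master. The snippet below sums 1 to 1,000,000 on the cluster and prints which master the shell is attached to.

```scala
// Paste into spark-shell, which predefines `spark` (a SparkSession).
// Assumes the shell was started with --master pointing at the dockerized standalone master.

// 1 to 1,000,000 (the end bound is exclusive), split into 12 partitions
// so that tasks get scheduled across the workers.
val numbers = spark.range(1L, 1000001L, 1L, 12)
val total = numbers.selectExpr("sum(id)").first().getLong(0)
println(s"sum = $total") // sum = 500000500000

// Should print the spark:// URL given to --master, not local[*]
println(s"master = ${spark.sparkContext.master}")
```

While the job runs, the standalone master's web UI (port 8080 by default inside the container; whether and where it is published on the host depends on the compose file) lists the registered workers and the running application.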