Master Spark optimization techniques with Scala.
- Course: https://rockthejvm.com/p/spark-optimization
- Repository: https://github.com/rockthejvm/spark-optimization
- Starting code (start tag): https://github.com/rockthejvm/spark-optimization/releases/tag/start
Topics:
- Scala and Spark Recap
- Spark Performance Foundations
- Optimizing DataFrame Transformations
- Optimizing RDD Transformations
- Optimizing Key-Value RDDs
Install IntelliJ IDEA with the Scala plugin.
Install Docker (Docker Desktop or Docker Engine; the links below are for Ubuntu):
- https://docs.docker.com/desktop/install/ubuntu/
- https://docs.docker.com/engine/install/ubuntu/#set-up-the-repository
Build images:
$ cd spark-cluster
$ chmod +x build-images.sh
$ ./build-images.sh
Start the dockerized Spark cluster with three workers:
$ docker compose up --scale spark-worker=3
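A quick way to confirm that all three workers came up is to count the containers Compose reports for the service. The sketch below is not part of the course repository: the helper names `count_ids` and `service_container_count` are hypothetical, and it assumes the commands run from the directory that holds the compose file (presumably `spark-cluster`).

```shell
#!/usr/bin/env bash

# Hypothetical helper: count the non-empty lines on stdin
# (docker compose ps -q prints one container ID per line).
count_ids() {
  grep -c . || true   # grep exits 1 when the count is 0; the count is still printed
}

# Hypothetical helper: number of containers Compose lists for a service.
service_container_count() {
  docker compose ps -q "$1" | count_ids
}

# Usage (assumes the cluster was started in the background with -d):
#   docker compose up -d --scale spark-worker=3
#   service_container_count spark-worker
```

Starting with `-d` keeps the terminal free for the check; without it, `docker compose up` stays in the foreground and streams the container logs.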
Access each container:
# List active containers
$ docker ps
# Get a shell in any container
$ docker exec -it CONTAINER_NAME bash
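When the same check has to run on every node, opening a shell in each container gets tedious. The following sketch runs one command in each running container instead. It is an illustration, not part of the course material: the function names are hypothetical, and the `name=spark` filter assumes the container names contain "spark", which depends on how the compose project names them.

```shell
#!/usr/bin/env bash

# Hypothetical helper: names of running containers whose name contains "spark".
# The "spark" filter is an assumption; adjust it to the names docker ps shows.
spark_container_names() {
  docker ps --filter "name=spark" --format '{{.Names}}'
}

# Hypothetical helper: run the given command in every container named on stdin.
run_in_each() {
  local name
  while IFS= read -r name; do
    [ -z "$name" ] && continue
    # </dev/null keeps docker exec from consuming the list of names
    docker exec "$name" "$@" </dev/null
  done
}

# Usage: print the hostname of every Spark container
#   spark_container_names | run_in_each hostname
```

The `-it` flags are left out on purpose here, since they request an interactive terminal, which a non-interactive loop does not need.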