Rock The JVM - Spark Performance Tuning with Scala

Primary language: Scala. License: MIT.

Rock The JVM - Spark Optimizations with Scala

Master Spark optimization techniques with Scala.

Sections

  1. Scala and Spark Recap
  2. Foundations
  3. Memory, Caching and Checkpointing
  4. Partitioning
  5. Performance Tuning

Setup

IntelliJ IDEA

Install IntelliJ IDEA with the Scala plugin.

Docker

Install Docker.

Build images:

$ cd spark-cluster
$ chmod +x build-images.sh
$ ./build-images.sh

Start the dockerized Spark cluster:

$ docker compose up --scale spark-worker=3
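
For repeated use, the cluster lifecycle could be wrapped in a few small shell functions. This is only a sketch: it assumes the functions are called from the directory that holds the compose file and that the worker service is named `spark-worker`, as in the command above. The function names `cluster_up`, `cluster_status`, and `cluster_down` are hypothetical and not part of this repository.

```shell
# Hypothetical helpers; not part of the repository.
# Call them from the directory that holds the compose file.

# Start the cluster in the background with N workers (default 3).
cluster_up() {
  docker compose up -d --scale "spark-worker=${1:-3}"
}

# Show the state of the containers in this compose project.
cluster_status() {
  docker compose ps
}

# Stop the cluster and remove its containers and networks.
cluster_down() {
  docker compose down
}
```

After sourcing these, `cluster_up 5` would start five workers in detached mode, and `cluster_down` would tear the cluster down again. Running detached keeps the terminal free; drop `-d` to keep the logs in the foreground as in the original command.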

Access containers:

# List active containers
$ docker ps
# Get a shell in any container
$ docker exec -it CONTAINER_NAME bash
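
To avoid copying container names by hand, the two commands above could be combined roughly as follows. This is a sketch under assumptions: `containers_matching` and `shell_into` are hypothetical helper names, and it presumes the container names contain a recognizable fragment such as `spark-worker` (check the actual names with `docker ps`).

```shell
# Hypothetical helpers; not part of the repository.

# Print the names of running containers whose name contains the given text.
containers_matching() {
  docker ps --filter "name=$1" --format '{{.Names}}'
}

# Open a bash shell in the first running container matching the given text.
shell_into() {
  target="$(containers_matching "$1" | head -n 1)"
  if [ -z "$target" ]; then
    echo "no running container matches '$1'" >&2
    return 1
  fi
  docker exec -it "$target" bash
}
```

With these defined, `shell_into spark-worker` would open a shell in the first matching worker, or print an error if none is running.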