Rock The JVM - Spark Optimizations with Scala

Primary language: Scala. License: MIT.

Master Spark optimization techniques with Scala.

Certificate

Certificate of Completion

Sections

  1. Scala and Spark Recap
  2. Spark Performance Foundations
  3. Optimizing DataFrame Transformations
  4. Optimizing RDD Transformations
  5. Optimizing Key-Value RDDs

Setup

IntelliJ IDEA

Install IntelliJ IDEA with the Scala plugin.

Docker

Install Docker.

Build images:

$ cd spark-cluster
$ chmod +x build-images.sh
$ ./build-images.sh

Start dockerized Spark cluster:

$ docker compose up --scale spark-worker=3

Access each container:

# List active containers
$ docker ps
# Get a shell in any container
$ docker exec -it CONTAINER_NAME bash
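
The two commands above can be wrapped in small shell helpers, which may be convenient when the cluster runs several workers. This is only a sketch: the function names `list_containers` and `open_shell` are hypothetical and not part of this repository, and it assumes the containers ship with `bash`, as the `docker exec` command above implies.

```shell
# Print only the names of running containers, one per line.
# --format with the {{.Names}} template trims the default docker ps table.
list_containers() {
  docker ps --format '{{.Names}}'
}

# Open an interactive bash shell in the container named by the first argument.
# Pass a name printed by list_containers; no particular name is assumed here.
open_shell() {
  docker exec -it "$1" bash
}

# Example (requires the cluster from the previous step to be running):
#   list_containers
#   open_shell CONTAINER_NAME
```

With `--scale spark-worker=3`, `list_containers` should show one line per worker in addition to the other containers in the cluster.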