PySpark Boilerplate

A template project for writing PySpark jobs.

Prerequisites

  • Python 3.7
  • make
  • zip

Usage

  1. To show available commands, run make or make help.
  2. Prepare the development environment
    make prepare-dev
    
  3. Build
    make build
    
  4. Submit the job to Spark (this example runs locally with two worker threads)
    $SPARK_HOME/bin/spark-submit \
       --name "word-count" \
       --master "local[2]" \
       --py-files dist/packages.zip,dist/libs.zip \
       dist/main.py \
       --in-path file://$(pwd)/tests/unit/some_text_file.txt \
       --out-path file://$(pwd)/build/out
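
For orientation, here is a minimal sketch of what an entry point accepting the --in-path and --out-path arguments above might look like. It is not the project's actual dist/main.py: the function names parse_args, run, and main, the word-splitting rule, and the CSV output format are all assumptions made for illustration. The job name and master are not set in code because spark-submit supplies them via --name and --master.

```python
import argparse


def parse_args(argv=None):
    """Parse the two job arguments passed after dist/main.py (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Word count job (illustrative sketch)")
    parser.add_argument("--in-path", required=True, help="URI of the input text file")
    parser.add_argument("--out-path", required=True, help="URI of the output directory")
    return parser.parse_args(argv)


def run(in_path, out_path):
    """Count words in in_path and write (word, count) rows to out_path."""
    # Imported here so argument parsing can be exercised without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Name and master come from spark-submit (--name, --master), so none are set here.
    spark = SparkSession.builder.getOrCreate()
    try:
        lines = spark.read.text(in_path)  # one row per line, in a column named "value"
        words = (
            lines
            .select(F.explode(F.split(F.col("value"), r"\s+")).alias("word"))
            .where(F.col("word") != "")  # drop empty tokens from leading whitespace
        )
        counts = words.groupBy("word").count()
        # Assumption: CSV output, overwriting any previous run's results.
        counts.write.mode("overwrite").csv(out_path)
    finally:
        spark.stop()


def main(argv=None):
    args = parse_args(argv)
    run(args.in_path, args.out_path)
```

A real entry script would typically end by calling main() under the usual `if __name__ == "__main__":` guard. Keeping the Spark imports inside run means the argument handling can be unit-tested on a machine without a Spark installation.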