PySpark Boilerplate
A template project for writing PySpark jobs.
Prerequisites
- python3.7
- make
- zip
Usage
- To show available commands, run make or make help.
- Prepare dev environment
make prepare-dev
- Build
make build
- Submit
$SPARK_HOME/bin/spark-submit \
    --name "word-count" \
    --master "local[2]" \
    --py-files dist/packages.zip,dist/libs.zip \
    dist/main.py \
    --in-path file://$(pwd)/tests/unit/some_text_file.txt \
    --out-path file://$(pwd)/build/out
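The submit command above passes --in-path and --out-path to dist/main.py. As a rough illustration of how an entry point could consume those two flags, here is a minimal sketch. Only the flag names and the "word-count" job name come from the command. The function names (parse_args, split_words, run, main) and the RDD-based counting are assumptions made for illustration, not the template's actual code.

```python
import argparse


def parse_args(argv=None):
    """Parse the two flags used in the spark-submit command."""
    parser = argparse.ArgumentParser(description="Hypothetical word-count job")
    parser.add_argument("--in-path", required=True, help="input text file URI")
    parser.add_argument("--out-path", required=True, help="output directory URI")
    return parser.parse_args(argv)


def split_words(line):
    """Split a line into whitespace-separated words."""
    return line.split()


def run(spark, in_path, out_path):
    """Count words in in_path and write (word, count) pairs to out_path."""
    lines = spark.sparkContext.textFile(in_path)
    counts = (
        lines.flatMap(split_words)
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)
    )
    counts.saveAsTextFile(out_path)


def main(argv=None):
    args = parse_args(argv)
    # Imported here so argument parsing can be exercised without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("word-count").getOrCreate()
    try:
        run(spark, args.in_path, args.out_path)
    finally:
        spark.stop()

# A real entry point would call main() under an
# `if __name__ == "__main__":` guard.
```

Keeping argument parsing separate from the Spark logic means the flag handling can be unit-tested without a Spark session. Note that saveAsTextFile fails if the output directory already exists, so build/out may need to be removed between runs.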