Project built using: https://github.com/big-data-europe/docker-spark
Supported versions:
- Spark 3.0.0 for Hadoop 3.2 with OpenJDK 8 and Scala 2.12
- Spark 2.4.5 for Hadoop 2.7+ with OpenJDK 8
Start the cluster:
docker-compose up
Master: http://localhost:8080
Workers:
docker exec -it spark-worker-1 bash
Run pyspark CLI:
./spark/bin/pyspark
# Execute a file
cd home/python/example
../../../spark/bin/spark-submit example.py data.csv
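The contents of example.py are not shown here. As a rough illustration only, a script submitted this way might look like the sketch below. It assumes data.csv has a header row; the helper names (parse_args, main) and the app name "example" are hypothetical and not taken from this project.

```python
import sys


def parse_args(argv):
    """Return the CSV path from argv, or None if it is missing."""
    return argv[1] if len(argv) == 2 else None


def main(argv):
    path = parse_args(argv)
    if path is None:
        print("usage: spark-submit example.py <file.csv>", file=sys.stderr)
        return
    # Imported here so the usage message still works without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("example").getOrCreate()
    try:
        # Assumption: the CSV file has a header row.
        df = spark.read.csv(path, header=True, inferSchema=True)
        df.printSchema()
        print("rows:", df.count())
    finally:
        spark.stop()


if __name__ == "__main__":
    main(sys.argv)
```

With spark-submit, sys.argv[0] is the script path and sys.argv[1] is the first argument after it, so data.csv arrives as the path to read.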
Spark monitor:
apk add gcc
pip3 install notebook