This repo contains a few examples that show how to develop and run Apache Spark applications in a Docker environment.
The code is organized into a number of Maven submodules; please consult the respective README.md
files to learn more.
- Word count in Apache Spark
- Example with Spark SQL and Hive
First, build the Maven modules and Docker images:
mvn clean package
Then verify that the images have been created:
docker images
- All examples will run without modification using my Data Science playground for Docker.
- Java 8 lambda expressions are used throughout the code.
- Docker images are built using Spotify's Docker Maven plugin.
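
To give a feel for the Java 8 lambda style mentioned above, here is a minimal word count sketch. It is not taken from the submodules, and it deliberately uses plain `java.util.stream` instead of the Spark API so that it runs without any Spark dependency. The class name `WordCountSketch` and the helper `count` are hypothetical and exist only for illustration. The shape (split lines into words, then aggregate per word) is roughly what a Spark word count does with its own operators.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative only: plain Java 8 streams, not the Spark API and not the repo's code.
class WordCountSketch {

    // Hypothetical helper: counts how often each whitespace-separated word occurs.
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                // Split every line into words (one-to-many step).
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                // Drop empty tokens produced by leading whitespace.
                .filter(word -> !word.isEmpty())
                // Group identical words and count them; TreeMap keeps keys sorted.
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "to be or not to be",
                "to see or not to see");
        // Print each word with its count, in alphabetical order.
        count(lines).forEach((word, n) -> System.out.println(word + ": " + n));
    }
}
```

The actual examples in the submodules run on Spark and may be structured differently; consult the respective README.md files for the real code.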