Steps to Run the application
-
Direct Runner
-
go run app.go --output wordcounts.txt --runner direct
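The source of app.go isn't shown in this README; as a rough, stdlib-only sketch of the word-count logic it presumably implements (function names here are hypothetical, and a real Beam pipeline would use ParDo/Count transforms instead of a plain loop):

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// countWords tokenizes text line by line and returns per-word
// frequencies, mirroring the extract-then-count shape of a
// typical Beam word-count pipeline.
func countWords(text string) map[string]int {
	wordRE := regexp.MustCompile(`[a-zA-Z']+`)
	counts := make(map[string]int)
	for _, line := range strings.Split(text, "\n") {
		for _, w := range wordRE.FindAllString(line, -1) {
			counts[strings.ToLower(w)]++
		}
	}
	return counts
}

func main() {
	counts := countWords("the quick fox\nthe lazy dog")
	// Print in sorted order for stable output.
	words := make([]string, 0, len(counts))
	for w := range counts {
		words = append(words, w)
	}
	sort.Strings(words)
	for _, w := range words {
		fmt.Printf("%s: %d\n", w, counts[w])
	}
}
```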

Run using Spark Runner
-
- Start Spark (if starting local Spark from binaries, use 2.4.7, not 3.x, and skip this step)
- Use the commands below to start Spark locally in distributed mode using Docker
- Start Master
docker run --name spark-master -h spark-master -e ENABLE_INIT_DAEMON=false -d bde2020/spark-master:3.0.0-hadoop3.2
- Start Worker
docker run --name spark-worker-1 --link spark-master:spark-master -e ENABLE_INIT_DAEMON=false -d bde2020/spark-worker:3.0.0-hadoop3.2
- Check the connection using
spark-shell --master spark://localhost:7077
- Run this code in the Spark shell (it should print an approximation of π)

val NUM_SAMPLES = 10000
val pi = sc.parallelize(1 to NUM_SAMPLES).filter { _ =>
  val x = math.random
  val y = math.random
  x * x + y * y < 1
}.count() * 4 / NUM_SAMPLES.toFloat
- Check application status at http://localhost:4040
- Check Master status at http://localhost:8080
- Check Worker status at http://localhost:8081
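For reference, the same Monte Carlo estimate as the spark-shell snippet above can be sanity-checked in plain Go (a stdlib-only sketch; this is not part of the Beam job):

```go
package main

import (
	"fmt"
	"math/rand"
)

// estimatePi approximates π by sampling numSamples random points in the
// unit square and counting how many land inside the quarter circle,
// the same computation the spark-shell check distributes across workers.
func estimatePi(numSamples int) float64 {
	inside := 0
	for i := 0; i < numSamples; i++ {
		x, y := rand.Float64(), rand.Float64()
		if x*x+y*y < 1 {
			inside++
		}
	}
	return 4 * float64(inside) / float64(numSamples)
}

func main() {
	fmt.Printf("Pi is roughly %f\n", estimatePi(10000))
}
```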
- Start Beam JobServer
- Use the command below to start it with Docker

docker run --net=host apache/beam_spark_job_server:latest --spark-master-url=spark://192.168.1.11:7077

- If running manually

java -jar ~/Downloads/beam-runners-spark-job-server-2.25.0.jar --spark-master-url=spark://192.168.1.11:7077 --job-port=8099 --artifact-port=0
- Run the application using the Spark runner

go run app.go --output=wordcounts.txt --runner=spark --endpoint=localhost:8099 --environment_type=LOOPBACK