All commands assume that Spark is correctly installed and available on your $PATH.
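A quick pre-flight check can confirm that the tools used below can be found before you start a build. The sketch assumes bash. require_tools is a hypothetical helper introduced here, not part of the project. It only checks that spark-submit and sbt resolve on $PATH, not that Spark is configured correctly.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the project): report every tool
# used in this README that cannot be found on $PATH.
set -euo pipefail

require_tools() {
  local tool missing=0
  for tool in spark-submit sbt; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing on \$PATH: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `require_tools && echo "ready"`.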
All test files are located in the ./test_data directory. To run the Spark pipeline on the test data, use the following commands:
# build the project and produce a fat jar file
sbt clean assembly
# submit the Spark job using this command
spark-submit \
--class "com.spark.home.assignment.S3App" \
--master "local[*]" \
target/scala-2.13/s3-app.jar \
--input ./test_data \
--output ./target/result.tsv \
--local
To run the pipeline on your own local data, point --input and --output at your own paths:
# build the project and produce a fat jar file
sbt clean assembly
# submit the Spark job using this command
spark-submit \
--class "com.spark.home.assignment.S3App" \
--master "local[*]" \
target/scala-2.13/s3-app.jar \
--input /your/input/data/directory \
--output /your/result/file/path.tsv \
--local
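Because the two local runs differ only in their paths, a small wrapper can save retyping them. The sketch below assumes bash. run_local, JAR, MAIN_CLASS and DRY_RUN are hypothetical names introduced here, not part of the project. It reuses exactly the spark-submit arguments shown above, and with DRY_RUN=1 it prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the project) around the local spark-submit
# call shown above. Set DRY_RUN=1 to print the command instead of running it.
set -euo pipefail

JAR="target/scala-2.13/s3-app.jar"
MAIN_CLASS="com.spark.home.assignment.S3App"

run_local() {
  local input="${1:?input directory required}"
  local output="${2:?output path required}"
  # Build the argument list as an array so paths with spaces stay intact
  local cmd=(spark-submit
    --class "$MAIN_CLASS"
    --master "local[*]"
    "$JAR"
    --input "$input"
    --output "$output"
    --local)
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "${cmd[*]}"
    return 0
  fi
  # The jar only exists after 'sbt clean assembly' has been run
  if [[ ! -f "$JAR" ]]; then
    echo "jar not found at $JAR; run 'sbt clean assembly' first" >&2
    return 1
  fi
  "${cmd[@]}"
}
```

For example, `run_local ./test_data ./target/result.tsv` reproduces the first command.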
To run the pipeline against S3, you first need to provide valid AWS credentials in your credentials file located at ~/.aws/credentials. By default, the pipeline uses the default profile.
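The sketch below assumes the pipeline reads the standard AWS shared-credentials (INI) format. In that format, a profile is a `[name]` section holding `aws_access_key_id` and `aws_secret_access_key`. write_credentials is a hypothetical helper, and the key values are placeholders. The AWS CLI's `aws configure` command writes the same file interactively.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the project): write a credentials file with
# a [default] profile in the AWS shared-credentials format. Placeholders only;
# never commit real keys.
set -euo pipefail

write_credentials() {
  local path="$1" key_id="$2" secret="$3"
  mkdir -p "$(dirname "$path")"
  # umask 077 in a subshell: the new file is readable by the current user only
  (
    umask 077
    cat > "$path" <<EOF
[default]
aws_access_key_id = ${key_id}
aws_secret_access_key = ${secret}
EOF
  )
}
```

Writing to a separate path such as `"$HOME/spark-creds/credentials"` avoids overwriting an existing ~/.aws/credentials. That path can then be passed with --credentials.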
To use a custom credentials file, pass the --credentials option with the full path to that file. The full command is:
# submit the Spark job using this command
spark-submit \
--class "com.spark.home.assignment.S3App" \
--master "local[*]" \
target/scala-2.13/s3-app.jar \
--input s3n://data-processing-spark/input \
--output s3n://data-processing-spark/output/result.tsv \
--credentials ~/.aws/credentials
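To catch a missing or misnamed profile before the job starts, you can check the credentials file first. The sketch assumes the standard INI layout, where each profile begins with a `[name]` line. has_profile is a hypothetical helper that only checks that the section header exists, not that the keys are valid.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of the project): succeed only if the given
# credentials file contains a "[profile]" section header (default: "default").
set -euo pipefail

has_profile() {
  local file="$1" profile="${2:-default}"
  [[ -r "$file" ]] || { echo "cannot read $file" >&2; return 1; }
  # -F: fixed string (no regex), -x: whole-line match, -q: quiet
  grep -Fqx "[$profile]" "$file"
}
```

Usage: `has_profile ~/.aws/credentials && spark-submit ...`.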
To execute the unit tests, run:
sbt test