Installs and configures Spark (master, thriftserver, historyserver and slave).
This role is essentially a plugin for ansible-role-hadoop and should be run after it.
The following host/group vars let you select which parts of Spark to install:
spark_master: true
spark_thriftserver: true
spark_historyserver: true
spark_slave: true
Also, don't forget to set spark_master_host to the IP of the node running the Spark master.
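For example, a group vars file for a single node running all components might look like this (the file name and IP address are placeholders, adjust to your inventory):

```yaml
# group_vars/spark.yml -- hypothetical example
spark_master: true
spark_thriftserver: true
spark_historyserver: true
spark_slave: true
spark_master_host: 192.168.1.10   # IP of the node running the Spark master
```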
- 8081 - Spark master web UI
- 18080 - history server web UI
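A quick way to confirm the web UIs are reachable after the play finishes (assuming the ports above and substituting the address you set in spark_master_host):

```sh
# YOUR_SPARK_MASTER is a placeholder for the master node's address
curl -sI http://YOUR_SPARK_MASTER:8081 | head -n 1    # master web UI
curl -sI http://YOUR_SPARK_MASTER:18080 | head -n 1   # history server UI
```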
You will need to manually create the required HDFS directories for Spark:
hadoop@hdp-1:~$ hdfs dfs -mkdir -p /spark/logs
hadoop@hdp-1:~$ hdfs dfs -mkdir -p /spark/apphistory
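If you would rather automate this step in your own playbook, a minimal sketch (not part of this role) could look like the task below; it assumes the hadoop user exists and has hdfs on its PATH:

```yaml
# Hypothetical task -- run once against a node that can talk to HDFS
- name: Create Spark directories in HDFS
  ansible.builtin.command:
    cmd: "hdfs dfs -mkdir -p {{ item }}"
  loop:
    - /spark/logs
    - /spark/apphistory
  become: true
  become_user: hadoop
  run_once: true
  changed_when: false
```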
For Spark on YARN:

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client examples/jars/spark-examples_2.11-2.2.0.jar 1000

For standalone:

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://YOUR_SPARK_MASTER:7077 --deploy-mode client examples/jars/spark-examples_2.11-2.2.0.jar 1000

In lowspec mode it is worth adding --executor-memory 512M --driver-memory 512M.
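Putting that together, a lowspec standalone run might look like this (same example jar as above, YOUR_SPARK_MASTER is a placeholder):

```sh
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://YOUR_SPARK_MASTER:7077 \
  --deploy-mode client \
  --executor-memory 512M \
  --driver-memory 512M \
  examples/jars/spark-examples_2.11-2.2.0.jar 1000
```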
- Fix native libs
- Fix logging to journald
- Fix PID file location (maybe move them to /opt/hadoop/pids?)
- Spark ignores normal exit signals and needs to be killed by systemd. Not sure what I can do about this.