LLNL/magpie

JAVA_HOME Not Respected

nealepetrillo opened this issue · 2 comments

Hello,

I'm trying to use Magpie / Spark under Slurm but receive the following error when the system attempts to start the worker daemon:

node2: failed to launch: nice -n 0 /lustre/spark/2.3.0/bin/spark-class org.apache.spark.deploy.worker.Worker --webui-port 8081 spark://node1:7077
node2:    JAVA_HOME is not set

Running srun --no-kill -w 0 'env' shows JAVA_HOME correctly set to the desired value on the compute nodes.

I'm therefore thinking there's some failure to propagate JAVA_HOME in the Spark / Hadoop setup scripts, but I didn't see anything obvious.

Any suggestions?

chu11 commented

The JAVA_HOME environment variable is actually passed to Spark via the spark-env.sh file, which Magpie should be creating and placing in SPARK_CONF_DIR. Spark should always source spark-env.sh before it runs (i.e. via spark-submit, spark-class, etc.).
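To illustrate the mechanism described above, here is a minimal sketch of what a generated spark-env.sh might look like; the paths shown are hypothetical examples, not the actual file Magpie writes:

```shell
# Sketch of a spark-env.sh as Magpie might generate it in SPARK_CONF_DIR.
# Both paths below are illustrative assumptions for this example.
export JAVA_HOME="/usr/lib/jvm/default-java"
export SPARK_CONF_DIR="/tmp/spark-conf"
```

If the workers don't know the correct SPARK_CONF_DIR, they never source this file, and JAVA_HOME ends up unset on the remote side even though it is set in your login environment.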

Hmmm. Did you perhaps not apply the Spark patch in patches/spark? There is a modification in there to make sure all the workers know the right path to SPARK_CONF_DIR.

Redeploying Spark and then reapplying the patch with

patch -p1 < /lustre/magpie/2.0/patches/spark/2.3.00-bin-hadoop2.7-alternate.patch

fixed the problem.

Thanks!