YotpoLtd/metorikku

Spark running with Yarn cluster mode

Closed this issue · 6 comments

Hi,
I am seeing a failure when running the Spark job in YARN cluster mode, although it works in client mode. Can you please help me?
Caused by: com.yotpo.metorikku.exceptions.MetorikkuException: No arguments passed to metorikku

Regards,
VJ

Can you send the command you're using in YARN mode?

Please find the command below

spark-submit --master yarn --deploy-mode cluster --conf spark.sql.catalogImplementation=hive --class com.yotpo.metorikku.Metorikku metorikku.jar -c config.yaml

Is this still happening? Sorry for the very late reply.

Yes, still the same error.

We have actually never run Metorikku in YARN cluster mode. We will look into it next week. It is probably something to do with how cluster mode passes args to scopt.

Hi,
I managed to reproduce the issue you are describing.
The cause is that Spark looks for the files (metrics, input) and can't find them. In cluster mode the driver runs on a cluster node, so the files have to exist on HDFS or on the file system of every node.

In the error log you can see which path was used, along with the message "Supplied file not found".
Possible solution:
Use an explicit path that includes the file system scheme, such as file:// or hdfs://, and make sure to upload the files to the nodes or to HDFS.
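
As a rough sketch of that suggestion, the script below uploads the config to HDFS and then reuses the spark-submit command from the report with a fully qualified path. The HDFS directory is a hypothetical placeholder, and the `run` and `submit_cluster` helpers exist only for this illustration. It assumes Metorikku resolves an hdfs:// path passed to -c, as suggested above; this has not been verified here. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch only: HDFS_DIR is a hypothetical location; adjust it for your cluster.
set -euo pipefail

HDFS_DIR="${HDFS_DIR:-hdfs:///user/example/metorikku}"

# Prints each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

submit_cluster() {
  # Upload the config so the driver, which runs on a cluster node, can read it.
  # Metric and input files referenced inside config.yaml need the same treatment.
  run hdfs dfs -mkdir -p "${HDFS_DIR}"
  run hdfs dfs -put -f config.yaml "${HDFS_DIR}/"

  # Same command as in the report, but with a fully qualified config path.
  run spark-submit --master yarn --deploy-mode cluster \
    --conf spark.sql.catalogImplementation=hive \
    --class com.yotpo.metorikku.Metorikku metorikku.jar \
    -c "${HDFS_DIR}/config.yaml"
}

submit_cluster
```

Run it once as-is to review the printed commands, then with DRY_RUN=0 to execute them. Paths inside config.yaml (metrics, inputs) would also need an explicit file:// or hdfs:// scheme.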

By the way, we have an issue open to support external files:
#142