Spark running with Yarn cluster mode
Hi,
I am seeing a failure when running the Spark job in YARN cluster mode, but it works in client mode. Can you please help me?
Caused by: com.yotpo.metorikku.exceptions.MetorikkuException: No arguments passed to metorikku
Regards,
VJ
Can you send the command you're using in YARN mode?
Please find the command below
spark-submit --master yarn --deploy-mode cluster --conf spark.sql.catalogImplementation=hive --class com.yotpo.metorikku.Metorikku metorikku.jar -c config.yaml
Is this still happening? Sorry for the very late reply.
Yes, still the same error.
We have actually never run Metorikku in YARN cluster mode. We will look into it next week. It is probably something to do with how cluster mode passes args to scopt.
Hi,
I managed to reproduce the issue you are describing.
The cause is that Spark looks for the files (metrics, input) and can't find them in cluster mode, because they have to exist on HDFS or on the file system of all the nodes.
In the error log, you can see which path was used, along with the message "Supplied file not found".
Possible solution:
Use a path that names the file system explicitly, such as file:// or hdfs://, and make sure to upload the files to the nodes or to HDFS.
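As a rough sketch of that suggestion, the helpers below check that a config path carries an explicit scheme and then print the spark-submit command from this thread with that path. The function names has_explicit_scheme and submit_cmd are made up for illustration and are not part of Metorikku or Spark, and any path you pass in is your own.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Metorikku or Spark.

# Succeed only when the path names its file system explicitly.
has_explicit_scheme() {
  case "$1" in
    hdfs://*|file://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the spark-submit command from this thread for a given config path,
# refusing paths that lack an explicit file:// or hdfs:// scheme.
submit_cmd() {
  if ! has_explicit_scheme "$1"; then
    echo "config path needs file:// or hdfs://: $1" >&2
    return 1
  fi
  echo "spark-submit --master yarn --deploy-mode cluster" \
       "--conf spark.sql.catalogImplementation=hive" \
       "--class com.yotpo.metorikku.Metorikku metorikku.jar -c $1"
}
```

For example, after copying the config to HDFS with something like `hdfs dfs -put config.yaml /tmp/metorikku/` (a hypothetical directory), `submit_cmd hdfs:///tmp/metorikku/config.yaml` prints the command to run. The metric and input paths referenced inside the config would presumably need the same explicit scheme.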
btw, we have an issue to support external files
#142