Sparkler cannot be executed on Databricks because the SparkContext is not pulled from the SparkSession
mattvryan-github opened this issue · 0 comments
Issue Description
When trying to run Sparkler on a Databricks cluster, it fails to see the worker nodes. This is because of the way the Databricks image sets up the Spark environment: the SparkContext must be pulled from the SparkSession rather than created directly.
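A minimal sketch of the described fix (the object and app names here are hypothetical, not Sparkler's actual code): instead of constructing a SparkContext directly with `new SparkContext(conf)`, obtain the existing SparkSession and pull the context from it, which works both on Databricks and in standalone deployments.

```scala
import org.apache.spark.sql.SparkSession

object SessionFirst {
  def main(args: Array[String]): Unit = {
    // On Databricks a SparkSession already exists for the cluster;
    // getOrCreate() returns it instead of building a new one.
    val spark = SparkSession.builder()
      .appName("sparkler-crawl")
      .getOrCreate()

    // Pull the SparkContext from the session rather than instantiating
    // one directly -- direct instantiation is what loses the worker nodes.
    val sc = spark.sparkContext
    println(s"master=${sc.master}, defaultParallelism=${sc.defaultParallelism}")
  }
}
```

Because `getOrCreate()` reuses any active session, the same code runs unchanged on a local Spark install and on a managed Databricks cluster.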
How to reproduce it
Put the Sparkler fat jar, conf, and plugin directories on the master node of a Databricks cluster and try to crawl. You will get messages like:
2020-10-05 22:50:43 INFO Injector$:97 - Injecting 1 seeds
2020-10-05 22:50:47 WARN SparkContext:69 - Please ensure that the number of slots available on your executors is limited by the number of cores to task cpus and not another custom resource. If cores is not the limiting resource then dynamic allocation will not work properly!
2020-10-05 22:51:04 WARN TaskSchedulerImpl:69 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Environment and Version Information
Please indicate relevant versions, including, if relevant:
- Java Version
- Spark Version 3.0.1
- Operating System name and version: Red Hat and Ubuntu Linux
External links for reference
https://docs.databricks.com/jobs.html
Contributing
If you'd like to help us fix the issue by contributing some code, but would like guidance or help in doing so, please mention it!
A pull request is in progress.