Can't run spark-shell or pyspark
fernandojpsilva opened this issue
Hi! I'm new to using this, but I've been struggling to run spark-shell/pyspark inside the container. Initially I was trying to run a simple Python script that creates a local Spark session and does some DataFrame transformations, but it kept crashing. So I tried running spark-shell on its own and got the same error. Here's the full log:
sh-5.1# spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[WARN ] 2024-06-04 11:22:52.489 [main] NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[WARN ] 2024-06-04 11:22:53.230 [main] SparkContext: Exception when load sparklyr connector java.lang.ClassNotFoundException: org.apache.spark.sparklyr.DefaultConnector
Spark context Web UI available at http://156570b35212:4040
Spark context available as 'sc' (master = local[*], app id = local-1717500173045).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.4.1
      /_/
Using Scala version 2.12.17 (OpenJDK 64-Bit Server VM, Java 1.8.0_372)
Type in expressions to have them evaluated.
Type :help for more information.
scala> [ERROR] 2024-06-04 11:23:03.262 [lighter-poll-status] LighterClientState: fetch status
java.io.IOException: Could not find Lighter configuration file: conf/lighter-config.json
at org.apache.spark.lighter.client.JsonConfigReader.reload(JsonConfigReader.scala:20) ~[spark-lighter-core_2.12-2.0.8_spark-3.4.0.jar:?]
at org.apache.spark.lighter.client.JsonConfigReader.<init>(JsonConfigReader.scala:15) ~[spark-lighter-core_2.12-2.0.8_spark-3.4.0.jar:?]
at org.apache.spark.lighter.client.LighterClientContext.init(LighterClientContext.scala:56) ~[spark-lighter-core_2.12-2.0.8_spark-3.4.0.jar:?]
at org.apache.spark.lighter.client.LighterClientContext.<init>(LighterClientContext.scala:45) ~[spark-lighter-core_2.12-2.0.8_spark-3.4.0.jar:?]
at org.apache.spark.lighter.client.LighterClientContext$.context$lzycompute(LighterClientContext.scala:123) ~[spark-lighter-core_2.12-2.0.8_spark-3.4.0.jar:?]
at org.apache.spark.lighter.client.LighterClientContext$.context(LighterClientContext.scala:123) ~[spark-lighter-core_2.12-2.0.8_spark-3.4.0.jar:?]
at org.apache.spark.lighter.client.LighterClientContext$.getOrCreate(LighterClientContext.scala:125) ~[spark-lighter-core_2.12-2.0.8_spark-3.4.0.jar:?]
at org.apache.spark.lighter.client.LighterClientState$.fetchStatus(LighterClientState.scala:99) ~[spark-lighter-core_2.12-2.0.8_spark-3.4.0.jar:?]
at org.apache.spark.lighter.client.LighterClientState$.$anonfun$new$1(LighterClientState.scala:80) ~[spark-lighter-core_2.12-2.0.8_spark-3.4.0.jar:?]
at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1454) ~[spark-core_2.12-3.4.1.jar:3.4.1]
at org.apache.spark.lighter.client.LighterClientState$$anon$1.run(LighterClientState.scala:90) ~[spark-lighter-core_2.12-2.0.8_spark-3.4.0.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_372]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) ~[?:1.8.0_372]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_372]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) ~[?:1.8.0_372]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_372]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_372]
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_372]
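For reference, here is a minimal sketch of the kind of local script that was crashing. It is not the original script: the `run_repro` function name, the app name and the sample rows are hypothetical, chosen only to start a local SparkSession and apply one DataFrame transformation.

```python
def run_repro():
    """Start a local SparkSession and run one trivial DataFrame transformation."""
    # PySpark is imported inside the function so the file can be loaded
    # (e.g. for inspection) even where PySpark is not installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .master("local[*]")               # same master as the spark-shell log
        .appName("lighter-config-repro")  # hypothetical app name
        .getOrCreate()
    )
    try:
        # Hypothetical sample data: two rows with an id and a label.
        df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
        df.withColumn("id_doubled", F.col("id") * 2).show()
    finally:
        # Always release the local Spark context, even if the job fails.
        spark.stop()


if __name__ == "__main__":
    run_repro()
```

Running this with `python` inside the container should show whether the Lighter error also appears in a plain Python process, independently of the spark-shell REPL.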
The same error happens when trying to run pyspark. To add more context, the only information I found about this JSON file was on the "Create and manage Apache Spark job definitions in Visual Studio Code" page that Microsoft published for MS Fabric. It says:
In the root folder of the source script, the system creates a subfolder named conf. Within this folder, a file named lighter-config.json contains some system metadata needed for the remote run. Do NOT make any changes to it.
However, I can't find such a file. I'm running this on WSL. The only change I've made to the Dockerfile is adding:
RUN tdnf install -y wget tar awk procps
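The path in the error (`conf/lighter-config.json`) is relative, so it is probably resolved against the JVM's working directory, the way `java.io.File` treats relative paths. That is an assumption about how `JsonConfigReader` loads the file, not something confirmed from its source. A quick sketch to check where that path would land from the directory spark-shell/pyspark is launched in, and whether a copy exists in any parent directory (the helper names are hypothetical):

```python
from pathlib import Path

# Relative path copied verbatim from the error message.
LIGHTER_CONFIG = Path("conf") / "lighter-config.json"


def find_lighter_config(start_dir=None):
    """Return (resolved_path, exists) for conf/lighter-config.json.

    Assumes the path is resolved against the working directory, as
    java.io.File does for relative paths (hypothetical helper).
    """
    base = Path(start_dir) if start_dir is not None else Path.cwd()
    candidate = (base / LIGHTER_CONFIG).resolve()
    return candidate, candidate.is_file()


def search_upwards(start_dir=None):
    """List every directory from start_dir up to the filesystem root that
    contains conf/lighter-config.json (hypothetical helper)."""
    base = Path(start_dir) if start_dir is not None else Path.cwd()
    base = base.resolve()
    return [d for d in (base, *base.parents) if (d / LIGHTER_CONFIG).is_file()]


if __name__ == "__main__":
    path, exists = find_lighter_config()
    print(f"A JVM started here would look for: {path}")
    print("found" if exists else "missing")
    for d in search_upwards():
        print(f"A copy exists under: {d}")
```

Running it from the same directory where spark-shell or pyspark is started should report `missing` if this is the same situation as the log above.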
Please let me know if you are aware of this and if there's a workaround. Thank you.