aws-samples/spark-on-aws-lambda

PySpark shell is not working inside the container

patrick-muller opened this issue · 3 comments

When I try to open the PySpark shell, it returns the error below:

bash-4.2# pyspark
23/03/08 15:54:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" org.apache.spark.SparkException: Failed to get main class in JAR with error 'File file:/var/lang/lib/python3.8/site-packages/pyspark/bin/pyspark-shell-main does not exist'. Please specify one with --class.
    at org.apache.spark.deploy.SparkSubmit.error(SparkSubmit.scala:975)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:486)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:901)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
bash-4.2#

The reason is that the Dockerfile replaces the stock spark-class script instead of appending to its content.
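For reference, the stock pyspark launcher simply delegates to spark-submit, which in turn execs spark-class, so a broken spark-class takes the interactive shell down with it. Paraphrasing the upstream Spark launch scripts (simplified, not the repo's copies):

# bin/pyspark (upstream, simplified): passes "pyspark-shell-main" to spark-submit,
# which is why that name shows up in the error above when argument handling breaks.
exec "${SPARK_HOME}"/bin/spark-submit pyspark-shell-main --name "PySparkShell" "$@"

# bin/spark-submit (upstream, simplified): everything funnels through spark-class.
exec "${SPARK_HOME}"/bin/spark-class org.apache.spark.deploy.SparkSubmit "$@"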

The Dockerfile contains the following lines, which overwrite the stock spark-class:

COPY spark-class $SPARK_HOME/bin/
RUN chmod -R 755 $SPARK_HOME
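A quick way to confirm the override is in effect, using the install path from the error output above (a diagnostic sketch only, run from the repo root inside the container):

# If the two files differ, the image is running the custom spark-class.
diff ./spark-class /var/lang/lib/python3.8/site-packages/pyspark/bin/spark-class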

Without this override, however, the Lambda function returns the error
/var/lang/lib/python3.8/site-packages/pyspark/bin/spark-class: line 92: /dev/fd/62: No such file or directory
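For context, the stock spark-class (around the line 92 cited in that error) collects the launcher output via bash process substitution, which depends on /dev/fd, and that device is not available in the Lambda execution environment. A minimal sketch of the kind of change the custom spark-class presumably makes, assuming the surrounding script context (build_command is defined earlier in spark-class):

# Upstream reads the null-delimited launcher output through process
# substitution, which needs /dev/fd:
#   while IFS= read -d '' -r ARG; do CMD+=("$ARG"); done < <(build_command "$@")
# A Lambda-safe variant stages the output in a temp file instead:
CMD=()
LAUNCHER_OUTPUT=$(mktemp)
build_command "$@" > "$LAUNCHER_OUTPUT"
while IFS= read -d '' -r ARG; do
  CMD+=("$ARG")
done < "$LAUNCHER_OUTPUT"
rm -f "$LAUNCHER_OUTPUT"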

So, to use the PySpark shell interactively inside the container, we need to comment out the COPY spark-class line.
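Rather than hand-editing the Dockerfile for each use case, one option is to gate the override behind a build argument. This is a sketch only; the SPARK_CLASS_OVERRIDE name is hypothetical, not from the repo:

ARG SPARK_CLASS_OVERRIDE=true
COPY spark-class /tmp/spark-class
# Install the Lambda-specific spark-class only when requested, so the same
# Dockerfile can also build an image where the interactive shell works.
RUN if [ "$SPARK_CLASS_OVERRIDE" = "true" ]; then \
        cp /tmp/spark-class $SPARK_HOME/bin/spark-class; \
    fi \
    && chmod -R 755 $SPARK_HOME

Building with --build-arg SPARK_CLASS_OVERRIDE=false would then yield an image where pyspark starts normally.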

Finished the bug fix