RedisLabs/spark-redis

Pyspark error loading data into Redis - java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;

jeames00 opened this issue · 0 comments

I'm trying to follow the spark-redis tutorial for Python but keep encountering an error when writing the dataframe:

>>> data.write.format("org.apache.spark.sql.redis").option("table", "people").option("key.column", "en_curid").save()

Traceback (most recent call last):
File "<stdin>", line 1, in
File "/home/username/.local/share/virtualenvs/spark-redis-JRXWdY5Z/lib/python3.8/site-packages/pyspark/sql/readwriter.py", line 828, in save
self._jwrite.save()
File "/home/username/.local/share/virtualenvs/spark-redis-JRXWdY5Z/lib/python3.8/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in call
File "/home/username/.local/share/virtualenvs/spark-redis-JRXWdY5Z/lib/python3.8/site-packages/pyspark/sql/utils.py", line 128, in deco
return f(*a, **kw)
File "/home/username/.local/share/virtualenvs/spark-redis-JRXWdY5Z/lib/python3.8/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o50.save.
: java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;

PySpark is launched with the command:
pyspark --conf "spark.redis.host=localhost" --conf "spark.redis.port=6379" --jars target/spark-redis_2.11-2.6.0-SNAPSHOT-jar-with-dependencies.jar
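In case it helps triage: one thing I plan to try (an assumption on my part — I haven't verified the published artifact coordinates) is launching against a Scala 2.12 build of spark-redis instead of my locally built `_2.11` jar, since Spark 3.0.2 ships with Scala 2.12:

```shell
# Guessing the Maven coordinate from the usual _<scala-version> naming
# convention; swap in whatever 2.12 artifact actually exists.
pyspark \
  --conf "spark.redis.host=localhost" \
  --conf "spark.redis.port=6379" \
  --packages com.redislabs:spark-redis_2.12:2.6.0
```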

$ spark-submit --version
21/02/21 23:29:20 WARN Utils: Your hostname, pop-os resolves to a loopback address: 127.0.1.1; using 192.168.1.119 instead (on interface ens18)
21/02/21 23:29:20 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.2
      /_/
                        
Using Scala version 2.12.10, Java HotSpot(TM) Server VM, 1.8.0_281
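For what it's worth, the banner above reports Scala 2.12.10, while the jar I pass to `--jars` is built for Scala 2.11 (`spark-redis_2.11-…`). As I understand it, a Scala major-version mismatch between a library and the Spark runtime is a common cause of this kind of `NoSuchMethodError`, because method signatures in `scala.Predef` differ across Scala versions. A small sketch of the naming check I did by eye (the helper functions are mine, not from any library):

```python
import re

def scala_major(version_string):
    """Return the Scala major version, e.g. '2.12' from '2.12.10'."""
    m = re.match(r"(\d+\.\d+)", version_string)
    return m.group(1) if m else None

def jar_scala_suffix(jar_name):
    """Extract the '_2.xx' Scala suffix conventionally embedded in jar names."""
    m = re.search(r"_(\d+\.\d+)", jar_name)
    return m.group(1) if m else None

jar = "spark-redis_2.11-2.6.0-SNAPSHOT-jar-with-dependencies.jar"
runtime = "2.12.10"  # as printed by spark-submit --version
print(jar_scala_suffix(jar) == scala_major(runtime))  # False: mismatch
```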

I have a Redis server running in Docker on localhost:6379; no problems connecting to it from redis-cli.

Pop!_OS 20.10
Python version 3.8.6
PySpark version 3.0.2
Pipenv version 11.9.0
Java version "1.8.0_281"
Java(TM) SE Runtime Environment (build 1.8.0_281-b09)
Java HotSpot(TM) Server VM (build 25.281-b09, mixed mode)

Full error below:

>>> full_df = spark.read.csv("pantheon.tsv", sep="\t", quote="", header=True, inferSchema=True)
21/02/21 22:21:14 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
>>> data = full_df.select("en_curid", "countryCode", "occupation")
>>> data.write.format("org.apache.spark.sql.redis").option("table", "people").option("key.column", "en_curid").save()
Traceback (most recent call last):
File "<stdin>", line 1, in
File "/home/username/.local/share/virtualenvs/spark-redis-JRXWdY5Z/lib/python3.8/site-packages/pyspark/sql/readwriter.py", line 828, in save
self._jwrite.save()
File "/home/username/.local/share/virtualenvs/spark-redis-JRXWdY5Z/lib/python3.8/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in call
File "/home/username/.local/share/virtualenvs/spark-redis-JRXWdY5Z/lib/python3.8/site-packages/pyspark/sql/utils.py", line 128, in deco
return f(*a, **kw)
File "/home/username/.local/share/virtualenvs/spark-redis-JRXWdY5Z/lib/python3.8/site-packages/pyspark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o50.save.
: java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
at com.redislabs.provider.redis.RedisConfig.clusterEnabled(RedisConfig.scala:198)
at com.redislabs.provider.redis.RedisConfig.getNodes(RedisConfig.scala:320)
at com.redislabs.provider.redis.RedisConfig.getHosts(RedisConfig.scala:236)
at com.redislabs.provider.redis.RedisConfig.<init>(RedisConfig.scala:135)
at org.apache.spark.sql.redis.RedisSourceRelation.<init>(RedisSourceRelation.scala:34)
at org.apache.spark.sql.redis.DefaultSource.createRelation(DefaultSource.scala:21)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:126)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:962)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:962)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:414)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:398)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)