samelamin/spark-bigquery

Stuck with error py4j.protocol.Py4JJavaError: An error occurred while calling o39.saveAsBigQueryTable. : java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)


Using Amazon EMR with Hadoop 2 and Java 1.8.
I would like to stream data from Amazon EMR to BigQuery,
but I am stuck on the following error:
File "/home/hadoop/pyjobs/py_script/s3_bigquery_0_1.py", line 58, in <module>
bqDF.saveAsBigQueryTable("{0}:{1}.{2}".format(BQ_PROJECT_ID, BQ_DATA_SET, TABLE_NAME),False,0,bigquery.__getattr__("package$WriteDisposition$").__getattr__("MODULE$").WRITE_EMPTY(),bigquery.__getattr__("package$CreateDisposition$").__getattr__("MODULE$").CREATE_IF_NEEDED())
File "/usr/local/lib/python2.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/usr/local/lib/python2.7/site-packages/pyspark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/usr/local/lib/python2.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o39.saveAsBigQueryTable.
: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V
at com.google.cloud.hadoop.io.bigquery.BigQueryStrings.parseTableReference(BigQueryStrings.java:68)
at com.samelamin.spark.bigquery.BigQueryDataFrame.saveAsBigQueryTable(BigQueryDataFrame.scala:40)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
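
The string in parentheses after checkArgument is a JVM method descriptor, and reading it shows exactly which overload the BigQuery connector expected to find. The sketch below is a small standalone decoder written for this thread, not part of spark-bigquery or Guava. To my knowledge, the non-varargs (boolean, String, Object, Object) overload only exists in newer Guava releases (20.0 and later), so this error usually suggests that an older Guava was loaded ahead of the one the connector was compiled against.

```python
# Decode a JVM method descriptor such as the one in the NoSuchMethodError above.
# Helper names here are illustrative; they do not come from any library.
PRIMITIVES = {
    "Z": "boolean", "B": "byte", "C": "char", "S": "short",
    "I": "int", "J": "long", "F": "float", "D": "double", "V": "void",
}


def _read_type(desc, i):
    """Parse one type starting at index i; return (readable name, next index)."""
    dims = 0
    while desc[i] == "[":  # each '[' adds one array dimension
        dims += 1
        i += 1
    if desc[i] == "L":  # object type, written as L<internal/name>;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    else:  # single-letter primitive or void
        name = PRIMITIVES[desc[i]]
        i += 1
    return name + "[]" * dims, i


def decode_descriptor(desc):
    """Return (parameter types, return type) for a descriptor like '(ZI)V'."""
    if not desc.startswith("("):
        raise ValueError("descriptor must start with '('")
    i, params = 1, []
    while desc[i] != ")":
        name, i = _read_type(desc, i)
        params.append(name)
    ret, _ = _read_type(desc, i + 1)
    return params, ret


if __name__ == "__main__":
    params, ret = decode_descriptor(
        "(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V"
    )
    print(ret, "checkArgument(" + ", ".join(params) + ")")
```

For the descriptor in this trace, the decoded parameters are boolean, java.lang.String, java.lang.Object, java.lang.Object, with a void return type.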

Command Line
spark-submit --packages com.github.samelamin:spark-bigquery_2.11:0.2.6,org.apache.hadoop:hadoop-aws:2.7.3,com.databricks:spark-csv_2.11:1.3.0 --jars /home/hadoop/pyjobs/jars/minimal-json-0.9.4.jar,/home/hadoop/pyjobs/jars/spark-bigquery-0.2.5.jar,/home/hadoop/pyjobs/jars/spark-bigquery-0.1.0-s_2.11.jar,/home/hadoop/pyjobs/jars/gcs-connector-hadoop2-latest.jar,/home/hadoop/pyjobs/jars/google-api-client-1.4.1-beta.jar,/home/hadoop/pyjobs/jars/guava-21.0.jar,,/home/hadoop/pyjobs/jars/google-api-services-bigquery-v2-rev92-1.14.2-beta.jar /home/hadoop/pyjobs/py_script/s3_bigquery_0_1.py
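
Since several of the jars passed above may bundle their own copy of Guava, it can help to check which of them actually contain the Preconditions class. The following is a minimal sketch, assuming Python 3 and local access to the jar files; the script name find_guava.py and the function name are hypothetical. It only inspects the jars you list, so it will not see Guava copies that come from the cluster's own Hadoop or Spark classpath or from --packages.

```python
import sys
import zipfile

# Entry name of the class from the stack trace inside a jar archive.
GUAVA_CLASS = "com/google/common/base/Preconditions.class"


def jars_containing(jar_paths, class_entry=GUAVA_CLASS):
    """Return the jar paths whose archive contains class_entry."""
    hits = []
    for path in jar_paths:
        if not path:  # skip empty entries, e.g. from a stray ",," in --jars
            continue
        try:
            with zipfile.ZipFile(path) as jar:
                if class_entry in jar.namelist():
                    hits.append(path)
        except (OSError, zipfile.BadZipFile) as exc:
            # Missing or unreadable files are reported and skipped.
            print("skipping {0}: {1}".format(path, exc), file=sys.stderr)
    return hits


if __name__ == "__main__":
    # Usage: python find_guava.py "a.jar,b.jar,c.jar"  (same format as --jars)
    jar_list = sys.argv[1].split(",") if len(sys.argv) > 1 else []
    for jar in jars_containing(jar_list):
        print(jar)
```

If more than one jar is printed, or the only hit is not the Guava version you intended, that is a likely source of the conflict.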

I am also unable to import the module in Python files:
import com.samelamin.spark.bigquery._
fails with an import error.

This sounds like a bug that was fixed in the latest release. Can you confirm which version you are using?

I picked 2.6.0 and built an uber jar, but it still doesn't work. Both read and write throw the same error.
My jar: 15710607 Apr 29 15:49 sparkbigquery-0.0.1-SNAPSHOT.jar (this is a wrapper jar that I built so that a single uber jar includes everything).

Read:
scala> val df = spark.sqlContext.read.format("com.samelamin.spark.bigquery").option("tableReferenceSource","projectid:schema.table").load()
java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V
at com.google.cloud.hadoop.io.bigquery.BigQueryStrings.parseTableReference(BigQueryStrings.java:68)
at com.samelamin.spark.bigquery.BigQueryRelation.getConvertedSchema(BigQueryRelation.scala:19)
at com.samelamin.spark.bigquery.BigQueryRelation.schema(BigQueryRelation.scala:13)
at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:40)
at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:389)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)

Write:
scala> avrodf.saveAsBigQueryTable("projectid:schema.table")
java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)V
at com.google.cloud.hadoop.io.bigquery.BigQueryStrings.parseTableReference(BigQueryStrings.java:68)
at com.samelamin.spark.bigquery.BigQueryDataFrame.saveAsBigQueryTable(BigQueryDataFrame.scala:40)
... 50 elided

guava: 26.0

jdk1.8

Please help us. This seems to be a great API: very promising and easy to use.


2.6.0 also has this error. This is a showstopper for us.

I was able to resolve this by shading the Google libraries with the Maven Shade plugin, using the following relocation:

<configuration>
  <relocations>
    <relocation>
      <pattern>com.google</pattern>
      <shadedPattern>shaded.guava</shadedPattern>
      <includes>
        <include>com.google.**</include>
      </includes>
      <excludes>
        <exclude>com.google.common.base.Optional</exclude>
        <exclude>com.google.common.base.Absent</exclude>
        <exclude>com.google.common.base.Present</exclude>
        <exclude>com.google.cloud.**</exclude>
      </excludes>
    </relocation>
  </relocations>
</configuration>
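
To make the effect of this relocation easier to see, here is a rough Python model of how the pattern, includes, and excludes above would rename classes. It is a simplified illustration under my own assumptions about the matching rules (a trailing ".**" is treated as a package prefix, anything else as an exact class name), not the plugin's actual code, and the function names are hypothetical. The real plugin also rewrites references inside the bytecode, which this sketch does not attempt.

```python
# Values copied from the relocation configuration above.
PATTERN = "com.google"
SHADED = "shaded.guava"
INCLUDES = ["com.google.**"]
EXCLUDES = [
    "com.google.common.base.Optional",
    "com.google.common.base.Absent",
    "com.google.common.base.Present",
    "com.google.cloud.**",
]


def _matches(pattern, class_name):
    """Treat 'pkg.**' as a package prefix and anything else as an exact name."""
    if pattern.endswith(".**"):
        return class_name.startswith(pattern[:-2])  # keeps the trailing dot
    return class_name == pattern


def relocate(class_name):
    """Return the class name after relocation, or unchanged if not relocated."""
    included = any(_matches(p, class_name) for p in INCLUDES)
    excluded = any(_matches(p, class_name) for p in EXCLUDES)
    if included and not excluded and class_name.startswith(PATTERN + "."):
        return SHADED + class_name[len(PATTERN):]
    return class_name


if __name__ == "__main__":
    for name in (
        "com.google.common.base.Preconditions",
        "com.google.common.base.Optional",
        "com.google.cloud.hadoop.io.bigquery.BigQueryStrings",
    ):
        print(name, "->", relocate(name))
```

Under this model, Preconditions moves to shaded.guava.common.base.Preconditions inside the uber jar, so it no longer competes with whatever Guava the cluster provides, while the excluded Optional classes and everything under com.google.cloud keep their original names.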

Cheers for adding the example, @ameyamahajan. I would really appreciate it if you could add a ToDo section to the readme :)