Export FS must derive from GoogleHadoopFileSystemBase
Hello,
I'm trying to process some data from BigQuery using a local cluster and then write it to HDFS. I keep getting the following error:
```
java.lang.IllegalStateException: Export FS must derive from GoogleHadoopFileSystemBase.
	at com.google.common.base.Preconditions.checkState(Preconditions.java:456)
	at com.google.cloud.hadoop.io.bigquery.BigQueryConfiguration.getTemporaryPathRoot(BigQueryConfiguration.java:363)
	at com.google.cloud.hadoop.io.bigquery.AbstractBigQueryInputFormat.getSplits(AbstractBigQueryInputFormat.java:126)
	at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:130)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:49)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1343)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.take(RDD.scala:1337)
	at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1378)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.first(RDD.scala:1377)
	at com.samelamin.spark.bigquery.BigQuerySQLContext.bigQuerySelect(BigQuerySQLContext.scala:96)
```
Sample code:
```scala
import com.samelamin.spark.bigquery._

spark.sqlContext.setBigQueryProjectId("XXXX")
spark.sqlContext.setBigQueryDatasetLocation("EU")
spark.sqlContext.setBigQueryGcsBucket("XXXXX")
spark.sqlContext.useStandardSQLDialect(true)

val table = spark
  .sqlContext
  .bigQuerySelect(
    """
      |SELECT a, b, c
      |FROM `XXX.YYYY.ZZZZ`;
    """.stripMargin)

table.show
```
Thank you for your help.
Yeah, that is to be expected: we need the Google Cloud Storage file system because we stage the data down to GCS first and only then bring it onto the cluster.
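If it helps, here is a minimal sketch of what that means in practice. It assumes the GCS connector jar is already on the classpath; the bucket name, project id, and key file path are placeholders you would swap for your own. The point is that the default Hadoop file system ends up being the connector's implementation, which derives from `GoogleHadoopFileSystemBase` and so satisfies the check in your stack trace.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: assumes the gcs-connector jar is on the classpath;
// "XXXXX", "XXXX" and the key file path are placeholders.
val spark = SparkSession.builder()
  .appName("bigquery-export")
  // Register the GCS connector for gs:// paths.
  .config("spark.hadoop.fs.gs.impl",
    "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
  .config("spark.hadoop.fs.AbstractFileSystem.gs.impl",
    "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
  // Make the staging bucket the default FS so it derives from
  // GoogleHadoopFileSystemBase, which is what the export check requires.
  .config("spark.hadoop.fs.defaultFS", "gs://XXXXX")
  .config("spark.hadoop.fs.gs.project.id", "XXXX")
  // Service-account auth for the connector.
  .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
  .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile",
    "/path/to/keyfile.json")
  .getOrCreate()
```

Note that once the default FS points at GCS, writes to HDFS need a fully qualified URI, e.g. `table.write.parquet("hdfs://namenode:8020/some/path")`.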
You can just download the required jars and create an uber jar
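Roughly, the build looks like the sketch below (with the sbt-assembly plugin enabled). The artifact coordinates and versions are from memory, so double-check them against the README and your Hadoop/Spark versions before pinning them.

```scala
// build.sbt sketch: coordinates/versions are illustrative, not prescriptive.
libraryDependencies ++= Seq(
  "com.github.samelamin" %% "spark-bigquery" % "0.2.6",
  // GCS connector so gs:// paths resolve to GoogleHadoopFileSystemBase.
  "com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop2-1.9.17"
)

// `sbt assembly` then produces the uber jar you pass to spark-submit.
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}
```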