samelamin/spark-bigquery

Pyspark complete code

Closed this issue · 1 comments

Hi Sam,
This is amazing library that you built however I was not able to use in python. Can you please share complete code/example of how to use this from python/pyspark.

Regards
Sanjay

Hi Sanjay

Below is a sample of reading from BQ

BQ_PROJECT_ID = "projectId"
DATASET_ID = "datasetId"
jsonFile = "/path/to/json"
GcsBucket = "gcs-bucket"
session = SparkSession.builder.getOrCreate()
bq = session._sc._jvm.com.samelamin.spark.bigquery.BigQuerySQLContext(session._wrapped._jsqlContext)
bq.setGcpJsonKeyFile(jsonFile)
bq.setBigQueryProjectId(BQ_PROJECT_ID)
bq.setGSProjectId(BQ_PROJECT_ID)
bq.setBigQueryGcsBucket(GcsBucket)
bq.setBigQueryDatasetLocation("US")
tableName = "{0}:{1}.{2}".format(BQ_PROJECT_ID,DATASET_ID,TABLE_NAME)
bqDF = session._sc._jvm.com.samelamin.spark.bigquery.BigQueryDataFrame(df._jdf)
bqDF.saveAsBigQueryTable(tableName, False, 0,None,None)