PySpark complete code
Closed this issue · 1 comments
87sanchavan commented
Hi Sam,
This is an amazing library that you built; however, I was not able to use it from Python. Can you please share a complete code example of how to use it from Python/PySpark?
Regards
Sanjay
samelamin commented
Hi Sanjay
Below is a sample that configures the BigQuery context and saves a DataFrame to BQ from PySpark:
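from pyspark.sql import SparkSession

BQ_PROJECT_ID = "projectId"
DATASET_ID = "datasetId"
TABLE_NAME = "tableName"  # placeholder; substitute your own table name
jsonFile = "/path/to/json"
GcsBucket = "gcs-bucket"

session = SparkSession.builder.getOrCreate()
# Reach the Scala classes through the JVM gateway; this relies on private PySpark attributes (_sc, _jvm, _wrapped)
bq = session._sc._jvm.com.samelamin.spark.bigquery.BigQuerySQLContext(session._wrapped._jsqlContext)
bq.setGcpJsonKeyFile(jsonFile)
bq.setBigQueryProjectId(BQ_PROJECT_ID)
bq.setGSProjectId(BQ_PROJECT_ID)
bq.setBigQueryGcsBucket(GcsBucket)
bq.setBigQueryDatasetLocation("US")

# df is the DataFrame to save; this tiny one is only a placeholder
df = session.createDataFrame([(1, "a")], ["id", "value"])
tableName = "{0}:{1}.{2}".format(BQ_PROJECT_ID, DATASET_ID, TABLE_NAME)
# Wrap the underlying Java DataFrame so the library's save method is available
bqDF = session._sc._jvm.com.samelamin.spark.bigquery.BigQueryDataFrame(df._jdf)
bqDF.saveAsBigQueryTable(tableName, False, 0, None, None)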
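
If it helps, the table reference above follows the "project:dataset.table" pattern, so it can be built in one place and checked before it is handed to the JVM. This is only a minimal sketch: build_table_name is a hypothetical helper of my own, not part of the library, and the empty-string check is an assumption about what you would want to catch early.

```python
def build_table_name(project_id, dataset_id, table_name):
    """Return a 'project:dataset.table' reference string.

    Hypothetical helper (not part of spark-bigquery); it only mirrors the
    "{0}:{1}.{2}".format(...) call used in the sample above.
    """
    parts = {"project_id": project_id, "dataset_id": dataset_id, "table_name": table_name}
    for label, value in parts.items():
        # Fail early on empty or whitespace-only parts instead of sending a malformed name to the JVM
        if not value or not value.strip():
            raise ValueError("{0} must be a non-empty string".format(label))
    return "{0}:{1}.{2}".format(project_id, dataset_id, table_name)


table_ref = build_table_name("projectId", "datasetId", "tableName")
```

The resulting string can then be passed to saveAsBigQueryTable in place of tableName.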