samelamin/spark-bigquery

DML query drop and create table takes time

yogesh-0586 opened this issue · 3 comments

I ran DROP and CREATE statements using runDMLQuery(), but dropping and creating a table takes more than 2 minutes. Please check the following log:

19/02/11 12:33:03 INFO com.samelamin.spark.bigquery.BigQueryClient: Executing DML Statement DROP TABLE IF EXISTS `projectId.dataset.input`
19/02/11 12:33:03 INFO com.samelamin.spark.bigquery.BigQueryClient: Using legacy Sql: false
19/02/11 12:35:15 INFO com.samelamin.spark.bigquery.BigQueryClient: Executing DML Statement CREATE TABLE IF NOT EXISTS `projectId.dataset.input` (id STRING,evnt STRING)
19/02/11 12:35:15 INFO com.samelamin.spark.bigquery.BigQueryClient: Using legacy Sql: false

Is it possible to make the create and drop table statements execute in less time?

In spark-shell the DML query runs fast, but with spark-submit on YARN in cluster mode it takes almost 2-3 minutes for a single query.

This isn't a bug but more of a performance issue. Behaviour is very different between spark-shell and spark-submit, depending on how you are using spark-submit.

It would be best to add some form of telemetry to this ticket to validate that it's the connector, not YARN or Spark, adding this overhead.
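
One lightweight way to get that telemetry is to wrap each call in a wall-clock timer in the driver code. The following is only a minimal sketch: `Timing` and `timed` are hypothetical names, not part of the connector, and the `Thread.sleep` is a stand-in for the real runDMLQuery() call.

```scala
object Timing {
  // Hypothetical helper (not part of spark-bigquery): runs `block`,
  // prints the elapsed wall-clock time with a label, and returns
  // the block's result together with the elapsed milliseconds.
  def timed[A](label: String)(block: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = block
    val elapsedMs = (System.nanoTime() - start) / 1000000L
    println(s"[timing] $label took $elapsedMs ms")
    (result, elapsedMs)
  }

  def main(args: Array[String]): Unit = {
    // Stand-in workload; in the real job this block would contain
    // the runDMLQuery() call for the DROP or CREATE statement.
    timed("drop table") { Thread.sleep(50) }
  }
}
```

If the time reported around the call is small while the log still shows a gap of about two minutes between the DROP and CREATE lines (12:33:03 to 12:35:15 above), the overhead is being spent outside the connector call.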

@samelamin Thanks for your reply. I found where the issue was: it's on the Spark side.