mrchristine/db-migration

Missing space after the keyword USING in DDL

Closed this issue · 8 comments

Hello, I recently noticed that a space is missing after the USING keyword in the generated DDL. An example is below.

USINGorg.apache.spark.sql.parquet
OPTIONS (
  path 'dbfs:/path'
)
PARTITIONED BY (column)

Because of the missing space, the DDL execution fails. Can you please help us with this?

Thanks for reporting this, @arjun-hareendran. I just pushed a fix that removes the call that stripped whitespace from the DDL. Please try again and let me know if you run into issues with the latest commit on master.

@mrchristine: I tried the latest patch, but the issue still exists.

To summarise:

  1. The missing space after the USING keyword does not affect every DDL, only a few, mainly this usage:
    USINGorg.apache.spark.sql.parquet

  2. I found one more spacing issue. Again, it is not frequent, but when it occurs the DDL fails:
    columnname STRINGCOMMENT
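
For anyone who wants to screen exported DDL for the two symptoms above before running it, here is a minimal sketch. It is not part of db-migration; the function name `find_fused_keywords` and the list of column types are assumptions made for illustration, and the patterns only cover the two cases reported in this thread.

```python
import re

# Hypothetical screening patterns (not from db-migration). They only cover
# the two fused-token cases reported in this issue.
FUSED_PATTERNS = [
    # USING glued to the provider name, e.g. USINGorg.apache.spark.sql.parquet
    re.compile(r"\bUSING[A-Za-z_][\w.]*"),
    # A column type glued to COMMENT, e.g. STRINGCOMMENT (type list is illustrative)
    re.compile(r"\b(?:STRING|INT|BIGINT|DOUBLE|BOOLEAN|DATE|TIMESTAMP)COMMENT\b"),
]

def find_fused_keywords(ddl):
    """Return every fused token found in the DDL text, grouped by pattern."""
    hits = []
    for pattern in FUSED_PATTERNS:
        hits.extend(match.group(0) for match in pattern.finditer(ddl))
    return hits

bad_ddl = (
    "CREATE TABLE t (columnname STRINGCOMMENT 'x')\n"
    "USINGorg.apache.spark.sql.parquet\n"
    "OPTIONS (path 'dbfs:/path')"
)
good_ddl = (
    "CREATE TABLE t (columnname STRING COMMENT 'x')\n"
    "USING org.apache.spark.sql.parquet\n"
    "OPTIONS (path 'dbfs:/path')"
)

print(find_fused_keywords(bad_ddl))
print(find_fused_keywords(good_ddl))
```

An empty result does not prove the DDL is valid; it only means neither of these two patterns was found.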

@arjun-hareendran I just pushed a change to print the offsets used for the string splicing of the DDL. Can you share the log with me, or even a sanitized DDL that I can add to my tests?

I created a 5k column table DDL and that is working on my end. I'd love to get this fixed but need more data to reproduce it.

@mrchristine: Let me route this internally and I will get back to you. I really appreciate your efforts on this. 👍

@mrchristine: What would be the best way to send the DDLs over to you?

@arjun-hareendran Please send an email to mwc@databricks.com and we can communicate there.

@mrchristine: Thanks, the files should be in your inbox by now.

Fixed. I removed the batching implementation and switched to a file-based store to eliminate the corner cases in whitespace handling.
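
To illustrate the kind of corner case batching can introduce, here is a minimal sketch. It assumes the DDL was cut into fixed-size chunks and each chunk was trimmed before rejoining; that is a guess at the failure mode, not the tool's actual code, and the helper names `split_chunks`, `rejoin_stripped` and `rejoin_verbatim` are hypothetical.

```python
def split_chunks(text, size):
    """Cut text into consecutive pieces of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def rejoin_stripped(chunks):
    """Assumed buggy variant: trimming each chunk drops boundary whitespace."""
    return "".join(chunk.strip() for chunk in chunks)

def rejoin_verbatim(chunks):
    """Safe variant: concatenate the chunks exactly as they were cut."""
    return "".join(chunks)

ddl = "CREATE TABLE t (c STRING) USING parquet"

# Pick a chunk size so the cut lands right after the USING keyword,
# leaving the separating space at the start of the next chunk.
size = ddl.index("USING") + len("USING")
chunks = split_chunks(ddl, size)

print(rejoin_stripped(chunks))   # the space after USING is lost
print(rejoin_verbatim(chunks))   # identical to the original DDL
```

The problem only shows up when a chunk boundary falls next to a space, which would be consistent with it affecting only a few DDLs. Storing each DDL whole in a file avoids the splitting step altogether.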