tconbeer/sqlfmt

Parsing error on valid SQL using string literals with single quotes

Opened this issue · 2 comments

Describe the bug
sqlfmt fails to parse valid SQL when using string literals that contain single quotes.

To Reproduce

select '''' || 'quoted_text' || ''''

Expected behavior
sqlfmt runs without errors.

Actual behavior

1 file had errors while formatting.
0 files left unchanged.
test.sql
    sqlfmt encountered an error: Could not parse SQL at position 35: '''

Additional context
Reproducible from https://sqlfmt.com/

These two are formatted without errors by sqlfmt, so it's only some specific combinations that break the parsing:

select '''quoted_text'''
select '''' || 'quoted_text'

My guess is we're parsing this as a triple-quoted expression followed by a single quote. We get this right (postgres dollar sign quotes):

select $$'$$ || 'quoted_text' || $$'$$

I thought triple quotes for string literals were more common, but a quick search of pg, mysql, sql server, snowflake, and bq shows that only bigquery allows them. maybe we should pull those out into a bq-specific dialect.

Hey! Is there a date for when this will be fixed? Having the same issue here. And, I'd guess it's potentially recurrent, since the replacement of single quotes is a quite common data quality practice.
Quick Examples:

  • in snowflake -> "select '''3229''' as test, REPLACE(test, '''', '') as new",
  • in sql server syntax select '''string''' as test, replace('''string''','''','') as test2, etc.
    Thanks! :)

My guess is we're parsing this as a triple-quoted expression followed by a single quote. We get this right (postgres dollar sign quotes):

select $$'$$ || 'quoted_text' || $$'$$

I thought triple quotes for string literals were more common, but a quick search of pg, mysql, sql server, snowflake, and bq shows that only bigquery allows them. maybe we should pull those out into a bq-specific dialect.