jtdaugherty/dbmigrations

End-of-line comments causes syntax errors.

creswick opened this issue · 3 comments

The following migration fails with a parse error:

Description: Adds two tables, but has syntax errors.
Created: 2014-09-15 15:59:52.862597 UTC
Depends: 001-initial-schema
Apply:
  CREATE TABLE IF NOT EXISTS table1
  (
    -- comments work when they are on their own line.
    id SERIAL PRIMARY KEY NOT NULL,
    username TEXT NOT NULL, -- the author
    date TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
  );

  CREATE TABLE IF NOT EXISTS table2
  (
    id SERIAL PRIMARY KEY NOT NULL,
    data TEXT NOT NULL
  );

Revert:
  drop table table2
  drop table table1

Example:

$ moo test testmigration1
applying: testmigration1...
A database error occurred: execute: PGRES_FATAL_ERROR: ERROR:  syntax error at or near "CREATE"
LINE 2: CREATE TABLE IF NOT EXISTS table2 ( id SERIAL PRIMARY KEY NO...

However, this migration isn't recognized as being a migration:

Description: Adds two tables, but has syntax errors.
Created: 2014-09-15 15:59:52.862597 UTC
Depends: 001-initial-schema
Apply:
  CREATE TABLE IF NOT EXISTS table1
  (
    -- comments work when they are on their own line.
    id SERIAL PRIMARY KEY NOT NULL,
    username TEXT NOT NULL, -- the author
    date TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
  );

  -- test comment:
  CREATE TABLE IF NOT EXISTS table2
  (
    id SERIAL PRIMARY KEY NOT NULL,
    data TEXT NOT NULL
  );

Revert:
  drop table table2
  drop table table1

Example:

$ moo test testmigration2
applying: testmigration2...
No such migration: testmigration2

If you remove the ':' from the second comment in testmigration2, then a different syntax error is generated.

I added some test cases to cover the two behaviors you mentioned, but it turns out neither is a bug. Both arise from how the underlying YAML parser treats the input data.

The first issue, that of inline comments causing trouble, is due to the fact that YAML parsers do "scalar folding" of values, converting newlines to spaces, unless explicitly told that values are to be treated as literals. The reason is that in YAML, indentation indicates an increase in structure depth, and the content at that depth is the interpreted as a YAML tree, and a literal is usually not what the subsequent tree value is intended to represent. So to fix this, you have to put a pipe following the key name and colon. It also turns out the old me that wrote this code knew about this confusion and even added a note to the README about it. So an example is

foo: |
  bar
  baz

and in that example, the newline in between bar and baz will be preserved. For dbmigrations, this is almost always what you want.

As for the second behavior, since YAML uses colons as delimiters to separate keys and values, and because dashes have special meaning to denote the beginning of a node in a list, the parser interpretes "-- foo:" as YAML. Again, this arises from the lack of a pipe specifying that the value is a literal. If you add that, the problem goes away.

The other advantage to doing this in both cases is that the structure of the SQL is preserved, and that will prevent the lack of newlines from confusing the SQL parser in the database engine.

I also updated the migration serialization to emit ASCII literal markings by default in 9692cc9, so that should reduce the chance of this being bothersome.

whoa! That's a somewhat unfortunate confluence of features, and an interesting problem to think about...