cockroachdb/cockroach

sql: SELECT ("*") parse oddities

tbg opened this issue · 4 comments

tbg commented

SELECT ("*") reproduces as SELECT (*), which isn't valid SQL.
On the other hand,
SELECT FROM t WHERE a = COUNT(*) is valid SQL.

In both cases, * is a QualifiedName{"a"} but for the first, the corresponding String() must be "*" (not *).

I've looked around for an obvious fix for this, but I'm thinking this is something that should be done at the grammar level by having * as a keyword of some sorts.

this, again, found by fuzzing. ping @dvyukov.

edit: even easier: SELECT "*" -> SELECT *.

tbg commented

I've tried to get to the bottom of this, but it's rabbit hole-ish. Basically the issue is that we have StarExpr (which is the * in SELECT *), but StarExpr isn't even an Expr so far. We use QualifiedName in most places, in particular SELECT foo.* has foo.* as a QualifiedName{"foo", "*"} so all the starsyness is completely lost. Looks like without refactoring the grammar with respect to * handling we can't really resolve this.

Maybe StarExpr and NonStarExpr should go, and instead QualifiedName should contain that information (the only crucial point is that "(ident|empty)*" and (ident|empty)* are distinguishable). If * were not an allowed column name, we'd be home free but Postgres allows that.

ping @petermattis

This isn't so much the grammar (which is parsing things fine), but the generated syntax tree. QualifiedName (which is just a []string) is inadequate at capturing some of the complexities. In particular, * is being treated as a string element where it probably deserves special handling.

tbg commented

something like this?

type QualifiedName interface {
    qualifiedName()
}
// with implementers:

type StarExpr struct {
    NonStarExpr // by convention, with Column() = ""
}
type NonStarExpr {
    // holds and exposes database, table, column
}

I don't think this is sufficient. This is an area of the postgres grammar that was significantly more complex than the vitess grammar. See the indirection and indirection_elem rules in sql.y. It looks like foo.*.* is legal for the grammar. I'm not sure where that would even be used (foo.* is usually all columns from the table foo). We're also not properly generating the syntax tree for array access, but I doubt we'll be supporting arrays any time soon.

Seems like QualifiedName needs to be something like []QualifiedElem where QualifiedElem is an interface:

type QualifiedElem interface {
  qualifiedElem()
  String() string
}

func (Name) qualifiedElem() {}
func (Star) qualifiedElem() {}
func (*Subscript) qualifiedElem() {}

type Name string

func (n Name) string() {
  if _, ok := keywords[strings.ToUpper(string(n))]; ok {
    return fmt.Sprintf(`"%s"`, n)
  }
  return encIdent(n)
}

type Star string

func (Star) String() string {
  return "*"
}