hyrise/sql-parser

Support of `HAVING` without `GROUP BY`

dey4ss opened this issue · 0 comments

While going through the Public BI Benchmark, I found multiple queries with HAVING clauses but no GROUP BY columns (e.g., [1, 2, 3]). I'd like to discuss whether we want to support such statements or not.
Related to #186.

Consider the following query [1]:

SELECT SUM(CAST("TrainsUK2_2"."Number of Records" AS BIGINT)) AS "sum:Number of Records:ok"
FROM "TrainsUK2_2" 
HAVING (COUNT(1) > 0)

According to DBFiddle [4], Postgres executes the query*, whereas we raise a syntax error due to seeing HAVING without GROUP BY columns. However, most of the HAVING criteria seem weird, as they consist of a a term always evaluating to true (see example above [1]) or false (see below [5]), assuming that the numbers passed th the aggregate functions do not refer to column IDs.

HAVING ((SUM(1) >= 30) AND (SUM(1) <= 100000))

If we want to adapt here, we could add a rule to the parser and change

opt_group : GROUP BY expr_list opt_having {
  $$ = new GroupByDescription();
  $$->columns = $3;
  $$->having = $4;
}
| /* empty */ { $$ = nullptr; };

to

opt_group : GROUP BY expr_list opt_having {
  $$ = new GroupByDescription();
  $$->columns = $3;
  $$->having = $4;
}
| HAVING expr {
  $$ = new GroupByDescription();
  $$->having = $2;
}
| /* empty */ { $$ = nullptr; };

Additionally, I noticed that the escaped column names in the example [1] do not work properly: "Number of Records" is not considered as one single identifier and "sum:Number of Records:ok" does not parse because sum is mistakenly treated as token for the SUM(...)aggregate function.

[1] https://github.com/cwida/public_bi_benchmark/blob/master/benchmark/TrainsUK2/queries/17.sql
[2] https://github.com/cwida/public_bi_benchmark/blob/master/benchmark/MLB/queries/103.sql
[3] https://github.com/cwida/public_bi_benchmark/blob/master/benchmark/Wins/queries/10.sql
[4] https://www.db-fiddle.com/
[5] https://github.com/cwida/public_bi_benchmark/blob/master/benchmark/Hatred/queries/12.sql

* SQL for reproduction:

CREATE TABLE "TrainsUK2_2"("Number of Records" smallint NOT NULL);
INSERT INTO "TrainsUK2_2" VALUES(-1);

SELECT SUM(CAST("TrainsUK2_2"."Number of Records" AS BIGINT)) AS "sum:Number of Records:ok" FROM "TrainsUK2_2" HAVING (COUNT(1) > 0);

Result:

sum:Number of Records:ok
-1