yugabyte/taqo

Tables in the generated hints are incorrectly specified by the names instead of the aliases

mtakahar opened this issue · 1 comments

When a table in the FROM clause has an alias, the alias must be used in the hints to refer the specific instance of a table reference, however, the framework generates the hints with the table names instead.

Examples

Single table:

  • The hint should use t1.
{
    "tag": "",
    "query": "select * from t100 t1 where c2 >= 5",
    "query_hash": "43ea304b4c0146be00cbb949d9b41bd5",
    "tables": null,
    "optimizer_tips": null,
    "explain_hints": "IndexScan(t100)",
    "execution_plan": {"full_str": "Seq Scan on t100 t1  (cost=0.00..10.25 rows=92 width=4108)\n  Remote Filter: (c2 >= 5)"},
    "execution_time_ms": 1.315283203125,
    "result_cardinality": 91,
    "result_hash": "20bec5778e2124af5c07d6da59fb016e",
    "parameters": [],
    "optimizations": null,
    "execution_plan_heatmap": null
}

Join:

  • The hints should use t1 and t2.
  • This example shows another issue of adding the third instance of t100, which doesn't exist in the query, and an extra join method on it. (I can create another issue for this if necessary.)
{
    "tag": "",
    "query": "select * from t100 t1 left join t100 t2 on t1.c1 = t2.c2 and t2.c2 >= 100",
    "query_hash": "7fddb7e5d990c24cebc55448fb2e4f63",
    "tables": null,
    "optimizer_tips": null,
    "explain_hints": "Leading (( ( t100 t100 ) t100 ))  NestLoop(t100 t100) MergeJoin(t100 t100 t100) IndexOnlyScan(t100) IndexScan(t100) IndexScan(t100) ",
    "execution_plan": {
	"full_str": "Hash Left Join  (cost=4.02..14.41 rows=100 width=8216)\n  Hash Cond: (t1.c1 = t2.c2)\n  ->  Seq Scan on t100 t1  (cost=0.00..10.00 rows=100 width=4108)\n  ->  Hash  (cost=4.01..4.01 rows=1 width=4108)\n        ->  Index Scan using t100_c2 on t100 t2  (cost=0.00..4.01 rows=1 width=4108)\n              Index Cond: (c2 >= 100)"
    },
    "execution_time_ms": 0,
    "result_cardinality": 0,
    "result_hash": null,
    "parameters": null,
    "optimizations": null,
    "execution_plan_heatmap": null
},

Join on inline view:

  • The hints should use t1 and t2.
{
    "tag": "",
    "query": "select * from t100 t1 join (select * from t100 where c2 >= 100) t2 on t1.c1 = t2.c2",
    "query_hash": "2af46f870523305d1c2c48951d7ebed3",
    "tables": null,
    "optimizer_tips": null,
    "explain_hints": "Leading ( ( t100 t100 ) ) HashJoin(t100 t100) IndexScan(t100) IndexScan(t100)",
    "execution_plan": {
	"full_str": "Nested Loop  (cost=0.00..8.20 rows=1 width=8216)\n  ->  Index Scan using t100_c2 on t100  (cost=0.00..4.01 rows=1 width=4108)\n        Index Cond: (c2 >= 100)\n  ->  Index Scan using t100_pkey on t100 t1  (cost=0.00..4.11 rows=1 width=4108)\n        Index Cond: (c1 = t100.c2)"
    },
    "execution_time_ms": 0.61162109375,
    "result_cardinality": 0,
    "result_hash": "d41d8cd98f00b204e9800998ecf8427e",
    "parameters": [],
    "optimizations": null,
    "execution_plan_heatmap": null
},

Full Test case

  • Included in the taqo-issues model tarball attached: taqo-issues-model.tar.gz
  • ANALYZE step and CBO (yb_enable_statistics=on) enabled.

Note

Just the same as other SQL identifiers, mixed cases, whitespaces, keywords, etc. can be used in an alias when it is surrounded by double quotes.
e.g.:

yugabyte=# explain /*+ Leading(("Table 1" "taBle 1")) NestLoop("Table 1" "taBle 1") */select * from t100 "Table 1", t100 "taBle 1" where "Table 1".c1 = "taBle 1".c1;
                                       QUERY PLAN                                        
-----------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..24.85 rows=100 width=8216)
   ->  Seq Scan on t100 "Table 1"  (cost=0.00..10.00 rows=100 width=4108)
   ->  Index Scan using t100_pkey on t100 "taBle 1"  (cost=0.00..0.15 rows=1 width=4108)
         Index Cond: (c1 = "Table 1".c1)
(4 rows)

yugabyte=# explain /*+ Leading(("taBle 1" "Table 1")) NestLoop("Table 1" "taBle 1") */select * from t100 "Table 1", t100 "taBle 1" where "Table 1".c1 = "taBle 
[taqo-issues-model.tar.gz](https://github.com/yugabyte/taqo/files/11468625/taqo-issues-model.tar.gz)
1".c1;
                                       QUERY PLAN                                        
-----------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..24.85 rows=100 width=8216)
   ->  Seq Scan on t100 "taBle 1"  (cost=0.00..10.00 rows=100 width=4108)
   ->  Index Scan using t100_pkey on t100 "Table 1"  (cost=0.00..0.15 rows=1 width=4108)
         Index Cond: (c1 = "taBle 1".c1)
(4 rows)

Fixed by the new parser change: 371732b