yugabyte/taqo

Add option to try out different index choices

mtakahar opened this issue · 1 comments

Summary

  • The framework currently produces IndexScan and IndexOnlyScan hints without specifying an index name. The optimizer may choose some index, but that is necessarily the best one to choose.
  • Since trying out every possible index on the table will increase the search space and the time it takes to run the whole model, it would be good if we have an option for enabling this mode as well as limiting it based on complexity of the query, e.g.: the number of tables <= 3, etc.
  • The IndexScan/IndexOnlyScan hints without the index name should also tried out in addition to the index specific ones to let the optimizer consider use of different indexes for computing different min/max values, etc.

Examples

Example 1
(This is included in the taqo-issues model attached to #2.)
For the query: select c2 from t10000w where c2 = 10, the optimizer chose the index t10000w_c2_c1 on t10000w (c2 asc, c1 asc), however, there is a smaller index: t10000w_c2 on t10000w (c2 asc), which would be cheaper for the query that doesn't need c1 at all.

When the option is specified, try: IndexScan(t10000w t10000w_c2_c1), IndexScan(t10000w t10000w_c2), IndexOnlyScan(t10000w t10000w_c2_c1) and IndexOnlyScan(t10000w t10000w_c2)
Instead of (or in addition to): IndexScan(t10000w) and IndexOnlyScan(t10000w) for this query. We may need to limit the number of tables in the query, e.g.: only for a single table query, etc., too.

{
    "tag": "",
    "query": "select c2 from t10000w where c2 = 10",
    "query_hash": "91a59d5ce30012088f45c690503b6707",
    "tables": null,
    "optimizer_tips": null,
    "explain_hints": "IndexScan(t10000w)",
    "execution_plan": {"full_str": "Index Scan using t10000w_c2_c1 on t10000w  (cost=0.00..4.49 rows=4 width=4)\n  Index Cond: (c2 = 10)"},
    "execution_time_ms": 0.56376953125,
    "result_cardinality": 4,
    "result_hash": "6add84506c86a658bc85038f91e35ce7",
    "parameters": [],
    "optimizations": null,
    "execution_plan_heatmap": null
}, 
{
    "tag": "",
    "query": "select c2 from t10000w where c2 = 10",
    "query_hash": "91a59d5ce30012088f45c690503b6707",
    "tables": null,
    "optimizer_tips": null,
    "explain_hints": "IndexOnlyScan(t10000w)",
    "execution_plan": {"full_str": "Index Only Scan using t10000w_c2_c1 on t10000w  (cost=0.00..4.45 rows=4 width=4)\n  Index Cond: (c2 = 10)"},
    "execution_time_ms": 0.525830078125,
    "result_cardinality": 4,
    "result_hash": "6add84506c86a658bc85038f91e35ce7",
    "parameters": [],
    "optimizations": null,
    "execution_plan_heatmap": null
},

Example 2

Example 3

  • An index-specific hint would disable min/max optimization and causes the optimizer to choose Seq Scan.
yugabyte=# explain select min(c1), min(c2), min(c3) from t100;
                          QUERY PLAN                          
--------------------------------------------------------------
 Finalize Aggregate  (cost=10.75..10.76 rows=1 width=12)
   ->  Seq Scan on t100  (cost=0.00..10.00 rows=100 width=12)
         Partial Aggregate: true
(3 rows)

yugabyte=# explain /*+ IndexOnlyScan(t100) */select min(c1), min(c2), min(c3) from t100;
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Result  (cost=12.33..12.34 rows=1 width=12)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..4.11 rows=1 width=4)
           ->  Index Scan using t100_pkey on t100  (cost=0.00..4.11 rows=1 width=4)
                 Index Cond: (c1 IS NOT NULL)
   InitPlan 2 (returns $1)
     ->  Limit  (cost=0.00..4.11 rows=1 width=4)
           ->  Index Only Scan using t100_c2_c4v on t100 t100_1  (cost=0.00..4.11 rows=1 width=4)
                 Index Cond: (c2 IS NOT NULL)
   InitPlan 3 (returns $2)
     ->  Limit  (cost=0.00..4.11 rows=1 width=4)
           ->  Index Only Scan using t100_c3 on t100 t100_2  (cost=0.00..4.11 rows=1 width=4)
                 Index Cond: (c3 IS NOT NULL)
(13 rows)

yugabyte=# explain /*+ IndexOnlyScan(t100 t100_c3) */select min(c1), min(c2), min(c3) from t100;
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Aggregate  (cost=10000000010.75..10000000010.76 rows=1 width=12)
   ->  YB Seq Scan on t100  (cost=10000000000.00..10000000010.00 rows=100 width=12)
(2 rows)

A new config parameter, all-index-check was added to allow exploration of different indexes (371732b).