Indicia-Team/warehouse

Review query planner hack

Opened this issue · 0 comments

There is a function called optimiseQueryPlan in ReportEngine.php, which refers to this thread as the reasoning: https://stackoverflow.com/questions/6037843/extremely-slow-postgresql-query-with-order-and-limit-clauses.

It’s a hack that tries to avoid a horribly slow approach the optimiser takes to some queries. It did successfully improve the performance of many queries quite considerably, but would be less necessary now that most occurrences reporting is done against ES. In addition, we have just upgraded to PG13 from PG11 which may change the situation.

An example was encountered where the hack did the opposite of what was intended and cause a query to take 2 minutes instead of 2 tenths of a second. That query was

SELECT o.*
FROM cache_occurrences_functional o
WHERE 1=1
ORDER BY o.id+0 ASC
LIMIT 3;

What is not known is whether this hack could be removed altogether now, or whether the example was a rare case of when the hack doesn’t work and it’s better left in place. In addition, we have just upgraded to PG13 from PG11 which may change the situation.

John says
We could probably test this quite easily by simulating a few queries that showed the problem once everything is in place. From memory a good example is to select records of a rare taxa, limit to a few records and sort by ID desc. A bad query plan assumes it will find all the records quickly if it starts at the end of the table and does an index scan backwards. A good plan would realise the records are distributed, so use an index to find the records of that taxon, then walk backwards to find the required limit.