powa-team/powa

Should we take into account pg_qualstats.sample_rate when setting pg_qualstats.max?

yhuelf opened this issue · 4 comments

cf https://github.com/powa-team/powa/blob/master/docs/components/stats_extensions/pg_qualstats.rst#configuration

else it should be roughly equal to the number of queries executed during powa.frequency interval of time.

should it be powa.frequency * pg_qualstats.sample_rate ?

Also, what happens exactly if the max is set too low? Are the least frequent quals removed, like in pg_stat_statements?

And what is the impact of setting the max too high?

rjuju commented

pg_qualstats.sample_rate and setting pg_qualstats.max are for different purpose. sample_rate is there purely for performance reason, as inspecting the quals is quite costly. If you don't sample the queries (so configure it to 1), you can get dramatic slowdown if you have a busy environment with somewhat complex expressions in your quals.

Edit: the rule of thumb is the documentation is really, really simplistic, as it's almost impossible to infer the number of quals based on a number of queries. Trying to improve it in the documentation is probably gonna be a waste of resource.

max is only there to be able to mitigate the memory usage vs. how much you can actually sample.

So both should be configured separately, although obviously having a sample_rate to 1 and a max very low can seem incompatible. Note that if you have OLAP only workload, it could totally be a reasonable configuration.

Also, what happens exactly if the max is set too low? Are the least frequent quals removed, like in pg_stat_statements?

Yes, pg_qualstats has the same approach for counter eviction (based on usage) and setting the max too low can lead to sampled quals being evicted. Note that however powa will reset the quals for every snapshot, in order to limit the sample size. So if you use pg_qualstats with powa that's something to keep in mind.

And what is the impact of setting the max too high?

higher memory usage on the postgres server, and possibly higher disk usage for powa or other tool saving those metrics.

OK, thank you very much.

If I understand well, the rule of thumb in the documentation is based on the very simplistic assumption that there are roughly N unique quals extracted from the N queries executed during powa.frequency interval of time.

So my question was: under this simplistic assumption, should the formula be "max = N * sample_rate" ?
EDIT : if the answer is "yes", maybe you should replace "executed" by "sampled" in the rule of thumb. IMO, it would make things clearer, event though this rule of thumb isn't actually very useful.

But from what you said, I understand we should rather think in terms of memory usage impact, this previous hypothesis beeing too simplisitic to be actually useful.

Thank you again.

rjuju commented

If I understand well, the rule of thumb in the documentation is based on the very simplistic assumption that there are roughly N unique quals extracted from the N queries executed during powa.frequency interval of time.

Yes, but that's only true if you don't enable track_constant, as otherwise you get as many entries as different constant in your quals.

So my question was: under this simplistic assumption, should the formula be "max = N * sample_rate" ?

Even with track_constant disabled, I think that max shouldn't be changed depending on the sample_rate. With sampling the idea is that you eventually see everything, just less often. But eventually you will see all the queries.