heroku/heroku-pg-extras

pg_stats on interesting columns

Closed this issue · 2 comments

I'm thinking of adding something like this:

select s.*, atts.indexrelid::regclass::text as index from pg_stats s join pg_attribute a on s.attname = a.attname and s.tablename = a.attrelid::regclass::text join (select unnest(indkey) attnum, indexrelid, indrelid from pg_index i) atts on (atts.attnum = a.attnum and atts.indrelid = a.attrelid) where schemaname != 'pg_catalog' and (n_distinct != -1 or correlation < 0.95) order by schemaname, tablename, attname;

The idea here is that we see statistics on interesting columns - columns that are indexed and have a non-uniform distribution (n_distinct = -1 means that there are a number of distinct elements equal to the total number of elements....values between -1 and 0 represent a multiplier of the total number of elements, whereas positive values are absolute numbers). These values are in general more likely to matter on indexed columns. So it's a reasonable way to find interesting statistics as they may relate to problematic queries. I can think of one customer in particular that I always look at this stuff for, because their listings are particularly skewed. I think if nothing else it's useful to encourage the idea that whether or not constants appearing in your query predicate are in the most common values list matters a lot.

@deafbybeheading @fdr Would you use it if you had it? I do look at this sometimes. It is frustrating to have to work to get it, and that it isn't as accessible as I'd like. Although perhaps this isn't much more accessible.

fdr commented

I might not think to look at it as I won't think to recall it.

Given that:

Where would you fold it in? What I'm wondering is how do you model someone using or, in particular, discovering this feature, however it may wind up.

You could also chuck it into the command as a dusty corner (unless it gets surprisingly popular) and build validated ideas over time by referring (or checking) qualified customers directly to it to see how often if bears out your hypothesis. I am not a maintainer of pg-extras so maybe someone has a more qualified opinion of how inclusive it is, but my last understanding was "considerably".

I have my own idle guesses, like finding a way to in-line it with another command that's popular so people somehow get it "for free". But I don't even have an inkling where.

On Tue, Feb 25, 2014 at 10:25 PM, Daniel Farina notifications@github.comwrote:

Given that:

Where would you fold it in? What I'm wondering is how do you model someone
using or, in particular, discovering this feature, however it may wind
up.

It might be more useful as a psql macro that defines a CTE, so that I can
write ad-hoc queries against that CTE...but we can't ship that.