pg_qualstats_index_advisor functionality
Closed this issue · 14 comments
I installed the latest revision 66b3037 on my local PG 14.5 machine.
I found that you added the "docs" for the advisor here
#31
Your code example:
SELECT v
FROM json_array_elements(
pg_qualstats_index_advisor(min_filter => 50)->'indexes') v
ORDER BY v::text COLLATE "C";
fails with SQL Error [22023]: ERROR: cannot call json_array_elements on a scalar
but all I can figure out is:
SELECT pg_qualstats_index_advisor();
{
"indexes": null,
"unoptimised": [
{
"qual": "assemblies.\"orderReferenceNumber\" ~~* ?",
"queryids": [
-6084300706674462045,
-5817621241163067326
]
},
{
"qual": "orders.\"orderReferenceNumber\" ~~* ?",
"queryids": [
5707918505950049101,
9082355525220716366
]
}
]
}
taking the first query :
select pg_qualstats_example_query(-6084300706674462045);
select
*
from
"assemblies"
where
("assemblies"."orderInternalId"::text ilike $1
or "assemblies"."orderReferenceNumber"::text ilike $2)
order by
"assemblies"."id" desc,
"orderType" asc
limit 10 offset 0
I have a BTREE index for both fields (orderInternalId,orderReferenceNumber).
If I remove one of the indexes and rerun the query again and check again
SELECT pg_qualstats_index_advisor();
I get the same result as before, just with 1 more queryId.
I have a very busy schedule and just skimmed the code
it seems that
v_ddl is the part from your code example output and this is not returned.
As soon as I have some time, I will have a closer look at the function.
Hi,
The error is because the "indexes" field is "null" and not an empty array or an array containing null. I'm not sure how it can happen as the underlying variable is initialized with an empty array and skip null or empty "v_ddl". Even if the generated index DDL was null, it should still yield an array, like:
rjuju=# select json_build_object('a', array_append('{}'::text[], null));
json_build_object
-------------------
{"a" : [null]}
(1 row)
For the unoptimized quals, that's expected. The ILIKE (~~*) operator can't be automatically handled, as the index you shoud create depends on what values are used, and where are the placeholders (%) if any.
thx,
i tried to change the line to
v_indexes json[]= '{}';
Works, but I have to check why my index is empty.
When I reset the qualstats:
select pg_qualstats_reset();
and then
select pg_qualstats_index_advisor();
{
"indexes": [
[]
],
"unoptimised": null
}
Then I remove the index on the created_at
explain
SELECT *
FROM "work_orders"
WHERE "work_orders"."created_at" >= '2002-08-23 00:00:00'
and "work_orders"."created_at" <= '2022-08-24 23:59:59';
the query plan gives me
Seq Scan on work_orders (cost=0.00..1.04 rows=3 width=160)
Filter: ((created_at >= '2002-08-23 00:00:00'::timestamp without time zone) AND (created_at <= '2022-08-24 23:59:59'::timestamp without time zone))
When I execute the query
I get 2 entries in the output of
select pg_qualstats();
SELECT *
FROM push_job_logs
WHERE "push_job_logs"."updated_at" >= '2022-07-19 07:50:30.000'
and "push_job_logs"."updated_at" <= '2022-07-19 07:59:30.000';
(10,16384,3050129,17,2065,,,1012069729,2475159725,763865873,2803659205,1,3,0,0,0,0,0,0,0,0,0,64,-715090605387815437,"'2002-08-23 00:00:00'::timestamp without time zone",f)
but the advisor gives me (empty arrays with my code change) :
{
"indexes": [],
"unoptimised": []
}
I tried another query with more than 1000 entries
SELECT *
FROM push_job_logs
WHERE "push_job_logs"."updated_at" >= '2022-07-19 07:50:30.000'
and "push_job_logs"."updated_at" <= '2022-07-19 07:59:30.000';
Seq Scan on push_job_logs (cost=0.00..316.25 rows=1 width=1368)
Filter: ((updated_at >= '2022-07-19 07:50:30'::timestamp without time zone) AND (updated_at <= '2022-07-19 07:59:30'::timestamp without time zone))
Where an index would be used
but still
{
"indexes": [],
"unoptimised": []
}
SELECT dbid, amname, qualid, qualnodeid,
(coalesce(lrelid, rrelid), coalesce(lattnum, rattnum),
opno, eval_type)::public.qual AS qual, queryid,
round(avg(execution_count)) AS execution_count,
sum(occurences) AS occurences,
round(sum(nbfiltered)::numeric / sum(occurences)) AS avg_filter,
CASE WHEN sum(execution_count) = 0
THEN 0
ELSE round(sum(nbfiltered::numeric) / sum(execution_count) * 100)
END AS avg_selectivity
FROM public.pg_qualstats() q
JOIN pg_catalog.pg_database d ON q.dbid = d.oid
JOIN pg_catalog.pg_operator op ON op.oid = q.opno
JOIN pg_catalog.pg_amop amop ON amop.amopopr = op.oid
JOIN pg_catalog.pg_am am ON am.oid = amop.amopmethod
WHERE d.datname = current_database()
AND eval_type = 'f'
AND amname != ALL ('{}')
AND coalesce(lrelid, rrelid) != 0
AND qualnodeid != ALL('{}')
GROUP BY dbid, amname, qualid, qualnodeid, lrelid, rrelid,
lattnum, rattnum, opno, eval_type, queryid;
dbid |amname|qualid |qualnodeid|qual |queryid |execution_count|occurences|avg_filter|avg_selectivity|
-----+------+----------+----------+-------------------+-------------------+---------------+----------+----------+---------------+
16384|btree |3220348345|3647030479|(3050070,10,2065,f)|4635862364285714673| 2626| 10| 485| 92|
16384|brin |3220348345|2377652190|(3050070,10,2063,f)|4635862364285714673| 2626| 20| 485| 92|
16384|btree |3220348345|2377652190|(3050070,10,2063,f)|4635862364285714673| 2626| 10| 485| 92|
16384|brin |3220348345|3647030479|(3050070,10,2065,f)|4635862364285714673| 2626| 20| 485| 92|
i tried to change the line to
v_indexes json[]= '{}';
It shouldn't be necessary. I tried locally on the same version (2.0.4), and it's working as expected:
CREATE TABLE t1(id integer, val text);
INSERT INTO t1 SELECT i, 'line ' || i FROM generate_series(1, 1000000) i;
SELECT * FROM t1 WHERE val ILIKE 'meh';
SELECT * FROM pg_qualstats_index_advisor(min_filter => 50);
pg_qualstats_index_advisor
----------------------------------------------------
{"indexes" : [], "unoptimised" : ["t1.val ~~* ?"]}
(1 row)
Simulating your use case:
CREATE TABLE push_job_logs(id integer, updated_at timestamp);
INSERT INTO push_job_logs select i, '2022-01-01'::timestamp + interval '1 day' * i from generate_series(1, 100000) i;
SELECT * FROM push_job_logs where updated_at >= '2022-02-01' AND updated_at <= '2022-02-02';
id | updated_at
----+---------------------
31 | 2022-02-01 00:00:00
32 | 2022-02-02 00:00:00
(2 rows)
SELECT * FROM pg_qualstats_index_advisor(min_filter => 50);
pg_qualstats_index_advisor
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{"indexes" : ["CREATE INDEX ON public.push_job_logs USING btree (updated_at)","CREATE INDEX ON public.push_job_logs USING btree (updated_at)"], "unoptimised" : ["t1.val ~~* ?"]}
(1 row)
(clearly the redundant indexes should be removed, but still it's giving sensible results)
There's definitely something unexpected on your environment, but I don't know what.
Can you try to call pg_qualstats_reset(), set pg_qualstats.sample_rate to 1 and try again the push_job_logs test case? Maybe some of the stored quals are responsible for your problems.
Could it be that the issue is the configuration
I changed from the defaults mentioned in the docs
pg_qualstats.enabled = true
pg_qualstats.track_constants = true
pg_qualstats.max = 1000
pg_qualstats.resolve_oids = false
pg_qualstats.track_pg_catalog = false
pg_qualstats.sample_rate = 1
to
pg_qualstats.resolve_oids = true
pg_qualstats.track_pg_catalog = true
maybe
pg_qualstats.track_pg_catalog is mandatory.
Now when I delete the index
I get
{
"ddl": "CREATE INDEX ON public.push_job_logs USING btree (updated_at)",
"queryids": [
4635862364285714673
]
}
For me, it still fails if I reset the stats and execute your code example
I will keep this as this works perfectly fine
v_processed bigint[] = '{}';
v_indexes json[] = '{}';
v_unoptimised json[] = '{}';
Thx for the fast help
not the best solution but for now I will use this
select
distinct to_jsonb(v)->'ddl' as v
from
json_array_elements(
pg_qualstats_index_advisor()->'indexes') v
where
v->>'ddl' not like '%pg_catalog%';
pg_qualstats.resolve_oids = true
pg_qualstats.track_pg_catalog = true
those shouldn't have an impact.
Now when I delete the index
I get{
"ddl": "CREATE INDEX ON public.push_job_logs USING btree (updated_at)",
"queryids": [
4635862364285714673
]
}
This output comes from d203b05, so this is not the version 2.0.4 but the currently dev version 2.1.0.
It's unclear to me why you need to change the type of the arrays. Did you create any extra extension, or custom cast or something? Can you produce a script that creates a new database, create the pg_qualstats extension, some tables, add some data and reproduces the problem that I could try locally?
The issue is if you don't pass a default value to the declaration it is initialized with NULL.
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=b3f6ef8f178b471b5c9a78e0dd47c7c0
https://www.postgresql.org/docs/current/plpgsql-declarations.html
The DEFAULT clause, if given, specifies the initial value assigned to the variable when the block is entered. If the DEFAULT clause is not given then the variable is initialized to the SQL null value.
at Line 177 you add values to the default null value of the declaration, so they behave as they would have been declared with default values at the beginning.
v_ddl := '';
v_quals_todo := '{}';
v_quals_done := '{}';
v_quals_col_done := '{}';
Yes I agree that the default is NUll, but I do initialize the used v_indexes and v_unoptimized at https://github.com/powa-team/pg_qualstats/blob/master/pg_qualstats--2.0.4.sql#L447-L448:
DECLARE
[...]
v_indexes text[] = '{}';
v_unoptimised text[] = '{}';
So later, you get:
-- if underlying table has been dropped, skip this (broken) index
CONTINUE WHEN coalesce(v_ddl, '') = '';
[...]
-- and append it to the list of generated indexes
v_indexes := array_append(v_indexes, v_ddl);
So I still don't see how exactly v_indexes can be NULL.
For the variables you're showing here, they're reused in a loop, so I initialize them at the beginning of each loop.
Am I missing something, or which variable is missing an explicit initialization?
As far as I can see this is the case since 0732013, so 2 years and a half ago. Are you sure that you're using the released 2.0.4 (or 2.1.0dev) and not an old 2.0.0dev version?
sorry for distracting with the 2.0.4 version number
but I use the code from the latest commit.
in
https://github.com/powa-team/pg_qualstats/blob/master/pg_qualstats--2.1.0.sql
in line 451
v_indexes json[];
v_unoptimised json[];
PS: as well as in L12
https://github.com/powa-team/pg_qualstats/blob/master/pg_qualstats--2.0.4--2.1.0.sql
Ahhh, that explains everything! Thanks a lot for the report. I apparently broke that in 72e84f1. I will fix that shortly, and add some regression tests to make sure it doesn't get broken again.
THX for your patience :)
Works perfectly now.
Great news! I'm closing the issue, feel free to reopen it or create a new one if needed.