powa-team/pg_stat_kcache

Rusage of parallel query workers seem not be counted

atorik opened this issue · 5 comments

Thanks for developing this useful extension!
I was searching a way to track CPU usage on PostgreSQL, and this seems what I want.

I'm a newbie for this extension, so I may misunderstand something, but when I run parallel query, pg_stat_kcache_detail view seems only count for the leader process and ignore workers.

Here is an example.

Query wih 5 workers:

=# SELECT pg_stat_kcache_reset();
=# set max_parallel_workers_per_gather to 5;

=# SELECT query, user_time FROM pg_stat_kcache_detail WHERE query LIKE '%MAX%';
-[ RECORD 1 ]--------------------------------------
query       | SELECT MAX(bid) FROM pgbench_accounts
user_time   | 1.1822049999999997

Query wihout workers:

=# SELECT pg_stat_kcache_reset();
=# set max_parallel_workers_per_gather to 0;

=# SELECT query, user_time FROM pg_stat_kcache_detail WHERE query LIKE '%MAX%';
[ RECORD 1 ]--------------------------------------
query       | SELECT MAX(bid) FROM pgbench_accounts
user_time   | 5.66161

Even more, when I turn parallel_leader_participation to off, 'user_time' was nealy zero.

Is this behavior intentional or not?
If not, is there any plan to summing up worker rusages?

As far as I investigated, queryId of parallel workers are assigned to 0, so their rusage are not summed up to the original query.

I once thought calculating queryId in executor_hook on workers, but the pg_stat_statements does not expose queryId calculation functions(JumbleQuery()).

Considering that parallel aware query information like BufferUsage and WalUsage are summed up in accumulation functions in PostgreSQL-core (InstrAccumParallelQuery()), I suspect it might hard to do it in extensions.

Thanks.

rjuju commented

Hello,

Thanks a lot!

Yes, it's a known limitation for which I didn't implement a workaround (yet).

I'll see if a solution similar to what was added in pg_qualstats (see powa-team/pg_qualstats@80f7e06) will work here, but I don't see any reason why it wouldn't.

I didn't do it for now, as it costs some memory and no one complained until now. Also, I was hoping to have it fixed into postgres (I have a pending patch at https://commitfest.postgresql.org/28/2069/ that would expose the queryid in parallel workers), but it unfortunately didn't make it to pg13. I'll try to see if at least the queryid passing can be added in pg14.

Thanks for your kind explanation!

I'll try to see if at least the queryid passing can be added in pg14.

+1.
I don't know if I'll be of any help, I've registered as a reviewer of this patch :)

rjuju commented

Oh, thanks a lot!!

rjuju commented

I just pushed 2ce11b2 which should fix the problem. Let me know if it also works for you!

That worked fine for me!
Thanks for your quick fix and for adding me to contributors :)