powa-team/pg_stat_kcache

Return inexact result if nested query is executed

mikecaat opened this issue · 28 comments

Thanks for developing the useful extension.

I tried to track the nested statement's usages but I can't get them correctly.
I want to know the following behavior is expected or not and if there is any workaround.

Describe the behavior
The counter doesn't show the expected value If executing the nested statements like PL/pgSQL.

The nested statement's example is the following.

CREATE OR REPLACE FUNCTION plpgsql_cpu()
  RETURNS void AS $$
BEGIN
  -- consume too much cpu resources
  PERFORM i::text as str from generate_series(1, 10000000) as i;
  PERFORM md5(i::text) as str from generate_series(1, 10000000) as i;
END
$$ LANGUAGE plpgsql;

SELECT plpgsql_cpu();

The views of pg_stat_kcache show the following.

I expected that the top-level function's counter is accumulated, but it's not be accumulated in pg_stat_kcache_detail.
Since pg_stat_kcache is just the sum of the pg_stat_kcache_detail, the pg_stat_kcache counters might be not correct, isn't it?

postgres=# SELECT query,user_time,system_time FROM pg_stat_kcache_detail ;
-[ RECORD 2 ]-------------------------------------------------------------
query       | SELECT md5(i::text) as str from generate_series($1, $2) as i
user_time   | 10.684798
system_time | 0.395946
-[ RECORD 3 ]-------------------------------------------------------------
query       | SELECT plpgsql_cpu()
user_time   | 11.123643999999999     // If accumulated this must be bigger than 16.1(10.6 + 5.5)
system_time | 0.463917
-[ RECORD 5 ]-------------------------------------------------------------
query       | SELECT i::text as str from generate_series($1, $2) as i
user_time   | 5.515042
system_time | 0.253732

postgres=# SELECT user_time,system_time FROM pg_stat_kcache ;
-[ RECORD 1 ]----------
user_time   | 27.328341     // top-level statements and nested query might be counted in double
system_time | 1.114047

Since pg_stat_statements supports the nested query, the counter of the top-level query is accumulated.

postgres=# SELECT query,total_exec_time FROM pg_stat_statements ;
-[ RECORD 2 ]---+------------------------------------------------------------------
query           | SELECT plpgsql_cpu()
total_exec_time | 17912.331461
-[ RECORD 5 ]---+------------------------------------------------------------------
query           | SELECT i::text as str from generate_series(1, 10000000) as i
total_exec_time | 5847.201838
-[ RECORD 1 ]---+------------------------------------------------------------------
query           | SELECT md5(i::text) as str from generate_series(1, 10000000) as i
total_exec_time | 11219.154150999999

If my understanding is correct, this extension doesn't support the nested query now.

Since this extension didn't track the nested level, if a nested query is executed,
the counters of top-level statements are tracked from the start point of the last nested query in them to the endpoint of the statements.

If there is any workaround or do you have any plan to support it, please let me know.

Tested environments

  • PostgreSQL: v14 devel 97b61448262eae5e1b4a631aeac63b11d902a474
  • pg_stat_kcache: 1ab528f
  • Ubuntu: 20.04.1 LTS (Focal Fossa)

Best regards.

rjuju commented

Thanks a lot for the report!

This is entirely totally broken wrt. nested statement, and there's no workaround with the extensions as-is. Indeed, since the counters for the beginning of the execution are stored in a global variable, each nested execution overwrites it. So only the queries executed at the deepest nested level should have correct values.

This can probably be fixed by storing an array of counters, one per nesting level. This however raises some questions on what we want to expose. I think the best would be to store 2 full sets of counters: one for the "self" time (that doesn't include deeper nesting level), and one for the accumulated one, similar to pg_stat_user_functions. This way pg_stat_kcache can correctly aggregate the values.

I'm also wondering if we should add a new option to choose whether to track nested queries or not, and while at it add support for utility commands, which are totally ignored by this extension.

What do you think?

Thanks for your quick response! I understood there is no workaround and need to fix it.

This can probably be fixed by storing an array of counters, one per nesting level. This however raises some questions on what we want to expose. I think the best would be to store 2 full sets of counters: one for the "self" time (that doesn't include deeper nesting level), and one for the accumulated one, similar to pg_stat_user_functions. This way pg_stat_kcache can correctly aggregate the values.

Nice, I agree with your thoughts.

I'm also wondering if we should add a new option to choose whether to track nested queries or not, and while at it add support for utility commands, which are totally ignored by this extension.

As a user, it is better to prepare parameters with reference to "pg_stat_statements" extension because many users must use it. And I'm glad if this extension supports tracking the utility command (copy, vacuum, create and so on) supported in "pg_stat_statements" extension.

Why don't you add the following parameters like "pg_stat_statments"?

  • pg_stat_kcache.track : the level to track which is top, all or none
  • pg_stat_kcache.track_utility : if track utility command

(By the way, since "pg_stat_statements" can track the planning phase time, some users might use if "pg_stat_kcache" supports it.)

Thanks,

rjuju commented

Thanks for the suggestion! I'll use track and track_utility.

pg_stat_statements can track planning time counters but only with pg13+. I can also add planning time counters to pg_stat_cache, however I'm not sure that it would be interesting to have different counters for planning and execution or if I should just aggregate everything in the same one. With the nested statements handling we're already doubling the number of metrics, so I'm dubious about doing a 4x.

For powa, I don't know if that would make sense, that would be really costly. Planning time's a very interesting counter, I'll love to have this to show to our python devs how much not using prepared queries costs us, but storing this for all queries all the time, that's a hefty price to pay for what I think is a small gain

rjuju commented

I wouldn't worry much about powa. Data can now be stored on a remote database, and metrics are aggregated in arrays so compression will work quite well here. If we do a 4x on the counters, we also do a 4x on the shared memory requirements for pg_stat_kcache (on top on readibility concerns, but this can be somehow addressed with views if needed).

rjuju commented

@mikecaat sorry for the late answer, I've unfortunately been a bit busy recently.

I just pushed a v2.2 branch (https://github.com/powa-team/pg_stat_kcache/tree/v2.2), which should implement most of the changes discussed here:

  • counters are now ok for nested queries
  • new pg_stat_kcache.track option
  • new pg_stat_kcache.track_planning option

I didn't implement a pg_stat_kcache.track_utility, as the query identifier isn't computed by pg_stat_statements in such case (it's only computed when the entry is stored). It could technically be possible to work around that as the queryid for utility command is only a hash of the query text, but I'd rather avoid that, as:

  • it would need to check for every pg_stat_statements release that the utility queryid algorithm didn't change
  • it would break pg_stat_kcache compatibility with 3rd party plugins that may compute queryid differently

Those may not be huge blockers, so if everyone prefer to ignore that and implement it anyway, let me know.

I also didn't implement the self/total counters. pg_stat_statements also doesn't do that, and it would make the function output really unreadable.

Edit: Note also that I hardcoded a limitation on the number of nesting level to 64. That makes the implementation simpler and should be enough for reasonable use cases.

@rjuju Thanks for implementing! You can do it at your own pace. I'm ok since I don't have any plan to use it in production right away.

I just pushed a v2.2 branch (https://github.com/powa-team/pg_stat_kcache/tree/v2.2), which should implement most of the changes discussed here:

I tried to use v2.2 branch. I find bugs and I want to ask a little bit.

There may be bugs.

  • pg_stat_kcache_reset() doesn't work correctly because the pgskCounters for execution is not initialized.(c1dc7da)

  • The description of pg_stat_kcache view in README.rst is not updated. (9fde3fb)

  • If a nested query is executed, pg_stat_kcache doesn't show expected values for users because pg_stat_kcache view is just accumulating the values of pg_stat_kcache_detail view. How about adding a nested_level column to pg_stat_kcache_detail view and pg_stat_kcache is accumulating the value of top level queries? I'm sorry if there are any misunderstood points. Anyway, I think at least to solve it or describe the limitation in README.rst.

postgres=# SELECT query,exec_user_time,exec_system_time FROM pg_stat_kcache_detail ;
-[ RECORD 1 ]----+------------------------------------------------------------------
query            | SELECT md5(i::text) as str from generate_series(1, 10000000) as i     //called from plpgsql_cpu()
exec_user_time   | 10.971502999999998
exec_system_time | 0.3359690000000004
-[ RECORD 2 ]----+------------------------------------------------------------------
query            | SELECT i::text as str from generate_series(1, 10000000) as i   //called from plpgsql_cpu()
exec_user_time   | 5.592159000000009
exec_system_time | 0.20382099999999959
-[ RECORD 3 ]----+------------------------------------------------------------------
query            | SELECT plpgsql_cpu()
exec_user_time   | 17.261084999999994
exec_system_time | 0.6676529999999996

postgres=# SELECT exec_user_time,exec_system_time FROM pg_stat_kcache ;
-[ RECORD 1 ]----+-------------------
exec_user_time   | 33.825558000000015               // The expected value is 17.261084999999994 of plpgsql_cpu() , isn't it?
exec_system_time | 1.2074870000000004 

I have some questions.

  • Isn't it better to add the comments of configuration parameters? I think some user may confuse plan_* is always zero since they don't know there is pg_stat_kcache.track_planning option.
  • Is there any reason to use PG_FINALLY() instead of PG_CATCH()? If to use PG_FINALLY(), *_nested_level-- can be written at once per function. In my understanding is correct it's ok to change it because pg_stat_statements use it and comments of PG_FINALLY() in elog.h said the following.

The cleanup code will be run in either case, and any error will be rethrown afterwards.

  • Although this is not relevant to this issue, is there any reason the unit of the time is seconds and milliseconds. I think the reason is that the unit of rusage(2) is so. But isn't it better to use the same unit as pg_stat_statements used because they must be used together?

I didn't implement a pg_stat_kcache.track_utility,

OK. I don't have any plan to use it now.
I wish postgres supports a function to make queryid.

I also didn't implement the self/total counters. pg_stat_statements also doesn't do that, and it would make the function output really unreadable.

I'm sorry that I couldn't understand the meanings of self/total counters.

Edit: Note also that I hardcoded a limitation on the number of nesting level to 64. That makes the implementation simpler and should be enough for reasonable use cases.

I think it's enough.

If you are ok, I can help to make a patch. Please let me know.
Thanks,

rjuju commented

Thanks a lot for the review!

pg_stat_kcache_reset() doesn't work correctly because the pgskCounters for execution is not initialized.(c1dc7da)

Oops indeed good catch!

The description of pg_stat_kcache view in README.rst is not updated. (9fde3fb)

Oops again, I don't know how I missed that. I still plan to do some more work on the readme though, as it should at least document what are the plan_ and exec_ prefix, the new options and the limitations.

If a nested query is executed, pg_stat_kcache doesn't show expected values for users because pg_stat_kcache view is just accumulating the values of pg_stat_kcache_detail view.

I agree it's not ideal, but that's also how pg_stat_statements behaves:

SELECT s.query, s.total_exec_time, k.exec_user_time, k.exec_system_time
FROM pg_stat_kcache() k
JOIN pg_stat_statements s USING (queryid, userid, dbid);

                            query                             |  total_exec_time   |   exec_user_time   |    exec_system_time    
--------------------------------------------------------------+--------------------+--------------------+------------------------
 SELECT plpgsql_cpu()                                         |        6870.011848 |  6.558029000000001 |               0.311952
 SELECT md5(i::text) as str from generate_series($1, $2) as i |        1769.283545 | 1.7012810000000016 |    0.06801500000000005
 SELECT pg_stat_statements_reset()                            |           0.298559 |                  0 | 0.00034800000000001496
 SELECT i::text as str from generate_series($1, $2) as i      | 4855.6508269999995 |           4.657657 |    0.19799699999999998
(4 rows)

How about adding a nested_level column to pg_stat_kcache_detail view and pg_stat_kcache is accumulating the value of top level queries? I'm sorry if there are any misunderstood points. Anyway, I think at least to solve it or describe the limitation in README.rst.

Nice idea, but as the level should be part of the entry key, I'm afraid that could lead to a lot of duplicates. Maybe just a boolean flag to indicate if it was top level or not? That would limit the duplicates and still fix the problem. +1 to document that in any case.

I'm sorry that I couldn't understand the meanings of self/total counters.

It was an idea to display the counters including and excluding underlying nested statements, similar to pg_stat_user_functions.

If you are ok, I can help to make a patch. Please let me know.

That'd be much appreciated!

@rjuju Thanks for your comments!

I make a pull request to fix a bug and update readme. Please check it.
I will try to make a patch to solve the accumulation problem based on your idea.

Nice idea, but as the level should be part of the entry key, I'm afraid that could lead to a lot of duplicates. Maybe just a boolean flag to indicate if it was top level or not? That would limit the duplicates and still fix the problem. +1 to document that in any case.

rjuju commented

Thanks a lot for the PR, that's perfect so I merged it.

For nested statements, I'm still wondering what's the best approach. Also, planning counters might be trickier to handle. Should planning of nested statements be considered as top level or not, same for nested planning of top level statements.

@marco44 and @frost242 do you have any idea or concerns about this?

sorry, I closed by mistake.

For nested statements, I'm still wondering what's the best approach. Also, planning counters might be trickier to handle. Should planning of nested statements be considered as top level or not, same for nested planning of top level statements.

I worried about that too.

The execution time of top-level statements including the planning of nested statements.
So, the patch I made is just considered a top-level statement and add a limitation comment.

Even if pg_stat_kcache.track is all, pg_stat_kcache view considers only statistics of top-level statements. So, there is the case which even though user cpu time used planning a nested statement is high, plan_user_time of pg_stat_kcache view is small. In such a case, user cpu time used planning a nested statement is counted in exec_user_time.

But, I'm not confident so I want some ideas or advice.

rjuju commented

@marco44 @frost242 last chance to give an opinion, otherwise I intend to merge the changes as discussed (and patched by @mikecaat) quite soon.

I'm ok with what's proposed, merge ! :)

rjuju commented

Great! @mikecaat could you also send a PR for mikecaat@7d24772, or should I just cherry pick it from your repository?

I'm OK too.

@rjuju Sorry for the late reply. Please check the PR.

Thanks to merge the PR. Is it ok to close this issue?

rjuju commented

Please keep it open until this is merged into master. That's helpful to keep track of it, and other users might have the same question.

rjuju commented

I did a review of the changes and fixes various issues, and I plan to merge it to master and to a release soon.

@mikecaat could you check if you're ok with the latest changes in the v2.2 branch, or if you find any other issue?

@marco44 / @frost242 feel free to review the changes too!

Thanks for preparing v2.2.0 release.
I have no comment now.

rjuju commented

I just released version 2.2.0. Thanks for everything @mikecaat, @marco44 and @frost242 !

rjuju commented

Hi @mikecaat

Did you eventually install the last pg_stat_kcache version? If yes could you check how many entries exist as both top level and nested statements? This could be helpful for integrating the same feature in pg_stat_statements (cf https://www.postgresql.org/message-id/20210309023954.tbvzds3dire5bcks@nol )

Hi @rjuju

No, I didn't try to install the pg_stat_kcache in the production because our development status is under investigation now. I'm sorry I couldn't help.

rjuju commented

Thanks for the feedback @mikecaat !

rjuju commented

For the record a similar feature has just been committed for pg_stat_statements (https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=6b4d23feef6e334fb85af077f2857f62ab781848) that will be available as of pg14.

Thanks again a lot @mikecaat!

@rjuju
Thanks a lot! I know your great works on the community
I'm happy to compute the queryid is merged too.

rjuju commented

Oh, thanks a lot @mikecaat :)