Possible regression in 8.1.2
Closed this issue · 6 comments
I'm relaying this as a middleman so I don't have the full details, but the basics are:
upgraded stacks-node and api to the latest versions (3.0.0.0.0 and 8.1.2 respectively)
after a few hours/days it was observed that postgres was sustaining nearly 1000% cpu usage.
none of the active queries seemed to be the reason, so we stopped the api and the stacks-node.
postgres cpu dropped back to normal.
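For reference, a minimal sketch of one way to check for an offending query (this is an assumption about the check, not necessarily what was run; it assumes psql access to the api's database, connection flags omitted):
# list non-idle backends ordered by how long their current query has been running
psql -c "SELECT pid, now() - query_start AS runtime, state, left(query, 80) AS query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC NULLS LAST;"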
then, we started stacks-node and the api, but kept them firewalled from external traffic: postgres cpu remained as expected (roughly 4% used).
then, to test the api a little we sent a few balance curls:
curl -H "Content-Type: application/json" http://localhost:3999/extended/v1/address/{principal}/stx
no matter the address, it takes several seconds (in some cases upwards of 30s) for the data to be returned.
there is also a corresponding CPU spike on postgres.
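To correlate a spike with a single request, a rough check along these lines can be used (a sketch only; PRINCIPAL is a placeholder address, and it assumes a Linux host with procps):
# fire one balance request in the background, then sample the busiest postgres
# processes once per second until the request completes
curl -s -o /dev/null -H "Content-Type: application/json" \
  "http://localhost:3999/extended/v1/address/${PRINCIPAL}/stx" &
req=$!
while kill -0 "$req" 2>/dev/null; do
  ps -C postgres -o pid,%cpu,args --sort=-%cpu | head -n 5
  sleep 1
done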
we're testing a downgrade to api version 7.14.1
update: downgrade to 7.14.1 seems to have greatly helped.
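For anyone testing the same workaround, pinning the image tag should be enough. A rough sketch, assuming a docker-based deployment and the standard hirosystems/stacks-blockchain-api image (adjust the image name and keep your existing env/database settings):
docker pull hirosystems/stacks-blockchain-api:7.14.1
# then restart the api container/service pointing at the 7.14.1 tag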
adding some additional details here:
$ time curl -H "Content-Type: application/json" http://localhost:3999/extended/v1/address/${PRINCIPAL}/stx
{"balance":"0","total_sent":"0","total_received":"0","total_fees_sent":"0","total_miner_rewards_received":"0","lock_tx_id":"","locked":"0","lock_height":0,"burnchain_lock_height":0,"burnchain_unlock_height":0,"estimated_balance":"0"}
real 0m36.916s
user 0m0.011s
sys 0m0.002s
and the corresponding resource usage on the host (externally hosted DB):
top - 17:13:23 up 272 days, 17:47, 5 users, load average: 1.47, 1.30, 1.26
Tasks: 124 total, 1 running, 123 sleeping, 0 stopped, 0 zombie
%Cpu(s): 21.2 us, 5.2 sy, 0.0 ni, 73.2 id, 0.0 wa, 0.0 hi, 0.3 si, 0.2 st
MiB Mem : 7452.8 total, 691.1 free, 1417.3 used, 5344.5 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 5741.1 avail Mem
I don't see any corresponding cpu spike on my DB (based on what was shared with me), but the DB also has a lot of resources available to it and no shared IO.
Same issue here. Downgrade to 7.14.1 helps a lot.
@planbetterHQ would you be able to share examples of principals used in the balance endpoint that are causing this for you?
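If it helps with collecting those, a small loop along these lines can gather per-principal timings (the principals below are placeholders only, not examples from this report):
# replace with the principals that are showing slow responses
PRINCIPALS="SP_EXAMPLE_1 SP_EXAMPLE_2"
for p in $PRINCIPALS; do
  t=$(curl -s -o /dev/null -w '%{time_total}' \
    -H "Content-Type: application/json" \
    "http://localhost:3999/extended/v1/address/${p}/stx")
  echo "$p ${t}s"
done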
Confirming that the change in #2156 seems to help here: I'm seeing much more reasonable times returning balance data locally.
ex:
real 0m0.684s
user 0m0.012s
sys 0m0.000s
Fixed in v8.2.1