hirosystems/stacks-blockchain-api

Possible regression in 8.1.2


I'm relaying this as a middleman and don't have the full details, but the basics are:
upgraded stacks-node and api to the latest versions (3.0.0.0.0 and 8.1.2 respectively)
after a few hours/days it was observed that postgres was using nearly 1000% cpu sustained.

none of the active queries seemed to be the reason, so we stopped the api and the stacks-node.
postgres cpu dropped back to normal.

then, we started stacks-node and the api, but kept it firewalled from external traffic: postgres cpu remained as expected (roughly 4% used).

then, to test the api a little we sent a few balance curls:

curl -H "Content-Type: application/json" http://localhost:3999/extended/v1/address/{principal}/stx

no matter the address, it takes several seconds (in some cases upwards of 30s) for the data to be returned.

there is also a corresponding CPU spike on postgres.
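
To repeat the balance check across several addresses, a minimal sketch along these lines may help. It assumes the API listens on localhost:3999 as in the curl above. The names `API_BASE`, `SLOW_SECS`, `classify`, and `time_balance` are hypothetical, and the 1-second "slow" threshold is an arbitrary assumption, not a project guideline. It relies on curl's `-w '%{time_total}'` write-out variable to report the elapsed time of each request.

```shell
#!/usr/bin/env bash
# Sketch only: times the STX balance endpoint for each principal given as an argument.
API_BASE="${API_BASE:-http://localhost:3999}"  # assumption: local API on the default port
SLOW_SECS="${SLOW_SECS:-1}"                    # assumption: arbitrary "slow" threshold

# classify SECONDS LIMIT: prints "slow" if SECONDS exceeds LIMIT, otherwise "ok".
classify() {
  awk -v t="$1" -v limit="$2" 'BEGIN { if (t + 0 > limit + 0) print "slow"; else print "ok" }'
}

# time_balance PRINCIPAL: prints "<principal> <seconds>s <slow|ok>".
time_balance() {
  local principal="$1" secs
  # -o /dev/null discards the JSON body; -w prints only the total request time.
  if ! secs=$(curl -s -o /dev/null --max-time 120 -w '%{time_total}' \
      -H "Content-Type: application/json" \
      "${API_BASE}/extended/v1/address/${principal}/stx"); then
    echo "${principal} failed"
    return 1
  fi
  echo "${principal} ${secs}s $(classify "$secs" "$SLOW_SECS")"
}

for principal in "$@"; do
  time_balance "$principal"
done
```

Saved as, say, `time_balance.sh`, it could be run as `./time_balance.sh <principal-1> <principal-2>`, which makes it easier to compare timings before and after an upgrade or downgrade.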

we're testing a downgrade to api version 7.14.1

update: downgrade to 7.14.1 seems to have greatly helped.

adding some additional details here:

$ time curl -H "Content-Type: application/json" http://localhost:3999/extended/v1/address/${PRINCIPLE}/stx
{"balance":"0","total_sent":"0","total_received":"0","total_fees_sent":"0","total_miner_rewards_received":"0","lock_tx_id":"","locked":"0","lock_height":0,"burnchain_lock_height":0,"burnchain_unlock_height":0,"estimated_balance":"0"}
real	0m36.916s
user	0m0.011s
sys	0m0.002s

and the corresponding resource usage on the host (externally hosted DB):

top - 17:13:23 up 272 days, 17:47,  5 users,  load average: 1.47, 1.30, 1.26
Tasks: 124 total,   1 running, 123 sleeping,   0 stopped,   0 zombie
%Cpu(s): 21.2 us,  5.2 sy,  0.0 ni, 73.2 id,  0.0 wa,  0.0 hi,  0.3 si,  0.2 st
MiB Mem :   7452.8 total,    691.1 free,   1417.3 used,   5344.5 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   5741.1 avail Mem 

i don't see any corresponding cpu spike on my DB (as was shared with me), but the DB also has a lot of resources available to it with no shared IO.

Same issue here. Downgrade to 7.14.1 helps a lot.

@planbetterHQ would you be able to share examples of principals used in the balance endpoint that are causing this for you?

Confirming that the change in #2156 seems to help here. I'm seeing much more reasonable times returning balance data locally, e.g.:

real	0m0.684s
user	0m0.012s
sys	0m0.000s

Fixed in v8.2.1