Tencent/TBase

Errors when stress-testing TBase with benchmark 5.0


Benchmark configuration

warehouses=1000
terminals=500
runMins=10

Architecture

gtm master
cn01 master
cn02 master
datanode1 master
datanode2 master

Connection parameters on the CN (coordinator) nodes

/tbase/pgxc/nodes/coord_master/postgresql.conf:max_pool_size = 2000
/tbase/pgxc/nodes/coord_master/postgresql.conf:max_connections = 2000

Connection parameters on the DN (datanode) nodes

/tbase/pgxc/nodes/dn001/postgresql.conf:max_connections = 8000
/tbase/pgxc/nodes/dn001/postgresql.conf:max_pool_size = 8000

/tbase/pgxc/nodes/dn002/postgresql.conf:max_connections = 8000
/tbase/pgxc/nodes/dn002/postgresql.conf:max_pool_size = 8000
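
As a rough sanity check of the settings above, the sketch below compares them under a rule of thumb used for Postgres-XC style poolers: every client session on a coordinator may need one pooled connection to each datanode, and every datanode may receive backends from all coordinators. Whether TBase's pooler follows exactly this rule is an assumption, not something confirmed in this thread, and the function name `pool_budget` is hypothetical.

```python
def pool_budget(n_coordinators, n_datanodes,
                cn_max_connections, cn_max_pool_size, dn_max_connections):
    """Check connection settings against an ASSUMED rule of thumb.

    Assumption 1: a coordinator's pool should cover
                  cn_max_connections * n_datanodes connections.
    Assumption 2: a datanode should accept
                  cn_max_connections * n_coordinators backends.
    """
    needed_pool = cn_max_connections * n_datanodes
    needed_dn = cn_max_connections * n_coordinators
    return {
        "needed_pool_size": needed_pool,
        "pool_ok": cn_max_pool_size >= needed_pool,
        "needed_dn_connections": needed_dn,
        "dn_ok": dn_max_connections >= needed_dn,
    }


# Values from the issue: 2 CNs, 2 DNs, CN max_connections = 2000,
# CN max_pool_size = 2000, DN max_connections = 8000.
report = pool_budget(2, 2, 2000, 2000, 8000)
print(report)
```

Under these assumptions the datanode limit is sufficient, but the coordinator pool is smaller than the worst case of 4000, which would be consistent with the pooler running out of connections under 500 terminals.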

Error message

11:13:17,498 [Thread-295] ERROR jTPCCTData : ERROR: node:dn001, backend_pid:19703, nodename:dn001,backend_pid:19703,message:Failed to get pooled connections
Hint: This may happen because one or more nodes are currently unreachable, either because of node or network failure.
Its also possible that the target node may have hit the connection limit or the pooler is configured with low connections.
Please check if all nodes are running fine and also review max_connections and max_pool_size configuration parameters
11:13:17,498 [Thread-55] ERROR jTPCCTData : Unexpected SQLException in STOCK_LEVEL
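
To see which datanode the pooler failures concentrate on, the benchmark log can be scanned for the error line shown above. The following is a minimal sketch; the regular expression is derived only from the single log line in this issue, so other TBase versions may format the message differently, and the function name `count_pool_errors` is hypothetical.

```python
import re
from collections import Counter

# Pattern modelled on the one error line quoted in this issue.
POOL_ERROR = re.compile(
    r"ERROR: node:(?P<node>\w+), backend_pid:(?P<pid>\d+),"
    r".*Failed to get pooled connections"
)


def count_pool_errors(lines):
    """Count 'Failed to get pooled connections' errors per reporting node."""
    counts = Counter()
    for line in lines:
        match = POOL_ERROR.search(line)
        if match:
            counts[match.group("node")] += 1
    return counts


sample = [
    "11:13:17,498 [Thread-295] ERROR jTPCCTData : ERROR: node:dn001, "
    "backend_pid:19703, nodename:dn001,backend_pid:19703,"
    "message:Failed to get pooled connections",
    "11:13:17,498 [Thread-55] ERROR jTPCCTData : "
    "Unexpected SQLException in STOCK_LEVEL",
]
counts = count_pool_errors(sample)
print(dict(counts))
```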

The pg_stat_activity view shows that some SQL requests are blocked, and pg_prepared_xacts lists the following prepared transactions:

postgres=# select * from pg_prepared_xacts;
transaction | gid | prepared | owner | database
-------------+--------------------------+-------------------------------+-----------+-----------
966365 | _$XC$1529356:cn001:F:2:0 | 2022-04-01 16:13:33.979825+08 | benchmark | benchmark
965853 | _$XC$1528537:cn001:F:2:0 | 2022-04-01 16:13:33.982607+08 | benchmark | benchmark
966380 | _$XC$1529348:cn001:F:2:0 | 2022-04-01 16:13:33.982619+08 | benchmark | benchmark
966381 | _$XC$1529314:cn001:F:2:0 | 2022-04-01 16:13:33.989759+08 | benchmark | benchmark
966199 | _$XC$1529014:cn001:F:2:0 | 2022-04-01 16:13:33.990128+08 | benchmark | benchmark

Questions

1. The error "Failed to get pooled connections" suggests changing the connection parameters, but they are already fairly large. What else should be changed?

2. For performance stress testing, are there any database parameters related to distributed transactions?

1. Recommended TPCC configuration:
For example, with three machines of 16 cores and 64 GB of memory each, the tpcc parameters could be:
conn=jdbc:postgresql://192.168.0.2:11379,192.168.0.3:11381,192.168.0.4:11379/global?loadBalanceHosts=true&oracle_compile=true
warehouses=500
loadWorkers=32
terminals=96

Database kernel parameters:
persistent_datanode_connections = 'on'
enable_material = 'off'
enable_bitmapscan = 'off'
max_wal_size = '12GB'
shared_buffers = '16GB'
checkpoint_timeout = '600'
min_wal_size = '4GB'
pooler_scale_factor = '64'
archive_status_control = 'continue'
maintenance_work_mem = '4GB'
effective_cache_size = '50GB'
max_parallel_workers_per_gather = '0'
max_pool_size = '65535'
work_mem = '8MB'
wal_keep_segments = '4096'

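2. Regarding your query select * from pg_prepared_xacts; : the rows it returns are 2PC transactions that are currently in the prepare phase. This is normal in a distributed system, and they finish on their own. If a transaction stays there for a long time, it may be a leftover 2PC transaction; see the material below for how to clean these up with pg_clean: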
https://github.com/Tencent/TBase/wiki/11-v2.3.0%E5%8D%87%E7%BA%A7%E7%89%B9%E6%80%A7pg_clean%E4%BD%BF%E7%94%A8%E8%AF%B4%E6%98%8E
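
To tell in-flight prepared transactions from possible leftovers, one option is to compare each row's prepared timestamp against a threshold. The sketch below works on (gid, prepared) pairs such as those returned by select gid, prepared from pg_prepared_xacts. The five-minute threshold, the two "now" values, and the function name `stale_prepared` are illustrative assumptions; the right cutoff depends on the workload.

```python
from datetime import datetime, timedelta, timezone


def stale_prepared(rows, now, max_age=timedelta(minutes=5)):
    """Return the gids of prepared transactions older than max_age.

    rows: iterable of (gid, prepared) with timezone-aware datetimes.
    """
    return [gid for gid, prepared in rows if now - prepared > max_age]


tz = timezone(timedelta(hours=8))  # the +08 offset shown in the issue
rows = [
    ("_$XC$1529356:cn001:F:2:0",
     datetime(2022, 4, 1, 16, 13, 33, 979825, tzinfo=tz)),
    ("_$XC$1528537:cn001:F:2:0",
     datetime(2022, 4, 1, 16, 13, 33, 982607, tzinfo=tz)),
]

# Hypothetical observation times: a few seconds later, and much later.
now_soon = datetime(2022, 4, 1, 16, 13, 40, tzinfo=tz)
now_late = datetime(2022, 4, 1, 16, 30, 0, tzinfo=tz)

print(stale_prepared(rows, now_soon))  # still within the threshold
print(stale_prepared(rows, now_late))  # candidates for a pg_clean check
```

Rows that only show up briefly during a run are expected; only gids that keep appearing past the threshold are worth checking with pg_clean.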