[Enhancement]: cpu.high 常见增加收集项目

Question

[Enhancement]: cpu.high 常见增加收集项目

Opened this issue 2 months ago · 0 comments

Description

obdiag gather scene run --scene=observer.cpu_high 采集的信息对分析ob集群 cpu占用高问题需要增加一些采集项目

1）TOP SQL信息，建议2个纬度，cpu占用和读物理盘行数。sql_audit视图有记录。
例如：
select sql_id,sum(execute_time),,sum(QUEUE_TIME), count(*), sum(MEMSTORE_READ_ROW_COUNT), sum(SSSTORE_READ_ROW_COUNT) from oceanbase.gv$ob_sql_audit where REQUEST_TIME between 1721113200000000 and 1721115000000000 group by sql_id order by 2 desc limit 10;

找到对应sql语句获取最高占用的sql的执行计划和对应sql的GV$OB_SQL_PLAN信息。

再获取对应集群级和表级分区分布以及分区里数据量是否均匀（包括二级分区）

2）gv$ob_sql_audit 信息，需要保留采集时间的全信息。

3）QPS请求分布，可以使用下面这个sql
select t2.zone, t1.svr_ip, count(*) as QPS from oceanbase.gv$ob_sql_audit t1, oceanbase.__all_server t2 where t1.svr_ip = t2.svr_ip and t1.tenant_id = 1002 and IS_EXECUTOR_RPC = 0 and request_time > (time_to_usec(now()) - 1000000) and request_time < time_to_usec(now()) group by t1.svr_ip order by QPS;

根据上诉sql 获取前几个的QPS高的节点IP，查看QPS和计划类型

select plan_type, count(*) as QPS from oceanbase.gv$ob_sql_audit t1, oceanbase.__all_server t2 where t1.svr_ip = t2.svr_ip and t1.tenant_id = 1002 and IS_EXECUTOR_RPC = 0 and request_time > (time_to_usec(now()) - 1000000) and request_time < time_to_usec(now())
and t1.svr_ip='xx.xx.xx.xx'
group by plan_type order by 2 desc ;

4）检查占用高表的表结构分区类型是否有函数分区方式，例如to_days(分区键) ，如果有可能是路由不生效导致，并检查obproxy日志是否有分区解析失败信息。