pingcap/tiflow

CDC lag reaches up to 7min when injecting ha-pdleader-io-delay-1s-last-for-5m, even though the PD leader is transferred quickly


What did you do?

  1. TiDB cluster with CDC changefeed running normally
  2. Inject ha-pdleader-io-delay-1s-last-for-5m (from 2024-09-04 12:37:58 to 2024-09-04 12:42:58)
  3. Check cluster status and CDC lag

What did you expect to see?

CDC lag should be <2min

What did you see instead?

The PD leader was transferred shortly after the chaos injection.
But CDC didn't have a leader for ~5min, and CDC lag rose to ~7min.

2024-09-04 12:38:01	
{"container":"pd","log":"[raft.go:771] [\"646d794e12a46726 became leader at term 4\"]","namespace":"uds-cdc-br-scenario-tps-7624385-1-510","level":"INFO","pod":"upstream-pd-0"}

2024-09-04 12:38:26	
{"container":"pd","log":"[server.go:1804] [\"PD leader is ready to serve\"] [leader-name=upstream-pd-0]","namespace":"uds-cdc-br-scenario-tps-7624385-1-510","level":"INFO","pod":"upstream-pd-0"}


2024-09-04 12:38:26	
{"container":"pd","log":"[server.go:1730] [\"campaign PD leader ok\"] [campaign-leader-name=upstream-pd-0]","namespace":"uds-cdc-br-scenario-tps-7624385-1-510","level":"INFO","pod":"upstream-pd-0"}


2024-09-04 12:38:26	
{"container":"pd","log":"[server.go:1704] [\"start to campaign PD leader\"] [campaign-leader-name=upstream-pd-0]","namespace":"uds-cdc-br-scenario-tps-7624385-1-510","level":"INFO","pod":"upstream-pd-0"}
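For reference, the timing implied by the log timestamps above can be worked out with a short script. This is only a sketch of the arithmetic: the timestamps are copied from this report, while the event labels and the helper name `seconds_after_chaos_start` are made up for illustration and are not part of PD or TiCDC.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Chaos window as stated in the reproduction steps.
chaos_start = datetime.strptime("2024-09-04 12:37:58", FMT)
chaos_end = datetime.strptime("2024-09-04 12:42:58", FMT)

# PD events taken from the log lines above (labels are illustrative).
events = {
    "raft became leader at term 4": "2024-09-04 12:38:01",
    "PD leader is ready to serve": "2024-09-04 12:38:26",
}


def seconds_after_chaos_start(ts: str) -> int:
    """Return how many seconds after the chaos start a timestamp falls."""
    return int((datetime.strptime(ts, FMT) - chaos_start).total_seconds())


offsets = {name: seconds_after_chaos_start(ts) for name, ts in events.items()}
chaos_duration = int((chaos_end - chaos_start).total_seconds())

for name, off in offsets.items():
    print(f"{name}: +{off}s after chaos start")
print(f"chaos window: {chaos_duration}s")
```

By these timestamps the new PD leader was elected about 3s after the injection started and was ready to serve within about 28s, so the ~5min without a CDC leader and the ~7min lag are not explained by the PD leader transfer time alone.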


Versions of the cluster

/cdc version
Release Version: v8.2.0
Git Commit Hash: 498e3d3
Git Branch: HEAD
UTC Build Time: 2024-07-03 02:52:36
Go Version: go version go1.21.10 linux/amd64
Failpoint Build: false