tikv/client-go

replica selector refactor

crazycs520 opened this issue · 2 comments

Background

Currently, the replicaSelector logic is too complicated, and maintaining the code is a heavy burden. I doubt anyone can explain the common replica selection logic without looking at the code.

I think replicaSelector is so complicated because it introduced a state machine that has many states, and the transitions between those states are messy. Below is the state machine transition diagram for the tidb_replica_read = leader strategy, compiled by @zyguan. The diagram is great, but it is still complicated, and it does not cover the case where enable-forwarding is true, which would make it even more complicated.

[image: state machine transition diagram for the tidb_replica_read = leader strategy]

Task

This task refactors replicaSelector with the following goals:

  • Test replica selection strategies for various region error scenarios, and fix unreasonable behavior.
  • Less code, clearer logic, easier to understand.
  • Keep the replicaSelector logic/behavior the same as before.

To guard against regressions (no one can guarantee that the new code is completely bug-free), the implementation introduces a new replicaSelectorV2. Setting enable-replica-selector-v2 = true in the configuration file, which is the default, uses replicaSelectorV2; setting enable-replica-selector-v2 = false falls back to the older replicaSelector. After replicaSelectorV2 has been stable for 2 major versions, we can consider deleting the old replicaSelector.

@crazycs520
Please add descriptions of the problems and the design details, so that others can understand why and how.

- Less code, clearer logic, easier to understand.
- Keep the replicaSelector logic/behavior the same as before.

@crazycs520

Simplifying the code is not a specific business objective. Please describe specific business objectives (such as unit test coverage of all state transitions and fixing unexpected issues) and prioritize the todo tasks accordingly.