neondatabase/neon

storcon: split on shard physical size

Closed this issue · 2 comments

Currently, automatic shard splits only happen based on the logical size of the largest timeline.

There are pathological cases where a tenant is physically huge but every timeline is logically small, so it never splits. This can happen e.g. with lots of small timelines, or with workloads such as sqlsmith which open lots of transactions and then roll them back.

We should also have a physical shard split threshold, to avoid shards growing too large. This should also allow repeated shard splits, like --split-threshold does for logical size.
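
As a rough illustration of the proposal (not the storage controller's actual code), the split decision could check a per-shard physical threshold alongside the existing logical one. All names here (`ShardSizes`, `should_split`, `splits_needed`) are hypothetical, and the sketch assumes a split doubles the shard count and divides physical size roughly evenly between the children, which may not hold in practice:

```python
from dataclasses import dataclass
from typing import Optional

GIB = 1024 ** 3


@dataclass
class ShardSizes:
    # Hypothetical inputs; field names are assumptions, not Neon's API.
    max_timeline_logical_bytes: int  # logical size of the largest timeline
    physical_bytes: int              # physical (on-disk) size of this shard


def should_split(
    sizes: ShardSizes,
    logical_threshold: Optional[int],
    physical_threshold: Optional[int],
) -> bool:
    """Split if either enabled threshold is reached; None disables a check."""
    if (
        logical_threshold is not None
        and sizes.max_timeline_logical_bytes >= logical_threshold
    ):
        return True
    if physical_threshold is not None and sizes.physical_bytes >= physical_threshold:
        return True
    return False


def splits_needed(physical_bytes: int, physical_threshold: int) -> int:
    """Count repeated splits until a shard falls below the physical threshold.

    Assumes each split halves the per-shard physical size (an idealization).
    """
    if physical_threshold <= 0:
        raise ValueError("physical_threshold must be positive")
    splits = 0
    while physical_bytes >= physical_threshold:
        physical_bytes //= 2
        splits += 1
    return splits
```

With only the logical threshold set, a shard with a 1 GiB largest timeline and 200 GiB on disk would never split; adding a physical threshold catches it, and re-evaluating after each split gives the repeated-split behaviour.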

See Slack thread where this ran staging Pageservers out of disk.

jcsp commented

We should measure the max ratio of physical to logical size that we see in practice, and define "pathological" as something like the 99th percentile of that.
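
A minimal sketch of that measurement, assuming we already have `(physical_bytes, logical_bytes)` samples per tenant from some metrics source (the sample format and function names are hypothetical). It uses the nearest-rank percentile definition; other definitions would give slightly different cutoffs:

```python
def physical_to_logical_ratios(samples):
    """Return physical/logical ratios, skipping samples with zero logical size."""
    return [physical / logical for physical, logical in samples if logical > 0]


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(values)
    # Integer ceiling of pct/100 * n, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def pathological_ratio(samples, pct=99):
    """Ratio above which a tenant would be treated as pathological."""
    return percentile_nearest_rank(physical_to_logical_ratios(samples), pct)
```

The resulting ratio could then inform where a physical split threshold sits relative to the logical one.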