honeycombio/refinery

Add span count limit as an alternative way to kick off trace evaluation


Is your feature request related to a problem? Please describe.

Sometimes, Refinery will get a giant trace with tens of thousands of spans.

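This is a good way to crash Refinery. Its caches are sized by trace count, not span count, so a single giant trace can hold far more data than the cache accounts for. And because of trace locality, all of that trace's spans land on one Refinery instance, so that one instance takes the entire load, comes under stress, and can crash.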
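The two effects described above can be illustrated with a minimal Go sketch. Everything in it is hypothetical and is not Refinery's actual code: `traceCache` caps the number of distinct traces but places no limit on spans per trace, and `shardFor` stands in for trace-ID-based routing by hashing the trace ID, which sends every span of a trace to the same node.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Span is a hypothetical, minimal stand-in for an incoming span.
type Span struct {
	TraceID string
}

// traceCache is a hypothetical cache whose capacity is measured in traces,
// not spans: it refuses new traces once full, but never limits spans per trace.
type traceCache struct {
	maxTraces int
	traces    map[string][]Span
}

func newTraceCache(maxTraces int) *traceCache {
	return &traceCache{maxTraces: maxTraces, traces: make(map[string][]Span)}
}

// add appends a span to its trace; only the number of distinct traces is checked.
func (c *traceCache) add(s Span) bool {
	if _, ok := c.traces[s.TraceID]; !ok && len(c.traces) >= c.maxTraces {
		return false // cache is "full", measured by trace count
	}
	c.traces[s.TraceID] = append(c.traces[s.TraceID], s)
	return true
}

// totalSpans counts every span held; nothing in the cache bounds this number.
func (c *traceCache) totalSpans() int {
	n := 0
	for _, spans := range c.traces {
		n += len(spans)
	}
	return n
}

// shardFor picks a node by hashing the trace ID, so every span of a trace
// lands on the same node (trace locality). Hypothetical scheme for illustration.
func shardFor(traceID string, nodes int) int {
	h := fnv.New32a()
	h.Write([]byte(traceID))
	return int(h.Sum32() % uint32(nodes))
}

func main() {
	cache := newTraceCache(10) // room for only 10 traces
	for i := 0; i < 50000; i++ {
		cache.add(Span{TraceID: "giant-trace"})
	}
	// One trace uses a single cache slot yet holds 50,000 spans.
	fmt.Println(len(cache.traces), cache.totalSpans()) // 1 50000
	// Every span of the giant trace is routed to the same node.
	fmt.Println("giant-trace goes to node", shardFor("giant-trace", 3))
}
```

The trace-count check passes after the first span, so the one slot keeps growing. Meanwhile the hash routing guarantees that no other node shares the load.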

Describe the solution you'd like

Add a configuration option alongside TraceTimeout (let's call it SpanLimit). When adding a span to a trace pushes that trace's span count above the limit, Refinery immediately marks the trace as ready for a decision.

This forces a trace to be decided once it gets "too big", which keeps any single trace from becoming unmanageably large.

Making this a configuration value, rather than a hard-coded limit, avoids breaking existing customers.
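The proposed trigger can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not Refinery's implementation: `Collector`, `Trace`, and `AddSpan` are hypothetical names, only `SpanLimit` comes from this proposal, and treating `SpanLimit: 0` as "disabled" is an assumption chosen so that existing configurations keep their current behavior. The TraceTimeout path and the handling of spans that arrive after a decision are not shown.

```go
package main

import "fmt"

// Config holds the proposed option. SpanLimit is the name from this issue;
// treating 0 as "disabled" is an assumption for illustration.
type Config struct {
	SpanLimit int
}

// Trace is a hypothetical in-memory trace buffer.
type Trace struct {
	ID        string
	SpanCount int
	Ready     bool // true once the trace has been queued for a decision
}

// Collector is a hypothetical stand-in for Refinery's span collector.
type Collector struct {
	cfg    Config
	traces map[string]*Trace
	ready  []string // trace IDs queued for a sampling decision
}

func NewCollector(cfg Config) *Collector {
	return &Collector{cfg: cfg, traces: make(map[string]*Trace)}
}

// AddSpan records a span. If the trace's span count now exceeds SpanLimit,
// the trace is marked ready immediately instead of waiting for TraceTimeout.
func (c *Collector) AddSpan(traceID string) {
	t, ok := c.traces[traceID]
	if !ok {
		t = &Trace{ID: traceID}
		c.traces[traceID] = t
	}
	t.SpanCount++
	if c.cfg.SpanLimit > 0 && !t.Ready && t.SpanCount > c.cfg.SpanLimit {
		t.Ready = true
		c.ready = append(c.ready, traceID)
	}
}

func main() {
	c := NewCollector(Config{SpanLimit: 1000})
	for i := 0; i < 1001; i++ {
		c.AddSpan("big")
	}
	c.AddSpan("small")
	fmt.Println(c.ready) // [big]
}
```

The check runs on every added span, so a trace is queued the moment it crosses the limit. The `!t.Ready` guard ensures it is queued only once.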

Describe alternatives you've considered

Crashing.

Additional context

Honeycomb's production Refinery was crashing today, possibly because of this.