steamclock/netable

Better suppression of retrying on some errors

Opened this issue · 0 comments

Right now we are pretty expansive in what errors we will retry on.

In the discussion on #71 we ended up adding retying for a type of error that didn't totally make sense, just because other errors that didn't make sense in the same way were already retried and it seemed bad to be inconsistent about that.

Right now there are four types of errors WRT retrying:

  1. Have to be hard suppressed for things to work properly: Cancellation (otherwise you can't cancel requests at all)
  2. Always suppressed, no way to turn on, but might conceivably be useful in some configurations: Timeouts (because requests would get too long if we allowed 'em, but retying could be valid with shorter timeouts and/or known-to-be-flaky servers)
  3. Retried by default, but questionable: JSON decoding errors, finalization errors (unlikely to be useful, because this is more likely to be a logic bug that is not going to be fixed by a retry).
  4. Retried by default, and probably always want to for retying to be a valuable feature: Network errors

Both 3 and 4 CAN have any individual errors disabled using the allowRetry handler in the configuration. 1 and 2 can never be re-enabled. Basically right now, 2 is treated the same as 1 and 3 is treated the same as 4.

Ideally we should introduce a new middle category that is more like "Errors that do not retry by default, but that retrying can be enabled on". That would give the best behaviour out of the box, while also allowing for tweaking the reties in specific circumstances.

Probably the way to implement this would be to have a default implementation of allowRetry that can be replaced by a user one (and be called easily by the user handler, so you don't need to re-impliment the basic suppression if you re just adding a new error to suppress retrying on), rather than the default being "allow everything"