cloudfoundry/routing-release

Gorouter should not retry more often than endpoints available

domdom82 opened this issue

Is this a security vulnerability?

No.

Issue

This is a summary of a thread in the Cloud Foundry (CF) community Slack.

  • max_attempts is now freely configurable
  • Idempotent EOF errors are no longer prunable or failable, only retriable
  • which means they will be retried until max_attempts is reached
  • which can take a very long time, depending on the setting
  • even though a route may have only 3 (non-working) backends, Gorouter keeps cycling through them until max_attempts is reached, hitting the same endpoints again and again
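The reported behavior can be modeled with a small Go simulation. This is a sketch under stated assumptions, not Gorouter's actual proxy code: the names errEOF, roundRobin, send and currentRetryLoop are hypothetical, every backend is assumed to fail with a retriable EOF error, and the loop is bounded only by max_attempts.

```go
package main

import (
	"errors"
	"fmt"
)

// errEOF stands in for the idempotent EOF error described in the issue
// (hypothetical; Gorouter's real error classification is more involved).
var errEOF = errors.New("EOF")

// roundRobin is a hypothetical picker that cycles through endpoints by index.
type roundRobin struct {
	endpoints []string
	next      int
}

func (r *roundRobin) pick() string {
	e := r.endpoints[r.next%len(r.endpoints)]
	r.next++
	return e
}

// send simulates a backend that never answers: every call fails with errEOF.
func send(endpoint string) error { return errEOF }

// currentRetryLoop models the reported behavior: retriable errors are retried
// until maxAttempts is reached, regardless of how many endpoints exist.
func currentRetryLoop(endpoints []string, maxAttempts int) (failedAttempts int, tried []string) {
	lb := &roundRobin{endpoints: endpoints}
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		ep := lb.pick()
		tried = append(tried, ep)
		if err := send(ep); err == nil {
			return failedAttempts, tried
		}
		failedAttempts++
	}
	return failedAttempts, tried
}

func main() {
	failed, tried := currentRetryLoop([]string{"a", "b", "c"}, 10)
	fmt.Println("failed_attempts:", failed) // failed_attempts: 10
	fmt.Println("tried:", tried)            // tried: [a b c a b c a b c a]
}
```

With 3 endpoints and max_attempts set to 10, the simulated loop records 10 failed attempts and revisits each endpoint several times, which matches the current result reported in this issue.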

This issue proposes the following fix:

  • Make Gorouter stop retrying if attempt > min(max_attempts, num_endpoints)
  • This means Gorouter will not try more than 3 times on a three-endpoint route
  • The other part of the fix shall include clear documentation for the load-balancing algorithms (currently least-conn and round-robin). This documentation shall state that an algorithm MUST pick a different endpoint on each retry, to avoid accidentally trying the same endpoint every time (for example, if a purely random, non-indexed load-balancing algorithm were added).
  • The documentation change should also update the description of the max_attempts spec property to state that Gorouter keeps retrying only until either max_attempts is reached or all endpoints in the route have been tried.
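The proposed fix can be sketched in Go as follows. This is a hypothetical illustration, not Gorouter's implementation: attemptLimit computes min(max_attempts, num_endpoints), and pickUntried is an assumed round-robin picker that skips endpoints already tried for the current request, which is one way to satisfy the "different endpoint on each retry" requirement. As before, every backend is assumed to fail with a retriable EOF error.

```go
package main

import (
	"errors"
	"fmt"
)

// errEOF stands in for the idempotent EOF error described in the issue (hypothetical).
var errEOF = errors.New("EOF")

// send simulates a backend that never answers: every call fails with errEOF.
func send(endpoint string) error { return errEOF }

// attemptLimit returns min(maxAttempts, numEndpoints), the proposed upper bound.
func attemptLimit(maxAttempts, numEndpoints int) int {
	if numEndpoints < maxAttempts {
		return numEndpoints
	}
	return maxAttempts
}

// pickUntried is a hypothetical round-robin picker that skips endpoints already
// tried for this request. It returns "" once every endpoint has been tried.
func pickUntried(endpoints []string, start int, tried map[string]bool) string {
	for i := 0; i < len(endpoints); i++ {
		ep := endpoints[(start+i)%len(endpoints)]
		if !tried[ep] {
			return ep
		}
	}
	return ""
}

// proposedRetryLoop stops once attempt > min(maxAttempts, numEndpoints).
func proposedRetryLoop(endpoints []string, maxAttempts int) (failedAttempts int, order []string) {
	limit := attemptLimit(maxAttempts, len(endpoints))
	tried := make(map[string]bool)
	for attempt := 1; attempt <= limit; attempt++ {
		ep := pickUntried(endpoints, attempt-1, tried)
		if ep == "" {
			break // all endpoints exhausted
		}
		tried[ep] = true
		order = append(order, ep)
		if err := send(ep); err == nil {
			return failedAttempts, order
		}
		failedAttempts++
	}
	return failedAttempts, order
}

func main() {
	failed, order := proposedRetryLoop([]string{"a", "b", "c"}, 10)
	fmt.Println("failed_attempts:", failed, "order:", order) // failed_attempts: 3 order: [a b c]

	failed, _ = proposedRetryLoop([]string{"a", "b", "c"}, 2)
	fmt.Println("failed_attempts:", failed) // failed_attempts: 2
}
```

The min() cap alone bounds the number of attempts but does not stop a picker from returning the same endpoint twice; tracking tried endpoints per request covers that case, which is why the issue pairs the cap with the documentation requirement on load-balancing algorithms.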

Steps to Reproduce

  1. Set max_attempts to 10.
  2. Deploy an app with 3 non-working endpoints (e.g. app instances that use a process health check but whose process does not listen on its port).
  3. Send a request to the app with curl.

Expected result

failed_attempts:3 in access log

Current result

failed_attempts:10 in access log
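To compare the expected and current results, the failed_attempts value can be pulled out of an access log line. The sketch below assumes only that the field appears as failed_attempts:N, as shown above; the helper name failedAttempts is hypothetical, and the sample line is abbreviated and made up, since real Gorouter access log lines contain many more fields.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// failedAttemptsRe matches the failed_attempts:N field shown in the issue.
var failedAttemptsRe = regexp.MustCompile(`\bfailed_attempts:(\d+)`)

// failedAttempts extracts N from an access log line; ok is false if the field is absent.
func failedAttempts(line string) (n int, ok bool) {
	m := failedAttemptsRe.FindStringSubmatch(line)
	if m == nil {
		return 0, false
	}
	v, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, false
	}
	return v, true
}

func main() {
	// Abbreviated, made-up log line for illustration only.
	line := `app.example.com - "GET / HTTP/1.1" 502 failed_attempts:3`
	if n, ok := failedAttempts(line); ok {
		fmt.Println("failed_attempts =", n) // failed_attempts = 3
	}
}
```

For the reproduction steps, a value of 3 would indicate the fix is in effect, while 10 reproduces the current behavior.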

Possible Fix

See the issue description above.