linkerd/linkerd2-conformance

retries and timeouts: Add a new test suite for retries and timeouts


This test suite verifies that Linkerd's retries and timeouts features work correctly. It mostly follows the instructions covered in the docs.

Setting up

  • Install booksapp sample application
  • Install the required ServiceProfiles

Retries

  • Execute the routes command to check the success rates of the various routes. These may be lower than expected because of the deliberately introduced failures (which retries are meant to rectify)
  • Enable retries by unmarshalling the ServiceProfile object and setting isRetryable: true on the relevant routes
  • Execute the routes command again to verify that "effective_success" is greater than before
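The step that enables retries could be sketched roughly as follows. This is a minimal illustration, not the suite's implementation: it uses only the standard library, so it edits the manifest as generic JSON rather than unmarshalling YAML into Linkerd's typed ServiceProfile object (kubectl apply accepts JSON manifests as well). The function name enableRetries is hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// enableRetries is a hypothetical helper. It sets isRetryable: true on every
// route whose name appears in routeNames and returns the updated manifest as
// JSON. Fields it does not know about are preserved because the manifest is
// handled as generic maps.
func enableRetries(manifest []byte, routeNames ...string) ([]byte, error) {
	var sp map[string]any
	if err := json.Unmarshal(manifest, &sp); err != nil {
		return nil, fmt.Errorf("unmarshal ServiceProfile: %w", err)
	}
	spec, ok := sp["spec"].(map[string]any)
	if !ok {
		return nil, errors.New("manifest has no spec object")
	}
	routes, ok := spec["routes"].([]any)
	if !ok {
		return nil, errors.New("spec has no routes list")
	}

	wanted := make(map[string]bool, len(routeNames))
	for _, name := range routeNames {
		wanted[name] = true
	}
	for _, r := range routes {
		route, ok := r.(map[string]any)
		if !ok {
			continue
		}
		if name, _ := route["name"].(string); wanted[name] {
			route["isRetryable"] = true
		}
	}
	// The result can be piped to kubectl apply -f -.
	return json.Marshal(sp)
}
```

In the real suite the typed ServiceProfile object and a YAML library would likely replace the generic maps; the flow (unmarshal, flip the flag, marshal, apply) stays the same.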

Timeouts

  • Testing timeouts works similarly to testing retries. The tests execute the routes command and note the value of "effective_success" for one of the routes, depending on the edge selected.
  • The ServiceProfile YAML for deploy/voting is unmarshalled into a ServiceProfile object and a Timeout value is set under RouteSpec (say, "25s"). The object is then marshalled back to YAML and piped to kubectl apply
  • Finally, the tests execute the routes command again and compare the value of "effective_success" with the one noted before the timeout was set
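The "note the value, then compare" steps could be sketched as below. The sketch assumes that the routes command is run with JSON output and that this output is a map from resource name to a list of rows carrying "route" and "effective_success" keys; that shape, and the helper name effectiveSuccess, are assumptions made for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// routeRow models only the fields this check needs. The JSON shape is an
// assumption about the routes command's output, not a confirmed contract.
type routeRow struct {
	Route            string   `json:"route"`
	EffectiveSuccess *float64 `json:"effective_success"`
}

// effectiveSuccess is a hypothetical helper that returns the
// effective_success value reported for route on the given resource.
func effectiveSuccess(output []byte, resource, route string) (float64, error) {
	var tables map[string][]routeRow
	if err := json.Unmarshal(output, &tables); err != nil {
		return 0, fmt.Errorf("parse routes output: %w", err)
	}
	for _, row := range tables[resource] {
		// A null value means the route reported no traffic yet.
		if row.Route == route && row.EffectiveSuccess != nil {
			return *row.EffectiveSuccess, nil
		}
	}
	return 0, fmt.Errorf("no effective_success for route %q on %q", route, resource)
}
```

A test would call this once before applying the modified ServiceProfile and once after, then assert on the difference between the two values.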

Additionally, it would be nice to have the sample application configured to monitor the occurrences of retries and timeouts. For example, a service or a set of services may be configured with routes dedicated to testing retries and timeouts: a service that accepts a request with 3 parameters (succeed-after-retries, id, and delay). The service returns 200 OK only after succeed-after-retries retries, and also keeps track of how many times it was called before that. The service may also sleep for delay before servicing a request, as a way of validating timeouts.
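A minimal sketch of such a service is shown below. Several details are assumptions rather than part of the proposal: the three parameters are read from the query string, failures are reported as 503, delay is parsed as a Go duration (for example "50ms"), the call count is returned in a made-up X-Call-Count header, and a request succeeds once the number of calls for its id exceeds succeed-after-retries.

```go
package main

import (
	"net/http"
	"strconv"
	"sync"
	"time"
)

// attemptTracker counts how many times each request id has been seen.
type attemptTracker struct {
	mu    sync.Mutex
	calls map[string]int
}

func newAttemptTracker() *attemptTracker {
	return &attemptTracker{calls: make(map[string]int)}
}

// record registers one more call for id and returns the status to send along
// with the number of calls seen so far. The first call plus succeedAfter
// retries must have happened before it returns 200 (an interpretation of
// "succeed after retries").
func (t *attemptTracker) record(id string, succeedAfter int) (status, calls int) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.calls[id]++
	calls = t.calls[id]
	if calls > succeedAfter {
		return http.StatusOK, calls
	}
	return http.StatusServiceUnavailable, calls
}

// ServeHTTP reads succeed-after-retries, id, and delay from the query string.
func (t *attemptTracker) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	q := r.URL.Query()

	succeedAfter := 0
	if v := q.Get("succeed-after-retries"); v != "" {
		n, err := strconv.Atoi(v)
		if err != nil {
			http.Error(w, "invalid succeed-after-retries", http.StatusBadRequest)
			return
		}
		succeedAfter = n
	}

	// Sleep before servicing the request so that timeouts can be validated.
	if d, err := time.ParseDuration(q.Get("delay")); err == nil {
		time.Sleep(d)
	}

	status, calls := t.record(q.Get("id"), succeedAfter)
	w.Header().Set("X-Call-Count", strconv.Itoa(calls)) // hypothetical header
	w.WriteHeader(status)
}
```

The tracker implements http.Handler, so it can be served with http.ListenAndServe(":8080", newAttemptTracker()). A test could then compare the reported call count against the number of retries it expected the proxy to perform.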