googleapis/gapic-generator-ruby

feat(gapic-common): Add retry options and expanded retry logic

quartzmo opened this issue · 1 comments

The google-cloud-pubsub library is required to support the following RPC retry policy, provided by the Pub/Sub team:

property default value
init RPC timeout 5s
rpc timeout multiplier 1.3
max RPC timeout 60s
total timeout 600s
initial retry delay 100ms
retry delay multiplier 1.3
max retry delay 60s

This configuration is similar to the old google/cloud/pubsub/v1/publisher_client_config.json that was used prior to the introduction of this generator and gapic-common, except that in the table above "init RPC timeout" is 5s and in the older config initial_rpc_timeout_millis was 60000 (60s).

We need a solution that accepts values for "init RPC timeout", "max RPC timeout", "rpc timeout multiplier" and "total timeout", and uses them as suggested by their names to perform incrementally longer retries limited by a max timeout and with a check for a total time deadline. These properties must be configurable per RPC and for convenience should also be configurable as a default for the client, although that may be outside the scope of gapic-common.

I do not believe this behavior can be supported by the current retry logic in gapic-common. The options param only contains timeout, and the logic calls calculate_deadline just once before beginning the retry block, and the value is never recalculated.

The value of timeout used to compute deadline is currently obtained from pubsub_grpc_service_config.json and is 60.

It is actually not clear how the options param timeout (obtained from the service config timeout) should be mapped to the configuration above. The service config description for the property states that it is:

The default timeout in seconds for RPCs sent to this method.

Which seems ambiguous in the context of using "rpc timeout multiplier" (above) to compute an incrementally larger timeout for each RPC retry. Taken literally, the "default timeout" should probably be "init RPC timeout", because without retries, that's the timeout that would be used for the first (and only) call. (Per @dazuma, offline.)

See nodejs-pubsub/src/v1/publisher_client_config.json and gax-nodejs/src/normalCalls/retries.ts for how this is done in nodejs-pubsub.

See googleapis pubsub/v1/BUILD.bazel for configuration inputs to other generated clients.