feat(gapic-common): Add retry options and expanded retry logic
quartzmo opened this issue · 1 comment
The google-cloud-pubsub library is required to support the following RPC retry policy, provided by the Pub/Sub team:
| Property | Default value |
|---|---|
| init RPC timeout | 5s |
| rpc timeout multiplier | 1.3 |
| max RPC timeout | 60s |
| total timeout | 600s |
| initial retry delay | 100ms |
| retry delay multiplier | 1.3 |
| max retry delay | 60s |
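To make the table concrete, here is a minimal sketch of the per-attempt timeouts and retry delays these defaults produce. It assumes each value is multiplied by its multiplier after every attempt and then capped at its maximum. `RetryPolicy`, `growing` and `schedule` are hypothetical names for illustration, not gapic-common API, and the sketch omits jitter and the total-timeout check.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical container for the Pub/Sub retry settings (all times in seconds)."""
    initial_rpc_timeout: float = 5.0
    rpc_timeout_multiplier: float = 1.3
    max_rpc_timeout: float = 60.0
    total_timeout: float = 600.0
    initial_retry_delay: float = 0.1
    retry_delay_multiplier: float = 1.3
    max_retry_delay: float = 60.0


def growing(initial: float, multiplier: float, cap: float) -> Iterator[float]:
    """Yield initial, initial * multiplier, ... with every value capped at `cap`."""
    value = initial
    while True:
        yield min(value, cap)
        value = min(value * multiplier, cap)


def schedule(policy: RetryPolicy, attempts: int) -> List[Tuple[float, float]]:
    """Return (rpc_timeout, delay_after_failure) pairs for the first `attempts` attempts."""
    timeouts = growing(policy.initial_rpc_timeout,
                       policy.rpc_timeout_multiplier,
                       policy.max_rpc_timeout)
    delays = growing(policy.initial_retry_delay,
                     policy.retry_delay_multiplier,
                     policy.max_retry_delay)
    # Round to milliseconds to hide floating-point noise from repeated multiplication.
    return [(round(next(timeouts), 3), round(next(delays), 3)) for _ in range(attempts)]
```

With the defaults, the per-attempt timeout grows 5s, 6.5s, 8.45s, ... and first reaches the 60s cap on the 11th attempt, while the retry delay grows 100ms, 130ms, 169ms, .... The 600s total timeout is not modeled here; it has to be enforced by the retry loop itself.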
This configuration is similar to the old `google/cloud/pubsub/v1/publisher_client_config.json` that was used before this generator and gapic-common were introduced, except that "init RPC timeout" in the table above is 5s, whereas `initial_rpc_timeout_millis` in the older config was `60000` (60s).
We need a solution that accepts values for "init RPC timeout", "max RPC timeout", "rpc timeout multiplier" and "total timeout", and uses them as their names suggest: each retry gets an incrementally longer timeout, limited by the max RPC timeout, and every attempt is checked against an overall deadline derived from the total timeout. These properties must be configurable per RPC and, for convenience, should also be configurable as a client-wide default, although the latter may be outside the scope of gapic-common.
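A minimal sketch of the requested loop, assuming the semantics described above: the total deadline is fixed once, while the per-attempt timeout and the retry delay are recalculated after every failure. `call_with_retry` and `RetryableError` are hypothetical names, not gapic-common or gax-nodejs API; `clock` and `sleep` are injectable only so the behavior can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. UNAVAILABLE)."""


def call_with_retry(
    rpc: Callable[[float], T],
    *,
    initial_rpc_timeout: float = 5.0,
    rpc_timeout_multiplier: float = 1.3,
    max_rpc_timeout: float = 60.0,
    total_timeout: float = 600.0,
    initial_retry_delay: float = 0.1,
    retry_delay_multiplier: float = 1.3,
    max_retry_delay: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call rpc(timeout) with growing per-attempt timeouts until success or the total deadline."""
    deadline = clock() + total_timeout  # overall budget, computed once
    rpc_timeout = initial_rpc_timeout
    delay = initial_retry_delay
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("total timeout exceeded")
        # A single attempt never outlives the total deadline.
        attempt_timeout = min(rpc_timeout, remaining)
        try:
            return rpc(attempt_timeout)
        except RetryableError:
            # Give up if waiting for the next attempt would reach the deadline.
            if clock() + delay >= deadline:
                raise
            sleep(delay)
            # Recalculate the per-attempt timeout and delay for the next try.
            rpc_timeout = min(rpc_timeout * rpc_timeout_multiplier, max_rpc_timeout)
            delay = min(delay * retry_delay_multiplier, max_retry_delay)
```

The key difference from a single precomputed deadline is that `rpc_timeout` is recomputed on every iteration, while `deadline` only caps it.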
I do not believe this behavior can be supported by the current retry logic in gapic-common. The `options` param only contains `timeout`, and the logic calls `calculate_deadline` just once, before beginning the retry block; the value is never recalculated.
The value of `timeout` used to compute `deadline` is currently obtained from `pubsub_grpc_service_config.json` and is `60`.
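For contrast, here is a simplified model of the behavior described above, not gapic-common's actual code: one deadline is computed from `timeout` before the retry block and every attempt shares it, so the first attempt gets the whole budget and later attempts get only what is left. `single_deadline_attempts` and `RetryableError` are hypothetical names, and retry delays are ignored.

```python
from typing import Callable, List


class RetryableError(Exception):
    """Hypothetical marker for a retryable failure."""


def single_deadline_attempts(
    rpc: Callable[[float], object],
    timeout: float,
    clock: Callable[[], float],
    max_attempts: int = 10,
) -> List[float]:
    """Return the time budget each attempt receives when one deadline is shared by all."""
    deadline = clock() + timeout  # computed once, never recalculated
    budgets: List[float] = []
    for _ in range(max_attempts):
        remaining = deadline - clock()
        if remaining <= 0:
            break
        budgets.append(remaining)
        try:
            rpc(remaining)
            break
        except RetryableError:
            continue
    return budgets
```

With `timeout = 60` and attempts that each fail after 25s, the budgets shrink to 60s, 35s and 10s, the opposite of the 5s, 6.5s, 8.45s, ... progression the Pub/Sub policy asks for.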
It is actually not clear how the `timeout` in the `options` param (obtained from the service config `timeout`) should be mapped to the configuration above. The service config describes the property as:

> The default timeout in seconds for RPCs sent to this method.
This seems ambiguous when "rpc timeout multiplier" (above) is used to compute an incrementally larger timeout for each RPC retry. Taken literally, the "default timeout" should probably map to "init RPC timeout", because without retries that is the timeout used for the first (and only) call. (Per @dazuma, offline.)
See `nodejs-pubsub/src/v1/publisher_client_config.json` and `gax-nodejs/src/normalCalls/retries.ts` for how this is done in nodejs-pubsub.
See googleapis `pubsub/v1/BUILD.bazel` for the configuration inputs to other generated clients.