failsafe-lib/failsafe

Extract and restore state

algattik opened this issue · 3 comments

I am working on extending an open source application for distributed data exchange that uses failsafe. One thing the application does is serve as a mediator, accepting and persisting HTTP requests that it needs to relay to other instances at scale.

For this, the application persists requests to a local database (used similarly to a work queue), and moves them through a state machine. There may be hundreds of requests to dozens of remote servers waiting there to be fulfilled. If the remote server can't be reached, we need to figure out what behaviour to implement for resilience. Potentially this could involve a few cycles of fast retry, after which we just leave that request in the database in its current state, to be processed again later. For that, it might be useful to be able to extract the retry and circuit breaker state to persist it to the database.

e.g.

 Failsafe.with(retryPolicy).withStateFrom(jsonString)...

I'd look at the API slightly differently, where this is about wanting to serialize/deserialize a policy config, which is a fair enough request. For example, this could look something like:

RetryPolicy<Object> retryPolicy = RetryPolicy.builder(RetryPolicyConfig.fromJson(jsonString)).build();

Failsafe.with(retryPolicy)...

This would allow you to create a "new" RetryPolicy, CircuitBreaker, etc., from config stored in a file. I'm not sure if that's exactly what you're after though, or if you'd also want to serialize the state of the policy. For example, a CircuitBreaker's internal state tracks how many successes and failures have occurred, potentially over some time period. IMO, serializing config may be fair game, but serializing state may be more hassle than it's worth (since serialization formats can change, breaking things, etc.).

FWIW, you should be able to serialize policy config yourself and re-create a policy if needed just using the policy.getConfig() and Policy.builder() methods.

The idea here is to serialise the state. Items are processed from a work queue by N workers: each worker picks up one work item, attempts the I/O, and on failure serialises the item back with the necessary state information so that the next worker will not hammer the external server.

We have resolved this by implementing our own policy outside of failsafe and persisting the number of retries, last retry time, circuit status, etc. along with the work item. It would be nice to have this within failsafe, but I agree with you that unless others are requesting the same feature, it might be more cruft than it's worth.
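
For readers wondering what such an external policy might look like, here is a minimal sketch in plain Java. It is not the code from the application discussed above and it is not part of failsafe: the class name `WorkItemRetryState`, its fields, the `retries;epochMillis;circuit` string format, and the two-state circuit are all assumptions made for illustration. It only shows the general idea of storing the retry count, last attempt time, and circuit status next to a work item so that the next worker can decide whether to call the remote server yet.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical per-work-item resilience state, persisted alongside the item. */
final class WorkItemRetryState {
    enum Circuit { CLOSED, OPEN }

    final int retries;          // failed attempts recorded so far
    final Instant lastAttempt;  // null if the item was never attempted
    final Circuit circuit;      // OPEN once the fast retries are used up

    WorkItemRetryState(int retries, Instant lastAttempt, Circuit circuit) {
        this.retries = retries;
        this.lastAttempt = lastAttempt;
        this.circuit = circuit;
    }

    static WorkItemRetryState initial() {
        return new WorkItemRetryState(0, null, Circuit.CLOSED);
    }

    /** Records a failed attempt; opens the circuit after maxFastRetries failures. */
    WorkItemRetryState recordFailure(Instant now, int maxFastRetries) {
        int next = retries + 1;
        Circuit c = next >= maxFastRetries ? Circuit.OPEN : Circuit.CLOSED;
        return new WorkItemRetryState(next, now, c);
    }

    /** A worker may attempt once the applicable wait since the last attempt has elapsed. */
    boolean mayAttempt(Instant now, Duration retryDelay, Duration cooldown) {
        if (lastAttempt == null) {
            return true;
        }
        Duration wait = circuit == Circuit.OPEN ? cooldown : retryDelay;
        return !now.isBefore(lastAttempt.plus(wait));
    }

    /** Encodes the state as "retries;epochMillis;circuit" (millisecond precision). */
    String serialize() {
        String ts = lastAttempt == null ? "" : Long.toString(lastAttempt.toEpochMilli());
        return retries + ";" + ts + ";" + circuit.name();
    }

    /** Parses the format produced by serialize(). */
    static WorkItemRetryState parse(String s) {
        String[] parts = s.split(";", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Unexpected state format: " + s);
        }
        Instant ts = parts[1].isEmpty() ? null : Instant.ofEpochMilli(Long.parseLong(parts[1]));
        return new WorkItemRetryState(Integer.parseInt(parts[0]), ts, Circuit.valueOf(parts[2]));
    }
}
```

A worker would load the string stored with the work item, call `mayAttempt` before doing any I/O, and write back the result of `recordFailure` (or a fresh `initial()` state on success). Keeping the state immutable and passing `now` in explicitly makes the logic easy to test without a clock.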