Retry state
Closed this issue · 8 comments
Has anyone given much thought to how one might expose retry-state to users of this library? For example it would be nice to know if you're on the last retry so that if a failure still occurs you can mark it as permanently failed in whatever external storage you're using.
I haven't really thought of that. I think our use cases have all been pretty simple so far, so observing the retry state has not been a requirement for us.
Since the goal of the API is to allow you to simply wrap any set of forms in with-retries, the retry state would have to be exposed as some var that's in scope within the set of forms inside with-retries. I'm hesitant to do this, because if your retried forms now depend on some again-internal vars, you can't remove the retry logic simply by removing the with-retries wrapping (the code wouldn't compile since the vars would not be in scope any more). The ideal was always that you could enable retry logic by simply wrapping your logic in with-retries, and similarly you could disable retries by unwrapping your logic.
Do you have any other use cases? The one you provided above could be solved by persisting the permanently-failed state when with-retries throws an exception. I can't really see any reason why you would need to know the next delay (or even whether there is a next delay) apart from logging purposes.
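To make the "wrap to enable, unwrap to disable" property concrete, here is a minimal sketch of the same idea as a plain higher-order function. retry-with-delays and flaky-op are hypothetical names, and this is not again's implementation: the helper calls a thunk, and when it throws, sleeps for the next delay and tries again, rethrowing once the delays run out. Because flaky-op never mentions any retry state, calling it directly instead of through the wrapper is still valid code.

```clojure
(defn retry-with-delays
  "Hypothetical stand-in for with-retries: calls thunk f, and when it
  throws, sleeps for the next delay (ms) and retries. Rethrows the last
  exception once `delays` is exhausted."
  [delays f]
  (loop [remaining (seq delays)]
    ;; recur is not allowed inside try, so capture the outcome in a map
    (let [outcome (try
                    {:value (f)}
                    (catch Exception e
                      (if remaining
                        {:error e}
                        (throw e))))]
      (if (contains? outcome :value)
        (:value outcome)
        (do (Thread/sleep (long (first remaining)))
            (recur (next remaining)))))))

;; A hypothetical operation that fails twice, then succeeds.
(def calls (atom 0))
(defn flaky-op []
  (if (< (swap! calls inc) 3)
    (throw (ex-info "transient failure" {}))
    :ok))

;; Wrapped: the two failures are retried.
(retry-with-delays [10 20] flaky-op) ; => :ok
;; Unwrapped: the body is unchanged, there are just no retries.
;; (flaky-op)
```

The trade-off the maintainer describes shows up directly: the body knows nothing about attempts or delays, which keeps it removable, but also means it cannot ask "is this the last try?" without extra plumbing.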
The one you provided above could be solved by persisting permanently failed when with-retries throws an exception
In the case I'm thinking of, a failure wouldn't be considered permanent until it's happened a few times. For example it could just be that a service is temporarily down, or some data we expect to exist isn't there yet but will be if we try again later.
I realized we could just wrap (with-retries ...) in a future, but then there'd be a thread for every task currently being managed by with-retries. This could be solved by something like manifold or core.async, but I can understand if that's not a dependency you want to add. I think we can re-use the code for generating backoff strategies though, so that's cool.
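Both halves of this comment can be sketched with plain Clojure and the JDK. In this thread's examples, with-retries takes a plain vector of delays such as [100 1000 10000], so ordinary sequence functions can generate a backoff strategy. To avoid parking one thread per task, each retry can be scheduled on a shared java.util.concurrent.ScheduledExecutorService instead of sleeping inside a future. exponential-delays, retry-async and the promise-based result shape are hypothetical names chosen for illustration; they are not part of again, manifold or core.async.

```clojure
(import '(java.util.concurrent Executors ScheduledExecutorService
                               ThreadFactory TimeUnit))

(defn exponential-delays
  "Hypothetical helper: n delays in ms, starting at base-ms and
  multiplying by factor each time."
  [base-ms factor n]
  (take n (iterate #(* factor %) base-ms)))

(exponential-delays 100 10 3) ; => (100 1000 10000)

(def ^ScheduledExecutorService scheduler
  ;; One small shared pool for all pending retries. Daemon threads, so the
  ;; pool does not keep the JVM alive on its own.
  (Executors/newScheduledThreadPool
    1
    (reify ThreadFactory
      (newThread [_ r]
        (doto (Thread. ^Runnable r) (.setDaemon true))))))

(defn retry-async
  "Hypothetical helper: runs f on the scheduler and, when it throws,
  schedules the next attempt after the next delay instead of sleeping.
  Returns a promise delivered {:value v} on success, or {:error e} once
  the delays are exhausted."
  [delays f]
  (let [result (promise)]
    (letfn [(attempt [remaining]
              (try
                (deliver result {:value (f)})
                (catch Exception e
                  (if-let [[d & more] (seq remaining)]
                    (schedule! d #(attempt more))
                    (deliver result {:error e})))))
            (schedule! [ms thunk]
              ;; reify Runnable so .schedule resolves to the Runnable overload
              (.schedule scheduler
                         (reify Runnable (run [_] (thunk)))
                         (long ms)
                         TimeUnit/MILLISECONDS))]
      (schedule! 0 #(attempt delays)))
    result))
```

Between attempts no thread is blocked: a pending retry is just an entry in the scheduler's queue, which is roughly what manifold or core.async timeouts would provide, at the cost of a promise-based result instead of a plain return value.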
apart from logging purposes
I really want to be able to log retries so I know how my system is doing.
@devth thanks for the feedback!
Maybe something like this would work for you:
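(require '[again.core :as again])

(try
  (again/with-retries
    [100 1000 10000]             ; wait 100ms, 1s, then 10s between attempts
    (try
      (my-operation arg-1 arg-2) ; placeholder for your own operation
      (catch Exception e
        ;; log each failed attempt, then rethrow so with-retries retries
        (println "my-operation failed:" e)
        (throw e))))
  (catch Exception e
    ;; reached only after the last retry has also failed
    (println "permanently failed:" e)))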
If you really wanted to, you could keep track of the retry count yourself (and derive the remaining retries from it):
(let [retry-count (atom 0)]
  (again/with-retries
    [100 1000 10000]
    (try
      ...
      (catch Exception e
        ;; count failed attempts; with 3 delays there are at most 4 attempts
        (swap! retry-count inc)
        (throw e)))))
Note: These are workarounds. I totally acknowledge that the library is missing features related to logging, metrics and monitoring, and I hope to get around to implementing something on that front, but to be honest I don't have a lot of time to spend on this library at the moment (PRs welcome!), so it might take a while. Also, I'm a bit wary of compromising the simple API.
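To connect the counter workaround back to the original use case (marking a job as permanently failed only after the final attempt), here is a self-contained sketch. Since again itself isn't loaded here, retry-with-delays is a hypothetical stand-in for with-retries, and process-job, op and mark-permanently-failed! are placeholder names. The key arithmetic: a strategy with n delays allows at most n + 1 attempts, so the (n + 1)th failure is the one that will not be retried.

```clojure
(defn retry-with-delays
  "Hypothetical stand-in for with-retries: calls f, and when it throws,
  sleeps for the next delay (ms) and retries; rethrows when out of delays."
  [delays f]
  (loop [remaining (seq delays)]
    (let [outcome (try {:value (f)}
                       (catch Exception e
                         (if remaining {:error e} (throw e))))]
      (if (contains? outcome :value)
        (:value outcome)
        (do (Thread/sleep (long (first remaining)))
            (recur (next remaining)))))))

(defn process-job
  "Hypothetical job runner: retries op according to delays and calls
  mark-permanently-failed! only when the final attempt fails."
  [delays op mark-permanently-failed!]
  (let [failures     (atom 0)
        max-attempts (inc (count delays))] ; e.g. 3 delays => 4 attempts
    (retry-with-delays
      delays
      (fn []
        (try
          (op)
          (catch Exception e
            (when (= (swap! failures inc) max-attempts)
              ;; Last attempt: no retry follows, so record the failure.
              (mark-permanently-failed! e))
            (throw e)))))))
```

Note that the body now has to know the length of the strategy, which is exactly the coupling the maintainer wanted to avoid; the counter lives outside the retried forms so it survives across attempts.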
You could move the logging into the retrier, and out of the job.
What if the retrier allowed the user to pass a callback that is called after each failure?
Some pseudocode:
(again/with-logging (fn [ex next-delay]
                      (log/infof ex "Failed due to exception - retrying after %dms" next-delay))
  (again/with-retries [1 10 100]
    ...))
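The callback proposal can be prototyped without touching again. retry-with-callback below is hypothetical (it is neither again's API nor the with-logging form from the pseudocode): it calls (on-failure e next-delay-ms) after every failed attempt and passes nil as the delay when no retry remains. That single hook covers both requests in this thread: logging each retry, and detecting the permanent failure.

```clojure
(defn retry-with-callback
  "Hypothetical sketch: calls f; after each failure calls
  (on-failure e next-delay-ms), where next-delay-ms is nil when no retry
  remains. Sleeps and retries while delays remain, otherwise rethrows."
  [delays on-failure f]
  (loop [remaining (seq delays)]
    (let [outcome (try {:value (f)}
                       (catch Exception e
                         ;; (first nil) is nil, signalling the final failure
                         (on-failure e (first remaining))
                         (if remaining {:error e} (throw e))))]
      (if (contains? outcome :value)
        (:value outcome)
        (do (Thread/sleep (long (first remaining)))
            (recur (next remaining)))))))

(defn log-failure
  "Example callback: logs retries and the permanent failure separately."
  [^Exception e next-delay]
  (if next-delay
    (println (format "Failed due to %s - retrying after %dms"
                     (.getMessage e) next-delay))
    (println "Permanently failed:" (.getMessage e))))

;; Usage: an operation that always fails is retried twice, then given up on.
(try
  (retry-with-callback [1 10] log-failure
                       #(throw (ex-info "service unavailable" {})))
  (catch Exception _ :gave-up))
```

Because the callback is passed to the retrier rather than referenced inside the retried body, the body stays free of retry state, which fits the maintainer's goal of being able to remove retries by unwrapping.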
@marcomorain, I'm in the process of addressing this properly. See the discussion here and this branch. Please comment on the PR if you have any feedback.