liwp/again

Retry state

Closed this issue · 8 comments

cddr commented

Has anyone given much thought to how one might expose retry state to users of this library? For example, it would be nice to know if you're on the last retry, so that if a failure still occurs you can mark it as permanently failed in whatever external storage you're using.

liwp commented

I haven't really thought of that. I think our use cases have all been pretty simple so far, so observing the retry state has not been a requirement for us.

Since the goal of the API is to allow you to simply wrap any set of forms in with-retries, the retry state would have to be exposed as some var that's in scope within the set of forms inside with-retries. I'm hesitant to do this, because if your retried forms now depend on some again-internal vars, you can't remove the retry logic simply by removing the with-retries wrapping (the code wouldn't compile, since the vars would no longer be in scope). The ideal was always that you could enable retry logic by simply wrapping your logic in with-retries, and similarly disable retries by unwrapping your logic.

Do you have any other use cases? The one you provided above could be solved by persisting permanently failed when with-retries throws an exception. I can't really see any reason why you would need to know the next delay (or even if there is a next delay) apart from logging purposes.

cddr commented

> The one you provided above could be solved by persisting permanently failed when with-retries throws an exception

In the case I'm thinking of, a failure wouldn't be considered permanent until it's happened a few times. For example, it could just be that a service is temporarily down, or some data we expect to exist isn't there yet but will be if we try again later.

I realized we could just wrap (with-retries ...) in a future, but then there'd be a thread for every task currently being managed by with-retries. This could be solved by something like manifold or core.async, but I can understand if that's not a dependency you want to add. I think we can re-use the code for generating backoff strategies though, so that's cool.
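To illustrate the point about re-using backoff generation: the delays passed to with-retries in this thread are just a sequence of millisecond values, so they can be produced separately from whatever consumes them (a blocking loop, a scheduler, core.async, and so on). The helper below is a hypothetical sketch, not a function taken from again; its name and parameters are assumptions made for illustration.

```clojure
(defn exponential-delays
  "Hypothetical helper (not part of again): returns a seq of `max-retries`
  delays in milliseconds, starting at `initial-ms` and multiplying by
  `factor` each time."
  [initial-ms factor max-retries]
  (take max-retries (iterate #(* factor %) initial-ms)))

;; Produces the same delays as the literal vector [100 1000 10000]
(exponential-delays 100 10 3)
```

Because the result is an ordinary sequence, the same delays could be handed to a non-blocking scheduler without changing how they are generated.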

devth commented

> apart from logging purposes

I really want to be able to log retries so I know how my system is doing.

liwp commented

@devth thanks for the feedback!

Maybe something like this would work for you:

(try
  (again/with-retries
    [100 1000 10000]
    (try
      (my-operation arg-1 arg-2)
      (catch Exception e
        (log "my-operation failed:" e)
        (throw e))))
  (catch Exception e
    (log "permanently failed:" e)))

If you really wanted to, you could keep track of the remaining retries yourself:

(let [retry-count (atom 0)]
  (again/with-retries
    [100 1000 10000]
    (try
      ...
      (catch Exception e
        (swap! retry-count inc)
        (throw e)))))

Note: These are workarounds. I totally acknowledge that the library is missing features related to logging, metrics and monitoring, and I hope to get around to implementing something on that front, but to be honest I don't have a lot of time to spend on this library at the moment (PRs welcome!), so it might take a while. Also, I'm a bit wary of compromising the simple API.
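A minimal sketch of how the counter workaround could answer the original question (knowing when you are on the last attempt). To keep it self-contained it does not call again at all: retry-with-delays is a hypothetical stand-in that behaves roughly like with-retries, and run-job, mark-permanently-failed! and the delay values are made-up names for illustration.

```clojure
(defn retry-with-delays
  "Hypothetical stand-in for again/with-retries: calls f; on an exception,
  sleeps for the next delay and tries again. Rethrows the last exception
  once the delays are used up."
  [delays f]
  (loop [remaining (seq delays)]
    (let [outcome (try
                    {:value (f)}
                    (catch Exception e
                      (if remaining
                        {:error e}
                        (throw e))))]
      (if (contains? outcome :error)
        (do (Thread/sleep (long (first remaining)))
            (recur (next remaining)))
        (:value outcome)))))

(defn run-job
  "Runs `operation` with retries. Calls `mark-permanently-failed!` with the
  exception only when the final attempt fails."
  [delays operation mark-permanently-failed!]
  (let [max-attempts (inc (count delays)) ; first try plus one per delay
        attempt      (atom 0)]
    (retry-with-delays
      delays
      (fn []
        (let [n (swap! attempt inc)]
          (try
            (operation)
            (catch Exception e
              (when (= n max-attempts)
                (mark-permanently-failed! e))
              (throw e))))))))
```

The downside is the one mentioned above: the job has to know how many delays were configured, so the retry logic is no longer removable just by unwrapping.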

devth commented

@liwp cool, thanks for the workaround suggestion!

You could move the logging into the retrier, and out of the job.

What if the retrier allowed the user to pass a callback that is called after each failure?

Some pseudocode:

(again/with-logging (fn [ex next] (log/infof ex "Failed due to exception - retrying after %dms" next))
  (again/with-retries [1 10 100]
    ...))
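To make the callback idea concrete, here is a rough, self-contained sketch of a retrier that invokes a user-supplied function after every failure. This is not again's API: retry-with-callback, its argument order, and the convention of passing nil as the next delay when no retries remain are all assumptions made for illustration.

```clojure
(defn retry-with-callback
  "Hypothetical sketch, not the again API. Calls f; after each failure calls
  (on-failure ex next-delay), where next-delay is the upcoming delay in ms,
  or nil when no retries remain. Rethrows the last exception when the
  delays are exhausted."
  [delays on-failure f]
  (loop [remaining (seq delays)]
    (let [outcome (try
                    {:value (f)}
                    (catch Exception e
                      {:error e}))]
      (if-let [e (:error outcome)]
        (let [next-delay (first remaining)]
          (on-failure e next-delay)
          (if next-delay
            (do (Thread/sleep (long next-delay))
                (recur (next remaining)))
            (throw e)))
        (:value outcome)))))

;; Usage: an operation that fails twice and then succeeds.
(let [calls (atom 0)]
  (retry-with-callback
    [1 10 100]
    (fn [ex next-delay]
      (if next-delay
        (println "Failed:" (.getMessage ^Exception ex) "- retrying after" next-delay "ms")
        (println "Failed permanently:" (.getMessage ^Exception ex))))
    (fn []
      (if (< (swap! calls inc) 3)
        (throw (ex-info "service unavailable" {}))
        :done))))
```

A nil next-delay also covers the case from the top of the thread: the callback can tell that the failure it is seeing is the final one and record it as permanent.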

liwp commented

@marcomorain, I'm in the process of addressing this properly. See the discussion here and this branch. Please comment on the PR if you have any feedback.

liwp commented

I think the callback facility caters for the use cases mentioned in this thread. If you feel I haven't addressed something please open a new ticket.