failsafe-lib/failsafe

Regression: Failsafe 2.4.x getStageAsync may hang

timothybasanov opened this issue · 3 comments

Something changed after 2.4.2. Combining Timeout and RetryPolicy with getStageAsync() sometimes makes Failsafe hang. Here is an example that is reproducible on 2.4.x but not on the master branch:

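import java.time.Duration;
import java.time.temporal.ChronoUnit;
import java.util.concurrent.CompletableFuture;
import net.jodah.failsafe.Failsafe;
import net.jodah.failsafe.RetryPolicy;
import net.jodah.failsafe.Timeout;

// Each attempt sleeps for 500 ms, longer than the 100 ms timeout.
Timeout<Integer> timeout = Timeout.of(Duration.ofMillis(100));
RetryPolicy<Integer> retryPolicy = new RetryPolicy<Integer>()
    .withBackoff(10, 30, ChronoUnit.MILLIS)
    .withMaxRetries(2);
var result = Failsafe.with(retryPolicy, timeout).getStageAsync(() -> CompletableFuture.supplyAsync(() -> {
    try {
        Thread.sleep(500);
        return 1;
    } catch (InterruptedException e) {
        throw new RuntimeException("Interrupted");
    }
}));
System.out.println("Result=" + result.join()); // Hangs here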

It would be nice to have a fix on the 2.4.x branch for people who may find it hard to migrate to 2.5.x for some time.

Thanks for filing. 2.5.x should be an easy upgrade for most people unless they're implementing custom policies. The reason I haven't released 2.5.0 yet is that I'm trying to decide whether I should skip that version and go straight to 3.0, which ends up changing a few of the SPI things that were in flux in 2.5.0.

It doesn't look like there will be an easy way to solve this and the other Timeout-related problems that were fixed in 3.0 (and the 2.5 branch) without the internal changes that went along with them. I could release a 2.5, but at the moment it would include a bunch of other changes: https://github.com/failsafe-lib/failsafe/blob/2.5.0/CHANGELOG.md#250, so it's probably best in your case to just move straight to 3.0 when you can.

Closing this since I don't think I'll be doing more work on the 2.x branch.
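
For callers who have to stay on 2.4.x in the meantime, one possible stopgap is to bound the wait on the returned future instead of calling join() with no limit. The following is a minimal sketch using only the JDK, not a fix for the underlying bug: BoundedJoin and joinWithDeadline are hypothetical names, the fallback value is an assumption, and the never-completed CompletableFuture merely stands in for the stage that hangs in the report.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedJoin {
    // Hypothetical helper (not part of Failsafe): waits for the stage,
    // but gives up and returns the fallback after maxWaitMillis.
    public static <T> T joinWithDeadline(CompletionStage<T> stage, long maxWaitMillis, T fallback) {
        try {
            return stage.toCompletableFuture().get(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The stage did not complete within the deadline.
            return fallback;
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe it.
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            // The stage completed exceptionally; surface the original cause.
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        // Stand-in for a stage that is never completed.
        CompletableFuture<Integer> stuck = new CompletableFuture<>();
        System.out.println("Result=" + joinWithDeadline(stuck, 200, -1)); // prints "Result=-1"

        // A stage that is already complete returns its value immediately.
        System.out.println("Result=" + joinWithDeadline(CompletableFuture.completedFuture(1), 200, -1)); // prints "Result=1"
    }
}
```

In the reproduction above, this would mean passing result to the helper in place of the bare result.join() call. The deadline should be longer than the worst case the policies allow (all attempts plus backoff delays); otherwise a slow but healthy execution would be reported as the fallback.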