reactor/reactor-netty

WebClient timeout with HttpClient

5Orange opened this issue · 10 comments

Problem statement

While sending high traffic to the API, the WebClient times out after the configured timeout even though the downstream service is still working and eventually returns its response.

Expected Behavior

The call should return 200 instead of 500 with a WebClient timeout error.

Actual Behavior

java.util.concurrent.TimeoutException: Webclient timeout
	at********.Test.lambda$call$1(Test.java:37)
	at reactor.core.publisher.Mono.lambda$onErrorResume$32(Mono.java:3887)
	at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:94)
	at reactor.core.publisher.SerializedSubscriber.onError(SerializedSubscriber.java:124)
	at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.handleTimeout(FluxTimeout.java:295)
	at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.doTimeout(FluxTimeout.java:280)
	at reactor.core.publisher.FluxTimeout$TimeoutTimeoutSubscriber.onNext(FluxTimeout.java:419)
	at reactor.core.publisher.FluxOnErrorReturn$ReturnSubscriber.onNext(FluxOnErrorReturn.java:162)
	at reactor.core.publisher.MonoDelay$MonoDelayRunnable.propagateDelay(MonoDelay.java:271)
	at reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:286)
	at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
	at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 20000ms in 'flatMap' (and no fallback has been configured)
	... 13 common frames omitted

Steps to Reproduce:

Using JMeter, send 20 requests to the GET /test API; about half of them fail.

@RestController
@RequiredArgsConstructor
public class Test {

  private final WebClient.Builder webClientBuilder;

  public Mono<?> call() {
    return webClientBuilder.build().method(HttpMethod.GET)
        .uri("http://localhost:8080/ms-test-webclient/delay")
        .headers(httpHeaders -> httpHeaders.set("Content-Type", "application/json"))
        .retrieve()
        .toEntity(String.class)
        .flatMap(Mono::just)
        .timeout(Duration.ofSeconds(20))
        .onErrorResume(TimeoutException.class, ex -> {
          String message = "Webclient timeout";
          return Mono.error(new TimeoutException(message).initCause(ex));
        });
  }

  @GetMapping("test")
  public Mono<?> test() {
    return call();
  }

  @GetMapping("delay")
  @SneakyThrows
  public String delayresponse() {
    Thread.sleep(15000);
    return "success";
  }
}

Reactor version(s) used: 1.0.23

+--- org.springframework:spring-webflux:5.3.31

|    \--- io.projectreactor.netty:reactor-netty-http:1.0.39
JVM version (java -version): 11

@mswindowsxp I'm not able to reproduce the described problem. See the reproducible example, which combines the description here and the recommendation from the Stack Overflow thread.

https://github.com/violetagg/GH-3240

Hi @violetagg, I created a repo to reproduce the issue, please refer to this one: https://github.com/5Orange/webclient-error
Reproduction video: https://www.youtube.com/watch?v=91e4DT7_rso&ab_channel=mswindows

@5Orange The Stack Overflow thread already provided an answer for the reproducible example that you prepared - you cannot block the event loop ...

https://github.com/5Orange/webclient-error/blob/a375992e789eabe6ecf9965deaa7fa8bbdeb8dd1/src/main/java/com/example/demo/controller/TestController.java#L42

Reactor Netty uses just a few event loop threads, and if you block them you will observe the behaviour described above.

If you need to block, or you have CPU-intensive operations, then you need to offload that work to a dedicated scheduler, as sketched below.
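
A minimal sketch of such offloading, assuming the blocking 15-second sleep from the reproduction (the class name and the Mono.fromCallable wrapping are illustrative, not the reporter's code):

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

@RestController
public class OffloadedDelayController {

  // The blocking Thread.sleep runs on a boundedElastic thread,
  // so the Netty event loop threads stay free to serve other requests.
  @GetMapping("delay")
  public Mono<String> delayresponse() {
    return Mono.fromCallable(() -> {
          Thread.sleep(15_000);
          return "success";
        })
        .subscribeOn(Schedulers.boundedElastic());
  }
}

The same pattern (subscribeOn/publishOn with Schedulers.boundedElastic()) applies to any other blocking or CPU-intensive step in the pipeline.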

@violetagg Hi Vio,
I updated the delay endpoint to use Flux instead of Thread.sleep and got the same error; it just indicates a slow response from the downstream service.

@GetMapping("delay")
@SneakyThrows
public String delayresponse() {
  return Flux.interval(Duration.ofSeconds(15))
      .next()
      .map(any -> "success");
}

@5Orange The project is now broken. The method should return Mono<String> and should look like this

    @GetMapping("delay")
    @SneakyThrows
    public Mono<String> delayresponse() {
        return Flux.interval(Duration.ofSeconds(15))
        .next()
        .map(any -> "success");
    }
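
Just for comparison, a Mono.delay-based sketch would give the same non-blocking 15-second delay (an alternative, not the only way to write it):

    @GetMapping("delay")
    public Mono<String> delayresponse() {
        // Mono.delay emits after 15s on a timer, without tying up any thread.
        return Mono.delay(Duration.ofSeconds(15))
                .map(any -> "success");
    }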

Both my client, which sends requests in parallel, and curl are working. I don't have JMeter; maybe the problem is there.

My client

import reactor.core.publisher.Flux;
import reactor.netty.http.client.HttpClient;

public class ClientApplication {
	public static void main(String[] args) {
		HttpClient client = HttpClient.create().port(8080);

		System.out.println(Flux.range(0, 20)
				.flatMap(i -> client.get()
						.uri("/test")
						.responseContent()
						.aggregate()
						.asString())
				.collectList()
				.block());
	}
}

curl

curl -Z --config urls.txt

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

I can't reproduce it either, using ab or drill.
Could you please provide a fully reproducible example, for instance using https://github.com/jmeter-maven-plugin/jmeter-maven-plugin, so that we can simply run into your issue with the exact same configuration? Otherwise it's quite impossible for us to pinpoint this very specific issue.

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open.