xline-kv/Xline

[Bug]: CI nextest failed by timeout

Opened this issue · 3 comments

Description about the bug

A nextest run failed with a timeout in a GitHub workflow. When I reran the job without modifying any code, the failure disappeared.
It is a rare error that cannot be reproduced reliably. It could be either a bug in the code or a bug in nextest.

Version

0.6.1 (Default)

Relevant log output

Failed CI run: https://github.com/xline-kv/Xline/actions/runs/10003614333/job/27650819920?pr=905
Successful (rerun) CI run: https://github.com/xline-kv/Xline/actions/runs/10003614333?pr=905
The failed test is: curp::it server::shutdown_rpc_should_shutdown_the_cluster
There are a lot of RpcTransport(()) errors.

Code of Conduct

  • I agree to follow this project's Code of Conduct

That might be caused by the following:

  • A put client starts a thread and sends 10 Put requests one by one.
  • Another client proposes a shutdown to the cluster.
  • The put client waits for a response until the time limit is exceeded. By then the cluster may have shut down successfully and closed its RPC connections.
  • The put client retries, gets an RpcTransport error, and keeps retrying. Each Put request may retry 3 times, taking 10.5s in total.
  • The 10 Put requests together may retry for up to 105s, which exceeds the 30s test timeout.
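
To make the arithmetic in the list above concrete, here is a minimal sketch of the worst-case time budget. It only uses the figures quoted in this comment (10.5s per request, 10 sequential requests, 30s test timeout). The function name `worst_case_total` is hypothetical and is not part of Xline or curp.

```rust
use std::time::Duration;

/// Worst-case wall time when requests are sent one by one and each
/// request exhausts its whole retry budget before the next one starts.
/// (Hypothetical helper for illustration, not an Xline API.)
fn worst_case_total(per_request: Duration, requests: u32) -> Duration {
    per_request * requests
}

fn main() {
    // Figures taken from the comment above.
    let per_request = Duration::from_millis(10_500); // 3 retries, 10.5s in total
    let requests = 10; // Put requests sent sequentially
    let test_timeout = Duration::from_secs(30);

    let total = worst_case_total(per_request, requests);
    println!("worst case: {:?}, test timeout: {:?}", total, test_timeout);
    println!("exceeds test timeout: {}", total > test_timeout);
}
```

Under these assumptions, three consecutive failing requests (31.5s) are already enough to cross the 30s limit, so the test can time out well before all 10 requests have given up.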

This should be fixed once #918 is merged.

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 14 days.