lichess-org/fishnet

fairy-stockfish timeouts in 2.8.1

Stecors opened this issue · 4 comments

Since the 2.8.1 update, I have been seeing occasional worker crashes. That did not happen on 2.7.1, which I had been running 24/7 on a server for weeks.

Examples:

2024-01-05 21:49:00 W: Fairy-Stockfish timed out in worker 2.
2024-01-05 21:49:02 W: Fairy-Stockfish timed out in worker 3.
2024-01-05 21:49:02 W: Fairy-Stockfish timed out in worker 0.
2024-01-05 21:49:26 W: Fairy-Stockfish timed out in worker 1.

arch: x86_64-unknown-linux-musl
The same error occurs with the new parameter --cpu-priority unchanged.

Hi, thanks for reporting. Can you please try the current development version (or binary snapshots from https://github.com/lichess-org/fishnet/actions/runs/7432495919) to see if 9f1a110 fixes the issue?

Had the same issue on Windows. It seems to be working better with 9f1a110, but I am seeing what looks like lower nodes per second: around 4-6 knps now, down from 7-10 knps before the 2.8 changes (5800X CPU).

I have let 2.8.2-dev run overnight. Even though there were only a handful of fairy-stockfish jobs, I haven't seen any timeouts anymore. Thanks for the quick fix.

Thank you both.

As for nps: it is measured as the nodes of real positions (excluding the newly introduced chunk overlap) divided by the total time taken for the whole batch, so a ~20% drop is expected. The degree of parallelism also varies much more now, so there is more variance in this measurement. We could measure something smoother, like nodes per CPU time, but ultimately wall-clock time is what's relevant for the user experience.
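
To make the arithmetic concrete, here is a minimal Python sketch of that measurement. It is not fishnet's actual code: the names `PositionResult`, `reported_nps` and `raw_nps`, the batch shape (10 positions, 2 of them overlap) and the node counts are all assumptions, picked only so that the numbers come out to a 20% difference.

```python
from dataclasses import dataclass


@dataclass
class PositionResult:
    nodes: int      # nodes the engine searched for this position
    overlap: bool   # True if the position was analysed only as chunk overlap


def reported_nps(results, wall_seconds):
    """Nodes of real (non-overlap) positions divided by wall-clock time."""
    real_nodes = sum(r.nodes for r in results if not r.overlap)
    return real_nodes / wall_seconds


def raw_nps(results, wall_seconds):
    """All nodes the engine searched, overlap included, per wall-clock second."""
    return sum(r.nodes for r in results) / wall_seconds


# Hypothetical batch: 10 positions of 1000 nodes each, 2 of them overlap,
# analysed in 2 seconds of wall-clock time.
batch = [PositionResult(nodes=1000, overlap=(i < 2)) for i in range(10)]

print(reported_nps(batch, 2.0))  # 4000.0
print(raw_nps(batch, 2.0))       # 5000.0
```

In this made-up batch the engine does the same work per second as before, but the reported figure is 20% lower because the overlap nodes are not counted while the time spent on them is.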