MitchelPaulin/Walleye

High CPU usage across multiple cores during play

Closed this issue · 15 comments

Version 1.5.0 uses 2 threads on my PC. Unfortunately I only noticed that after 240 games in my gauntlet.

Is there an issue with using two threads? To properly comply with the UCI protocol you need at least two threads since you can receive a quit, isready or stop command at any time during a search. This would not be possible with a single threaded (or single process) application.

For example, Rustic, another Rust chess engine, also uses multiple threads, one of which is a search thread. As you can see below, during play it's actually using 4 threads: https://github.com/mvanthoor/rustic/blob/master/src/search.rs#L78

Capture

Rush is another one I have found, and it also uses two threads: https://github.com/L-Benjamin/rush/blob/main/engine/src/engine.rs#L2

For CCRL I test on one core and, having a 4-core machine in continuous use, I cannot afford to give up a core anyway. 3 tournaments are played simultaneously and I need a free core for other things. Even that is too few sometimes.
I don't know how other engines do it, but they use only 1 core (when that is set), and if they should happen to use more, it goes unnoticed by me: looking at Process Explorer, only slight fluctuations are noticeable.

I already opened this issue 1 month ago.
Anyway, engines usually work on a single thread (unless specifically requested via a UCI option) and periodically poll the input.

Alex

Reading from stdin is a blocking operation, so you would need some type of thread whose only job is to listen for input, and then poll that thread to see whether it has read anything. Maybe other languages support non-blocking I/O, but I can't find any way to accomplish this in Rust.
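A minimal sketch of the listener-thread idea described above. It is not Walleye's actual code: the helper name `spawn_input_listener` is hypothetical, and the reader is kept generic so the same function works with stdin or with an in-memory buffer.

```rust
use std::io::BufRead;
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Spawn a thread that blocks on `input` and forwards each trimmed line
/// over a channel. `spawn_input_listener` is a hypothetical name used
/// only for illustration.
pub fn spawn_input_listener<R>(input: R) -> Receiver<String>
where
    R: BufRead + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // `lines()` blocks until a full line is available, so this thread
        // spends almost all of its time asleep inside the read call.
        for line in input.lines() {
            match line {
                Ok(text) => {
                    // Stop forwarding if the receiving side has gone away.
                    if tx.send(text.trim().to_string()).is_err() {
                        break;
                    }
                }
                // Treat a read error like end of input.
                Err(_) => break,
            }
        }
        // Dropping `tx` here tells the receiver that input has ended.
    });
    rx
}
```

For real use, the reader could be `std::io::BufReader::new(std::io::stdin())`; the main thread then checks the returned `Receiver` for commands such as `stop` or `quit` while a search is running.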

I can try looking into why Walleye has such a large footprint if that would help it better comply with the rules. This might be preferable, as Rustic seems to use multiple threads without hitting the second core too hard. One possibility is that the search thread is not quitting soon enough after a search (it's possible for the engine to keep searching to depth N + 1 when the search to depth N has already completed and its move has already been played). Another option would be to have one long-lived thread dedicated to searching rather than spawning a new thread for each search, though this is less likely to help, as creating threads has a relatively small overhead.

But as far as making the engine truly single threaded, I'm not sure if that's possible and I have not come across a Rust engine that is.

Velvet uses a single thread, or at least it never shows more than 1/threads% use, like Walleye.
https://github.com/mhonert/velvet-chess

Alex

@AlexBrunetti it uses multiple threads: https://github.com/mhonert/velvet-chess/blob/master/engine/src/engine.rs#L69. Maybe it's possible that it is forcing itself to execute on one core? I can't get Velvet to compile on my system, but if you check out Rustic, I can see it using 4 threads during play: https://user-images.githubusercontent.com/30714020/139085102-7ca172ca-f7b9-4dab-85ea-2a15c534b63b.PNG

@AlexBrunetti got it running; I had to switch to the nightly build. On my system, even if I set threads to 1 in the UCI options, I can still see it using two threads in the system monitor.

Capture

What OS are you using?

I have just checked Velvet 3.0.0 and it uses only one of my threads. I am on Windows 10.

I don't use Task Manager, I use Process Explorer instead. That shows CPU usage in %.

Hmm, I am also on Windows 10, using an Intel CPU. I downloaded Process Explorer and I still see each engine with two thread IDs.

image

So it sounds like the problem isn't that Walleye is using more than one thread, it's that it has high CPU usage?

Well, add the "Tree CPU usage" field and watch Velvet's value after launching "go infinite": you'll see that it always uses the correct % (i.e. 16% if your CPU has 6 threads). Doing that with Walleye (you don't have go infinite, so just let it search for a while with go wtime 900000 or so), it will use more than that, like 30%. It shouldn't occupy 90% of a single thread's time just to poll an input channel, so there's something not good in your I/O implementation. You can look at Velvet's (or other Rust engines') to see where they differ.
Since both are free programs, you may also copy Velvet's I/O routine verbatim (citing that in your source), since it's not a distinctive playing module, if it works!

Alex

@AlexBrunetti yeah, that's what I am thinking, and I think it has something to do with an errant search thread continuing after the move has already been played, basically searching a dead position. I actually observed this recently in testing.
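One common way to keep a search thread from outliving its move is a shared stop flag that the search checks between iterations. The sketch below is an assumption for illustration, not Walleye's implementation: `spawn_search` is a hypothetical name, and the 1 ms sleep only stands in for real search work.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Stand-in for an iterative-deepening search. Returns the last depth that
/// was fully completed. `spawn_search` is a hypothetical name.
pub fn spawn_search(stop: Arc<AtomicBool>, max_depth: u32) -> JoinHandle<u32> {
    thread::spawn(move || {
        let mut completed = 0;
        for depth in 1..=max_depth {
            // Check the flag before starting each new depth so the thread
            // exits promptly once the move has been played.
            if stop.load(Ordering::Relaxed) {
                break;
            }
            // Placeholder for searching the position to `depth`.
            thread::sleep(Duration::from_millis(1));
            completed = depth;
        }
        completed
    })
}
```

The main thread would call `stop.store(true, Ordering::Relaxed)` and then `join()` the handle before replying with a move, so that no search keeps running on a position that is no longer relevant. A real engine would also check the flag inside the search, not only between depths.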

Another interesting thing is that Walleye 1.3 had more or less the same I/O code and the problem did not seem to be present then. Between Walleye 1.3 and 1.4, the only thing that really changed was that I removed a sleep from the main thread (3fd08ad).

I was a bit confused about the issue, though, since it explicitly mentions thread counts and all engines use multiple threads, but now that I know the issue is CPU usage, I think I know where the problem lies. Adding a 1ms sleep to the main thread so it's not polling so often might just fix the problem outright. I'll hopefully get a chance to explore it this weekend.

Ah ok, it continues to poll and the thread goes to 100%. There's no need to poll millions of times per second! And even 1ms is exaggerated. What about 50 or 100? Then the CPU usage would be unnoticeable.

So the issue was the polling thread polling way too often. Adding a sleep so the polling only happens intermittently drastically reduced the usage on the second core.
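The fix described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual Walleye change: `poll_with_sleep` is a hypothetical name, and commands are assumed to arrive as `String`s over a standard-library channel.

```rust
use std::sync::mpsc::{Receiver, TryRecvError};
use std::thread;
use std::time::Duration;

/// Poll `rx` until a command arrives, sleeping `interval` between checks.
/// Returns the command and how many polls it took, or `None` if the sender
/// has disconnected. `poll_with_sleep` is a hypothetical name.
pub fn poll_with_sleep(rx: &Receiver<String>, interval: Duration) -> Option<(String, u64)> {
    let mut polls = 0;
    loop {
        polls += 1;
        match rx.try_recv() {
            Ok(cmd) => return Some((cmd, polls)),
            Err(TryRecvError::Disconnected) => return None,
            // Nothing to read yet: yield the core instead of spinning.
            // Without this sleep the loop would keep one core fully busy.
            Err(TryRecvError::Empty) => thread::sleep(interval),
        }
    }
}
```

With an interval in the range of a few milliseconds up to the 50 or 100 suggested earlier in the thread, the loop wakes only a handful of times per second. When the polling thread has nothing else to do between checks, `Receiver::recv_timeout` is an alternative that blocks for up to the given duration without a manual sleep.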

I have a hex-core CPU, so you would expect fully using one core to show as ~16% usage. In version 1.5.1 you can see full usage of one core by the search thread, while the second thread has barely any usage at all. Compare this to 1.5.0, where the second thread is using another full 16%.

[Screenshots: 1.5.1 (walleye-new) and 1.5.0 (walleye-old)]

https://github.com/MitchelPaulin/Walleye/releases/tag/1.5.1

Let me know if version 1.5.1 fixes the issue for you

Unfortunately this executable doesn't work on my PC; it is presumably an AVX2/512 or BMI build, and my CPU supports only AVX and SSE.
Anyway, your screenshot shows it's ok now. I suppose Gabor will be satisfied, since the polling thread will run in his "free" thread unnoticeably.
Since you didn't change anything else in the engine, I don't need to download it, I've already tested 1.5.0: I usually keep two free threads to overcome this situation.
Looking forward to the next improvement. Keep up the good work!

Alex

"I suppose Gabor will be satisfied, since the polling thread will run in his "free" thread unnoticeably."

Yes, he is. I can confirm that CPU usage is now 25 %. Thanks for your efforts Mitchel.