MitchelPaulin/Walleye

Ideas

mcoolin opened this issue · 12 comments

Hi Mitchel,

Nice job on the engine!

Just starting to get familiar with the code. New to Rust, so it will take some time. So far it looks great!

No specific issue. Is there anything you need help on?

On my initial review it does not appear that you support using any sort of opening book moves. I'm looking at another GitHub project that could act as a source for a set of opening book moves. See https://github.com/niklasf/chess-openings

I have cloned and built Walleye.

Cute Chess does not seem to install as documented, but that issue will go to them.

A few other ideas:

  • break out the fen functions from board
  • record moves
  • add AI to the engine

Hey Mike, glad you were able to build it!

On my initial review it does not appear that you support using any sort of opening book moves.

This is a deliberate choice; opening book management is usually offloaded to the GUI (Cute Chess, for example, has an option to load a PGN file).

  • break out the fen functions from board
  • record moves
  • add AI to the engine

  • I think breaking up the board file is a great idea; it's grown pretty large, and moving the piece definitions to their own file would be nice. I do like the FEN functions where they are, though.

  • Recording moves, again, is usually a job taken care of by the GUI.

  • As far as adding AI, I'm not sure what you mean. The engine already has AI via an optimized version of the minimax algorithm (see the sketch below). Do you mean adding some type of neural network/deep learning?
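
For reference, here is a minimal sketch of the kind of search meant here, negamax with alpha-beta pruning, using stand-in types rather than the engine's actual API:

// Stand-ins so the sketch compiles on its own; the real search lives in the engine module.
struct Board;
fn generate_moves(_b: &Board) -> Vec<Board> { Vec::new() }
fn evaluate(_b: &Board) -> i32 { 0 }

// Negamax with alpha-beta pruning: minimax where each side's score is the
// negation of the opponent's, so one function serves both players.
fn alpha_beta(board: &Board, depth: u32, mut alpha: i32, beta: i32) -> i32 {
    if depth == 0 {
        return evaluate(board); // static evaluation at the leaves
    }
    let mut best = -i32::MAX;
    for next in generate_moves(board) {
        let score = -alpha_beta(&next, depth - 1, -beta, -alpha);
        best = best.max(score);
        alpha = alpha.max(score);
        if alpha >= beta {
            break; // beta cutoff: the opponent would never allow this line
        }
    }
    best
}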

No specific issue. Is there anything you need help on?

Move generation is one thing; it's pretty slow right now relative to other engines, mostly because I use clone in a lot of places, which makes the move generation code much easier to understand and debug but much slower. If you can find a way to make some optimizations there, that would be great. You can check out the README for how to run the benchmarks.
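
To give a sense of what that clone-heavy, copy-make style looks like, here is a toy sketch with a stand-in board type (not the actual Walleye code):

#[derive(Clone)]
struct Board {
    squares: [[u8; 12]; 12],
}

fn apply_move(board: &Board, from: (usize, usize), to: (usize, usize)) -> Board {
    let mut next = board.clone();     // a full copy of the position for every candidate move
    next.squares[to.0][to.1] = next.squares[from.0][from.1];
    next.squares[from.0][from.1] = 0; // 0 = empty square in this toy encoding
    next                              // the parent position is never mutated
}

The upside is that the parent position can never be corrupted mid-search; the downside is a full copy of the board at every node, which is why clone and memmove show up so high in profiles.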

Mitchel,

I did not see the AI that already exists. I was referring more to the neural/deep learning kind of AI.

I spent my evening running tests/perf and reading.

Clone is usually number 1 or 2 in the perf stats.

I'm running on Linux on an i7 Lenovo, 4 cores / 8 threads.

My observations so far:

  • most of the time everything seems to run on two CPUs
  • memory usage is pretty even
  • After a lot of reading and code study, I'm wondering if Copy would be faster than Clone, since it's essentially a memcpy. There are 18 uses of clone in the code. (A quick sketch of what I mean is below this list.)
  • I'm wondering if the move generation could use threads for each of the piece types to run in parallel?
  • I want to put a summary together on the current performance
  • I noticed that the number of nodes per second really drops off fast the higher the depth goes
  • oddly, depth 5 and 6 gave me the same node count
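
A toy example of the Copy vs Clone distinction I mean (a stand-in struct, not the engine's actual BoardState):

#[derive(Clone, Copy)]
struct Board {
    squares: [[u8; 12]; 12], // arrays of Copy types can themselves be Copy
}

fn main() {
    let a = Board { squares: [[0u8; 12]; 12] };
    let b = a;         // Copy: a plain bitwise copy, and `a` stays usable afterwards
    let c = a.clone(); // for a plain-data struct like this, the derived clone is the same memcpy
    println!("{} {}", b.squares[0][0], c.squares[0][0]);
}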

Testing results:
Debug
./walleye -T --depth=5
Searched to a depth of 5 and evaluated 5072212 nodes in 28.646725553s for a total speed of 181150 nps
Searched to a depth of 5 and evaluated 5072212 nodes in 27.975565682s for a total speed of 187859 nps
Searched to a depth of 5 and evaluated 5072212 nodes in 27.871391048s for a total speed of 187859 nps
Searched to a depth of 5 and evaluated 5072212 nodes in 28.314559684s for a total speed of 181150 nps
Searched to a depth of 5 and evaluated 5072212 nodes in 27.945402446s for a total speed of 187859 nps
sudo perf record ./walleye -T --depth=5
[sudo] password for mike:
Searched to a depth of 5 and evaluated 5072212 nodes in 28.027864331s for a total speed of 181150 nps
[ perf record: Woken up 17 times to write data ]
[ perf record: Captured and wrote 4.283 MB perf.data (111850 samples) ]

Perf
23.31% walleye walleye [.] walleye::evaluation::get_evaluation
7.64% walleye walleye [.] <core::ops::range::Range as core::iter::range::Ra
5.26% walleye walleye [.] walleye::move_generation::is_check_cords
3.97% walleye walleye [.] core::mem::replace
3.79% walleye walleye [.] <walleye::board::PieceColor as core::cmp::PartialEq>

./walleye -T --depth=6
Searched to a depth of 6 and evaluated 124132536 nodes in 742.085057102s for a total speed of 167294 nps
sudo perf record ./walleye -T --depth=6
Searched to a depth of 6 and evaluated 124132536 nodes in 746.898943079s for a total speed of 166397 nps
[ perf record: Woken up 454 times to write data ]
[ perf record: Captured and wrote 113.881 MB perf.data (2984771 samples) ]

perf
23.39% walleye walleye [.] walleye::evaluation::get_evaluation
7.53% walleye walleye [.] <core::ops::range::Range as core::iter::range::Ra
5.26% walleye walleye [.] walleye::move_generation::is_check_cords
4.14% walleye walleye [.] core::mem::replace
3.72% walleye walleye [.] <walleye::board::PieceColor as core::cmp::PartialEq>

sudo perf record ./walleye -P -S

perf
8.23% walleye walleye [.] walleye::move_generation::is_check_cords
7.59% walleye walleye [.] walleye::evaluation::get_evaluation
5.05% walleye walleye [.] <core::ops::range::Range as core::iter::range::R
4.24% walleye walleye [.] <core::slice::iter::Iter as core::iter::traits::
4.05% walleye walleye [.] <walleye::board::Square as core::cmp::PartialEq>::e

Thanks for putting this together. For accurate results, though, make sure you are profiling the release build (with these nps numbers I highly suspect you are profiling the debug build). For perf to still work properly, you can instruct the compiler to keep the debugging symbols by adding:

[profile.release]
debug = true
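
(For what it's worth, the usual workflow would then be to rebuild with cargo build --release and point perf at the release binary, which cargo puts under target/release/walleye by default.)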

As for why the speed drops off rapidly as the depth increases: there are many more terminal nodes at each level (exponentially more), and each of those positions gets evaluated; in your own numbers, depth 6 visited roughly 24 times as many nodes as depth 5. If you were to disable the evaluation part of the test bench, you would get a much more consistent nps across depths.

I was profiling debug.

Where do I put the debug = true line?

I expect the release numbers will be better, but the top items will likely stay the same.
I'll give it a try when I update the release entries.

You would add it here: https://github.com/MitchelPaulin/Walleye/blob/main/Cargo.toml

The numbers actually do change slightly. For example, is_check_cords has a large footprint in debug but an almost unnoticeable performance impact in release mode.

Release version
./walleye -T --depth=5
Searched to a depth of 5 and evaluated 5072212 nodes in 1.687552567s for a total speed of 5072212 nps

sudo perf record ./walleye -T --depth=5
[sudo] password for mike:
Searched to a depth of 5 and evaluated 5072212 nodes in 1.622877638s for a total speed of 5072212 nps
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.261 MB perf.data (6434 samples) ]

perf
51.48% walleye walleye [.] walleye::evaluation::get_evaluation
14.86% walleye libc-2.31.so [.] __memmove_avx_unaligned_erms
14.06% walleye walleye [.] walleye::move_generation::generate_moves
8.58% walleye walleye [.] walleye::move_generation::is_check_cords
2.14% walleye walleye [.] walleye::board::BoardState::move_piece
1.89% walleye walleye [.] _mi_page_retire
1.57% walleye walleye [.] walleye::move_generation::generate_moves_test

And when playing itself:
sudo perf record ./walleye -P -S

perf
27.61% walleye walleye [.] walleye::move_generation::generate_moves
18.09% walleye libc-2.31.so [.] __memmove_avx_unaligned_erms
16.91% walleye walleye [.] walleye::evaluation::get_evaluation
15.07% walleye walleye [.] walleye::move_generation::is_check_cords
2.95% walleye walleye [.] walleye::engine::alpha_beta_search
2.84% walleye walleye [.] walleye::board::BoardState::move_piece
2.44% walleye walleye [.] walleye::move_generation::rook_moves
2.03% walleye walleye [.] _mi_page_retire

Does it only play so far (e.g. number of moves)? What determines when it stops?

Does it only play so far (e.g. number of moves)? What determines when it stops?

It's set to stop after 100 moves or checkmate; it usually ends up drawing when it plays itself and so reaches the 100 move mark.

Profiling it with -P -S gives a better idea of how it performs in real games, whereas -T is good if you want to look at move generation specifically. is_check_cords still seems to dominate though, which is not what I expected. It could be a good place to look for optimizations, either by reducing the number of calls to it or by improving the function itself.

Hi Mitchel,

Got a few things I'm trying, but I'm running into borrow checker issues.

I'm trying to run multiple threads in generate_moves:

for i in BOARD_START..BOARD_END {
    for j in BOARD_START..BOARD_END {
        if let Square::Full(piece) = board.board[i][j] {
            if piece.color == board.to_move {
                // spawn this
                generate_moves_for_piece(
                    piece,
                    board,
                    Point(i, j),
                    &mut new_moves,
                    move_gen_mode,
                    zobrist_hasher,
                );
                //
            }
        }
    }
}

I clone the board, but I keep getting a 'static error on the zobrist_hasher. Not sure how to fix that yet.

I have to understand ownership and 'static better.

My idea is that running the move generation in threads should be faster.

I have a similar idea for the check code, but I'm less sure it will help.

Mike

Hey Mike,

I do not think using multiple cores during play is allowed under the CCRL testing conditions, or at least they put the engine on a separate leaderboard and would prefer you use no more than one core during play; you can see this issue for more details. Usually you are allowed more than one thread, but these need to be "lightweight" threads, usually just waiting to be woken up by input, and using additional threads for move generation would have a noticeable multi-core CPU footprint.

For learning purposes, adding multithreading is a great idea, but unfortunately I would not be able to include the results in Walleye.

Also a quick note:
Where you are planning on spawning the thread, you would end up spawning hundreds of thousands of threads to find a move; you will probably need to split up the work higher in the call stack if you go this route (a rough sketch of what that could look like is below).
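
For learning purposes, here is a rough sketch of what splitting the work higher up could look like with std::thread::scope, which also sidesteps the 'static error you hit, since scoped threads may borrow non-'static data. The types and the per-piece helper below are simplified stand-ins, not the real Walleye signatures (the real call also takes new_moves, move_gen_mode, and zobrist_hasher):

use std::thread;

// Stand-ins so the sketch compiles on its own.
#[derive(Clone, Copy)]
struct Piece;
#[derive(Clone, Copy)]
struct Point(usize, usize);
struct Board;
struct Move;

// Stand-in for the real per-piece move generation function.
fn moves_for_piece(_piece: Piece, _board: &Board, _at: Point, out: &mut Vec<Move>) {
    out.push(Move);
}

fn generate_moves_parallel(board: &Board, work: Vec<(Piece, Point)>) -> Vec<Move> {
    // Scoped threads may borrow `board` (and, in the real code, the hasher)
    // because they are guaranteed to finish before `scope` returns,
    // so no 'static bound is required.
    thread::scope(|s| {
        let handles: Vec<_> = work
            .chunks(((work.len() + 3) / 4).max(1)) // a handful of chunks, not one thread per square
            .map(|chunk| {
                s.spawn(move || {
                    let mut moves = Vec::new();
                    for &(piece, at) in chunk {
                        moves_for_piece(piece, board, at, &mut moves);
                    }
                    moves
                })
            })
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}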

I'm going to close this issue, but feel free to keep commenting here if you have any other questions.