It is inspired by TCEC Season 14 - Superfinal, where Leela was trying to fry Stockfish. Stockfish is chess engine that dominates among publicly available engines since 2014. Stockfish uses classic approach of chess engines, where everything is determined by algorithms built into engine. Lc0 aka Leela Chess Zero is different, since it uses some sort of trainable neural network inside, as well as appealing human-like nickname. Mid-series, Leela managed to take the lead by 1-2 points, and there is a chance NN-powered chess engine will finish heuristics-based dominance (78/100 games played so far).
I wanted to demonstrate my 7yr-old daughter that making these computer chess players are not as hard. We drew a diagram of playing and learning with NN and I started implementing it. The idea is to have practice and fun, make own mistakes, have discussions with daughter about planning and decision making.
I understand there is ton of research done for centuries on all of the things I say below, I deliberately choose to make my own research for the fun of it.
First commits of the code. Using own representation of chess board as 8 * 8 * 12 array of 1/0 values. 1/0 are used as most distinctive input for NN about piece presence at certain square. NN uses two hidden layers 64 nodes each. Output of NN is 8 * 8 array for "from cell" and 8 * 8 array for "to cell" scores. A piece of code is used to choose best move that a) is by chess rules; b) has best score of "from" multiplied by "to" output cells. Finally, board state is checked for game end conditions of checkmate or 50-move rule.
Two copies of NN are used, one plays as White and one as Black, playing versus each other. Game is recorded as PGN file, to be able to review it by human.
Daughter decided that both NN copies urgently need girl-ish names. White is Lisa, black is Karen. I decided that writing chess rule checking and move validation is not the goal of this project. Threw away my code for it, in favor of python-chess library. This shortened code x3 times.
Engines are playing with each other, after each game they take move log and NN learns from it, assuming all moves of lost game were bad, all moves of won game are good, moves from draw are slightly bad (to motivate them search for better than draw).
A web UI is added along with small API server to be able to play with White versus Karen, to try her. Lisa vs Karen are left to play with each other and learn overnight.
They played 12700 games overnight. Watching their game recordings shows they are doing a lot of dumb moves with same piece back and forth, making no progress unless they approach 3-fold or 50-move, when program forces to discard drawish move.
It seemed that it's deeply wrong to teach NN that any move from victorious game is good. I wanted to avoid this originally, but to make progress, we need to introduce some sort of position evaluation, so engine will learn to tell good moves from bad moves.
With daughter, we outlined some rules of evaluating position:
- Pieces have different value
- Capturing enemy piece is very good
- Attacking enemy piece is good
- Putting yourself under attack or removing defence from your piece is bad
- Increasing squares under our control is good
- Moving pawn forward is slightly good as it is the way to promote
We started with material balance, taking possible square control as value of the piece:
- Pawn: 1
- Knight: 8
- Bishop: 13
- Rook: 14
- Queen: 27
- King: 64 - special value to reflect it's precious
Since we analyze game after all moves are done, we judge move by what was the consequence after opponent's move. We calculate sum of all our piece values, subtracting all enemy piece values, so we get material balance. Learning for ~3000 games shown that NNs definitely learn to capture pieces a lot, so the principle of evaluation works and reflects into NNs move score predictions. Material balance addresses #1, #2 and #3 of our evaluation rules.
Next is to take mobility into account, to cover rules #3, #4, #5, #6 from above list. It is quite easy to implement, again we take state after opponent's move and we look now many moves can we make versus how many moves are there for opponent. Attacks are counted in a similar way, we assume that if piece is under attack, it adds half of its value to mobility score. This gives us "mobility balance".
Now it gets tricky, since we need to combine material balance and mobility balance to get overall score for position. Current solution is to multiply material balance by 10 and sum with mobility balance. Gut feeling tells me it's not right, but let's see what NN will learn.
Bots are left to play overnight again.
They've played ~32000 games. Quick game versus Karen shows that still NN miss obvious attacks and loses after human quickly. But this time it does a lot more captures, moves pieces more consistenly forward, does less "dumb cyclic moves".
Watching games of Lisa vs Karen shows consistent pawn promotions and a lot of captures from both sides. Still most of obvious attacks, exchanges are missed. End game is completely dumb, falling into cyclic moves still.
I did series of experiments with NN structure, looking if making it deeper or wider would help. But ~10000 games shows that neither 3-layer, nor 128-node NN does better.
It's time to summarize what was done over past days:
- building this kind of NN-driven test engine is super-easy
- the rules we put as criteria for learning do affect engine move making
- we should research/experiment with evaluation approach more
- we should experiment with NN structure more, maybe use LSTM instead of feed-forward
- re-learning from last N games might be better than incremental learning from very last game
- turning game representation to "always one side" and using exactly same NN for game might make it more robust and universal
Decided to make sing NN that will be able to decide for both sides, using board and move flip for black.
NN structure changed to 6-layers decreasing from 768 to 128 nodes. This increased number of trainable parameters 100x times. It went back to learn games with a lot of dumb moves.
I tend to change my mind back. Actually, my program is just "what is the best move from current position" solver. It has no tree search part (intentional!), and it's wrong for NN to learn moves as good just because they are part of winning game. The score of move is mere measure of if it has changed position in our favor. At first, I want to get NN that will not do stupid mistakes.
Life of developer: was debugging code to understand why my accuracy metric of NN move quickly reaches 1.0 and stayes there. Found that I used wrong data type for numpy array, it was rounding my floats into ints. š¤¦
Still nothing leads to learning good moves. Experimenting with various scores for moves. Cyclic moves demonstrate lack of depth. I revised the criteria for good move, made NN deeper and made it to learn by batches of 10 games. Let's see...
Thinking of crystallizing "just next move predictor NN" more. For that, I'll save dataset of all moves generated through learning, and will always learn from this big set. Previous results from multilayer NN were worse than 2-layer, will get back to 2-layer for now.
I feel the trick to teach this variant of NN is to find correct board state representation, so NN will naturally learn.
I made many experiments yesterday, trying to figure out why NN still suggests dumb moves. One good thing I found is that feeding only good moves into learning process improves the NN accuracy in training, which actually makes sense. Still, the accuracy of "move from" output of NN is much better than "move to". In fact, we need "move to" as most valuable output.
Found that moves in training dataset were not unique. God knows how that was affecting learning process for NN... Also found out that material balance score were inverted for black, also affecting NN training scores. Good to find this, it gives me hope that it is possible to build NN like I want.
Big problem is dataset balancing. Moves tend to use kings more, which leads NN to learn to move kings more. Trying to re-balance learning sample weights did not help.
No matter what I did, the best accuracy I get from NN is 0.5, which is equal to random. I will try to go back to principle of "all moves from won game are good" to research it again.
I see that "noisy moves" from draws affect NN learning in a bad way. Same for pointless moves along games with victories. Those moves don't lead to victory as much. So maybe if learning moves for NN would be chosen more selectively, it would help. Will try it in the morning (00:27 now)...
Well, I experimented whole day and the result didn't change. Still stupid moves.
OpenAI has beaten humans in Dota2. This world is doomed. Let's not postpone the unavoidable, I need to find my way to make this chess engine at least a bit smart. Current idea is to introduce "intermediary goals" of "possible moves", "attacks", "threats" as auxillary output layers. Maybe even "defended pieces". Let's see if NN can learn these simple position analysis concepts.
... so, after day of coding, I've implemeted this approach. It is not clear if it is viable or not yet. The good outcome is that I've found a number of severe bugs and fixed them. Those bugs happened because of "single NN for both White and Black", a lot of confusion is there in the code to juggle sides. Cool thing is now I have visualization for model outputs. It will train overnight, though I don't expect outstanding results.
After night of training, it has reached slightly above 50% accuracy for choosing moves, but that's not much, since there are lots of repetitions and overall quality of moves is low. Still, it is better than what we had in the past.
I'll experiment today's evening with two things: a) simplify the NN back to just inputs and outputs, with no aux outs; b) train with only victories and with score increasing from 0 to 1 the closer move to the end of game.
... I need to understand how to avoid repetitions. The problem is that naturally NN falls into local optimums and suggest moves back and forth. One can't expect from this kind of non-temporal NN to work differently. So there should be a mechanism to avoid repetitions. Search tree is not an option, again. Just because whole idea of this engine is to avoid search tree.
Doing several experiments:
- Removing 3-fold repetition moves from learning data. Those moves are still present in the games, but learning input is nog garbaged by them.
- Added "eval" output of NN, teaching it to evaluate win chance of the game. Game start with each player having 0.5 and then loser gets to 0.0 while winner gets to 1.0.
- Playing back and forth with aux outputs of NN. They're still in question.
Also figured out my residual unit were increasing of size with each level. Reworked that to constant size of 8 * 8 * 12 (initial inputs)
Let me take some rest from working for new startup. Starting to change the code with following thoughts:
- Having 2 outputs 8x8 is somehow hard to train. Let's train 64*64=4096 possible moves output
- Let's switch to Chess960 aka Fischerandom for better training variety, since it is generalization over classical Chess
- Still no good solution for 3-fold and 50-move avoiding
- for now, model's structure will get back to 2-layer dense, to speed up experiments
- reading article about Q-Learning, maybe the learning strategy should change... Also on same topic: https://www.askforgametask.com/tutorial/machine-learning-algorithm-flappy-bird/
Found out that training data were using positions after move made as source of training. Oh, my... Fixing that seems helped learning a ton. Shame on me for overlooking this since the very beginning.
I also made integration with UCI and now my NN plays versus Stockfish. Loses 100% of games, of course. Let it play with SF overnight...
... it played 5000 games and dozens of re-trainings, but did not improve. I'll need to experiment more, still I see one important bug fixed.
I have interesting thought. Chess board position is effectively an image. 8x8 pixels with 12 channels. So can we apply convolutional NN to analyze this image?
I've build a NN structure that uses convolutional layers with additional route of data fed into dense layers, and now experimenting with hyperparams and training.
... it played tens of thousands of games vs SF in "retrain with last game" mode, reached only 0.47 accuracy. But I already know there is one more problem with training data, black moves were not flipped properly. Figured it out in my head just before going to sleep.
I fixed that issue with move flipping, will see how it affects the model. Looks like learning rate has improved a bit.
After some training, I see that NN does not give good moves vs SF. Will train it more.
After some more training, it has approached a bit over 0.5 accuracy, probably for the first time from all my attempts. It's all less about NN structure, it's mostly about stupid bugs in complicated code.
Still after massive training done, accuracy is barely above 0.5. Will need to experiment with NN structure now, I guess.
I decided to do one step back and start with simple 2-layer NN. And I see the accuracy close to 0.5, which means that we get better training without bugs in input data.
After good amount of training I see that accuracy converges to 0.5, which does not mean any good state.
OMG, it makes meaningful captures for the first time! That happened after I changed learning process to "learn only from moves of winning side", which I decided to do after reading some articles about how to deal with negative samples, softmax, and milticlass outputs. Though training accuracy is now around ~0.25, I like the impact on moves that I got.
As usual, I will leave it training versus SF for a night, maybe even for 48 hours. Let's see how it evolves. ... And after night of training it still converges into ~0.4 accuracy.
Having short break from primary job, I implemented more efficient way to represent possible moves output from NN: it includes only moves that are theoretically possible on board. This limits output size from 4096 into 1792 combinations, which serves well on dense type of layers.
Also, I changed structure of NN:
- 2D-convolutional branches are made for kernel size 3, 4, 5, 6, 7, 8
- concatenation of all conv branches comes into 2 aux outputs 8x8: attacked squares and defended squares
- output from attacked and defended is concatenated with original position and fed into simple 2-layer Dense part. The assumption is that knowing piece positions, attacked squares and defended squares is sufficient to make "not too dumb" moves.
- final move is decided on 1792 node output layer
This net trains quite good now, I figured out loss, metrics and activations for aux and main outputs. Categorical accuracy of "moves" output is still below 0.5, but I see sample games with some attacks and captures, which is a good sign.
Meanwhile, S16 - Division P is live on https://tcec.chessdom.com/, Leela plays versus Stockfish again...
Interesting thought: I should not use softmax on moves output layer. Just because possible moves are independent and score for them should be just independent sigmoid. Which also means that lost games can again be included in training set. There are some questions on training moves like this, since we only can pass single good/bad move as input. Ideal solution would be having own loss function.
This time, superfinal between Lc0 and Stockfish happens again, season #17. We're in self-isolation because of COVID19, it's a good time to revise my chess AI project.
I was thinking a lot about it, and many experiments were not merged into main branch, because they're all have failed. I'm ready to give up on my initial idea: build a NN that would "say" correct move. I realize that much more advanced reinforcement learning technique would be needed to achieve that.
Instead, I will try to do a "tree search with just 1 level", to pick the right move. It assumes that central part is NN position evaluator that is trusted. It's a sort of "optimist" approach where we believe that current eval is very smart and no deep tree search is needed.
...
After some training against SF, I see a problem of 1-ply depth: if a move actually leads to bad position (taking opponents' move into account), then all we have is choice from very bad moves.
Nepo plays with Carlsen for the Chess Crown (https://lichess.org/broadcast/world-chess-championship-2021/game-5/H8H4enOL). I want to try exercise on this project again, using my 3060Ti and the approach of 1-level deep eval. The idea: for each position, we calculate score for each possible next position. The top score is the move. I'm getting more modest, just want to get a model that makes not totally random moves, for example, does not miss capturing the queen for free.
I put the current version to run overnight. In the morning it got to game #960, to try all possible starting positions. Ran for 15 hours. It spends a lot of time making long endgames, so I'll add SyzygyDB checks for 3/4/5-piece endings.
It seems that with the approach of eval it is a regression task.
At the end of this exercise, I see that quality of games does not grow much, even training against SF. Maybe the NN structure could be somehow improved to provide better position analysis capacity. I still miss good point on measuring the quality of games, what's the KPI. It's not NN loss, as far as I see.
I wrote somewhere above 'bout COVID times... Well, I knew not that it can be that worse. Now I'm a voluntary exile and life won't be the same. Still, I watch TCEC season 23 superfinal and Leela loses it to SF miserably.
My mood is slightly better, and somehow I feel playing again with my chess program, to vent out my itchy engineering gland.
I understand I would repeat something from above, but my thoughts are coming again to some statements:
- I don't want the program to be the search program that gives score to a position. Instead, I want it to be a prediction program, that would suggest the next move and all logic should emerge inside NN, through its training.
- The idea of input position => NN body => 1792 possible moves output
- output layer should be
softmax
-activated categorical_crossentropy
should be a loss function
I still believe that NN should learn at least basic valid moves rules. That will be the goal of the excercise.
...
Checked again all my code for some huge stupid misuse of NN - it all looks correct according to my expectations.
Observations:
- for body activation relu is good and linear is even better
- rmsprop optimizer does not work for dense net
- judging by decrease in number of invalid moves offered by NN, training of convolutional variant helps
sigmoid
output layer seem to produce better probability of legal moves