AlphaZero Approach
Closed this issue · 2 comments
Hi,
Great work with your repository, impressive stuff. I'm just interested: when you are running the software, with self-play and optimisation at the same time, how many self-play games do you aim to complete between the optimiser releasing new models? I ask because I would have thought that if not enough games are completed, the model would over-fit.
Thanks, Jack
> how many self-play games do you aim to complete between the optimiser releasing a new model?
I think that "training/self-play ratio" mentioned in #38 is important.
Ideally, a ratio close to 1 is best.
In my current environment,
- self-play
  - about 250 self-play games per hour
  - about 400 positions per game
  - so, about 100k positions generated per hour
- optimize
  - 200 steps per 215 seconds (no wait)
  - 200 steps per 430 seconds (including wait)
  - 256 positions per step
  - 3600/430 * 200 * 256 ≈ 428k positions per hour
so the ratio is about 428k / 100k ≈ 4.28.
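The arithmetic above can be sketched as a quick back-of-the-envelope check. All figures are the ones quoted in this comment; the variable names are just for illustration:

```python
# Training/self-play ratio check, using the numbers quoted above.

# Self-play side: positions generated per hour.
selfplay_games_per_hour = 250
positions_per_game = 400
generated_per_hour = selfplay_games_per_hour * positions_per_game  # 100,000

# Optimisation side: positions consumed per hour (including wait time).
steps_per_cycle = 200
seconds_per_cycle = 430   # 215 s without wait, 430 s including wait
batch_size = 256          # positions per optimisation step
consumed_per_hour = 3600 / seconds_per_cycle * steps_per_cycle * batch_size

ratio = consumed_per_hour / generated_per_hour
print(f"generated: {generated_per_hour} positions/hour")
print(f"consumed:  {consumed_per_hour:.0f} positions/hour")
print(f"ratio:     {ratio:.2f}")
```

A ratio above 1 means each position is, on average, trained on several times before fresh data replaces it, which is why a very high ratio (30+) drives the training loss down without making the model stronger.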
> I wonder as I would have thought if not enough games are completed the model would over-fit?
I think so too.
Actually, when the ratio was high (around 30), the training loss tended to be low.
However, I felt that the model did not improve beyond a certain point.
Thanks for this detailed reply, very insightful. In my project, I see much better results with the AlphaGo approach; I'm not sure what I am doing wrong. Keep up the great work with this, though. I look forward to seeing your progress.