mokemokechicken/reversi-alpha-zero

AlphaZero Approach

Closed this issue · 2 comments

Hi,

Great work with your repository, impressive stuff. I'm just interested: when you run the software with self-play and optimisation at the same time, how many self-play games do you aim to complete between each new model the optimiser releases? I ask because I would have thought that if not enough games are completed, the model would over-fit.

Thanks, Jack

Hi @JackThomson2

how many self-play games do you aim to complete between the optimiser releasing a new model?

I think the "training/self-play ratio" mentioned in #38 is important.
Ideally it should be close to 1.

In my current environment,

  • self-play
    • about 250 self-play games per hour
    • about 400 positions per game
    • so, about 100k positions generated per hour
  • optimize
    • 200 steps per 215 seconds (no wait)
      • 200 steps per 430 seconds (including wait)
    • 256 positions per step
    • 3600/430 * 200 * 256 ≒ 428k positions per hour

So the training/self-play ratio is about 4.28.
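The arithmetic above can be sketched as a few lines of Python (the variable names are mine; the numbers are the throughput figures from my current environment):

```python
# Self-play throughput
games_per_hour = 250
positions_per_game = 400
self_play_positions_per_hour = games_per_hour * positions_per_game  # 100,000

# Optimizer throughput, using the wall-clock time including wait
steps = 200
seconds_per_steps = 430  # time to run 200 training steps
batch_size = 256         # positions consumed per step
train_positions_per_hour = 3600 / seconds_per_steps * steps * batch_size  # ~428,651

# Training/self-play ratio: how many times each generated position
# is (on average) seen by the optimizer per hour
ratio = train_positions_per_hour / self_play_positions_per_hour
print(f"ratio = {ratio:.2f}")  # ~4.29
```

A ratio well above 1 means the optimizer re-reads the same self-play positions several times before fresh games replace them, which is the over-fitting risk discussed here.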

I wonder as I would have thought if not enough games are completed the model would over-fit?

I think so too.
In practice, when the ratio was high (around 30), the training loss tended to be low.
However, I felt that the model's playing strength stopped improving beyond a certain point.

Thanks for this detailed reply, very insightful. In my project I see much better results with the AlphaGo approach; I'm not sure what I am doing wrong. Keep up the great work with this though, I look forward to seeing your progress.