shindavid/AlphaZeroArcade

C4 experiments using perfect oracle

Closed this issue · 2 comments

The existence of a perfect oracle in c4 allows us to do some useful experiments. For instance:

  1. We can take the self-play data, relabel the targets using the oracle, train a net on the relabeled data, and rigorously measure the gap between that net and the net trained on the original targets. Tracking how this gap evolves over the course of training could be insightful.
  2. We can add an AlphaZero oracle-mode in which all self-play data is relabeled using the oracle, and examine how the AlphaZero process progresses in oracle-mode.
  3. We can measure the move accuracy of the MCTS agent, both overall and as a function of move number within the game. We can measure it for different MCTS parameters, such as the number of iterations and the number of threads, and track it over the course of training. We expect late-game accuracy to converge to 100% early in training.
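As a rough illustration of item 3, here is a minimal sketch of bucketing move accuracy by move number. The record layout (move number, move chosen by the agent, set of moves the oracle considers optimal) and the function names are assumptions made for this example, not anything that exists in the repo.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record layout (an assumption for this sketch):
# (move_number, column_chosen_by_agent, columns_the_oracle_considers_optimal)
MoveRecord = Tuple[int, int, Set[int]]


def accuracy_by_move_number(records: Iterable[MoveRecord]) -> Dict[int, float]:
    """Fraction of agent moves that are oracle-optimal, keyed by move number."""
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for move_number, chosen, optimal in records:
        total[move_number] += 1
        if chosen in optimal:
            correct[move_number] += 1
    return {n: correct[n] / total[n] for n in sorted(total)}


def overall_accuracy(records: Iterable[MoveRecord]) -> float:
    """Fraction of all agent moves that are oracle-optimal."""
    recs: List[MoveRecord] = list(records)
    if not recs:
        return 0.0
    hits = sum(1 for _, chosen, optimal in recs if chosen in optimal)
    return hits / len(recs)


# Toy data: two games' worth of moves at move numbers 1 and 2.
records = [
    (1, 3, {3}),     # optimal
    (1, 2, {3}),     # suboptimal
    (2, 0, {0, 6}),  # optimal (several moves can be equally good)
    (2, 6, {0, 6}),  # optimal
]
per_move = accuracy_by_move_number(records)
overall = overall_accuracy(records)
```

Running the same computation once per MCTS configuration (iterations, threads) and once per training generation would give the progression described above.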

At a minimum, such experiments should help us catch any obvious bugs. Hopefully they will also give us insight that helps in other ways.

The grade_c4_models.py script, together with viz_c4_progress.py, accomplishes task 3. Tasks 1 and 2 could still be insightful.
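For reference, the relabeling step behind tasks 1 and 2 could look roughly like the sketch below. Everything here is an assumption for illustration: the `Sample` layout, the oracle interface (one game-theoretic outcome per column from the mover's perspective, `None` for illegal columns), and the choice to spread the policy target uniformly over the optimal moves.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    # Hypothetical training-sample layout, not the repo's actual format.
    position: str                     # e.g. the sequence of columns played so far
    policy_target: Tuple[float, ...]  # one probability per column
    value_target: float               # from the perspective of the player to move


# Hypothetical oracle interface: for each column, +1 (win), 0 (draw), -1 (loss)
# for the player to move, or None if the column is full.
Oracle = Callable[[str], Sequence[Optional[int]]]


def relabel(sample: Sample, oracle: Oracle) -> Sample:
    """Replace the policy and value targets with oracle-derived ones."""
    outcomes = oracle(sample.position)
    legal = [o for o in outcomes if o is not None]
    if not legal:
        return sample  # no legal moves: leave the sample untouched
    best = max(legal)
    is_optimal = [o is not None and o == best for o in outcomes]
    k = sum(is_optimal)
    # One possible choice: uniform mass over all oracle-optimal moves.
    policy = tuple(1.0 / k if opt else 0.0 for opt in is_optimal)
    return replace(sample, policy_target=policy, value_target=float(best))


# Toy oracle that ignores the position and returns fixed outcomes.
def toy_oracle(position: str) -> Sequence[Optional[int]]:
    return [-1, 0, None, 1, 1, 0, -1]


original = Sample(position="4433", policy_target=(1 / 7,) * 7, value_target=0.2)
relabeled = relabel(original, toy_oracle)
```

Task 1 would apply something like `relabel` to a copy of the existing self-play data before training a second net; task 2 would apply it inside the loop before samples reach the trainer.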

The c4 experimentation loop is mature at this point, so I don't think we need tasks 1 and 2.