C4 experiments using perfect oracle

Question

Closed this issue a year ago · 2 comments

The existence of a perfect oracle in c4 lets us run some useful experiments. For instance:

1. We can take the self-play data, relabel the targets using the oracle, train a net on that data, and rigorously measure the gap between that net and the net trained on the original targets. Tracking how this gap evolves over the course of training can be insightful.
2. We can add an AlphaZero oracle mode in which all self-play data is relabeled using the oracle, and examine how the AlphaZero process progresses in that mode.
3. We can measure the move accuracy of the MCTS agent, both overall and broken down by move number, as a function of MCTS parameters such as the number of iterations and the number of threads. We can track this over the course of training. We expect late-game accuracy to converge to 100% early in training.
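A minimal sketch of the relabeling step behind tasks 1 and 2 follows. It assumes a hypothetical oracle callable that maps a position (the list of columns played so far) to the game-theoretic result of each legal column for the player to move: +1 win, 0 draw, -1 loss. The names `OracleFn`, `TrainingSample`, and `relabel_with_oracle`, and the 0..6 column indexing, are illustrative assumptions, not part of the codebase. The policy target is spread uniformly over every move that achieves the best result, and the value target is that best result.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# HYPOTHETICAL oracle interface (assumption, not the project's API):
# position (columns played so far) -> {legal column: result for player to move},
# where +1 = win, 0 = draw, -1 = loss under perfect play.
OracleFn = Callable[[Sequence[int]], Dict[int, int]]

NUM_COLUMNS = 7  # assumed column indices 0..6


@dataclass
class TrainingSample:
    moves: List[int]      # columns played to reach this position
    policy: List[float]   # policy target over the 7 columns
    value: float          # value target for the player to move


def relabel_with_oracle(samples: List[TrainingSample],
                        oracle: OracleFn) -> List[TrainingSample]:
    """Return copies of the samples with oracle-derived policy/value targets."""
    relabeled = []
    for s in samples:
        move_values = oracle(s.moves)
        if not move_values:
            raise ValueError("self-play samples should not be terminal positions")
        # Negamax: the position's value is the best result over legal moves.
        best = max(move_values.values())
        optimal = [col for col, v in move_values.items() if v == best]
        # Uniform policy over all optimal moves; zero elsewhere.
        policy = [0.0] * NUM_COLUMNS
        for col in optimal:
            policy[col] = 1.0 / len(optimal)
        relabeled.append(
            TrainingSample(moves=list(s.moves), policy=policy, value=float(best)))
    return relabeled
```

Spreading the policy uniformly over optimal moves is only one choice. A one-hot target on a single optimal move, or a target that prefers faster wins, would also be reasonable, and which one makes the comparison against the MCTS targets fairest is an open question.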

At a minimum, such experiments should help us catch any obvious bugs. Hopefully they will also give us insight that helps in other ways.

Answer 1 · 2023-05-11T16:02:51.000Z

The grade_c4_models.py script, together with viz_c4_progress.py, accomplishes task 3. Tasks 1 and 2 could still be insightful.
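For illustration only (this is not necessarily how grade_c4_models.py works), the core metric of task 3 can be sketched as follows. It assumes each graded position comes with its move number, the column the agent chose, and oracle results for every legal column, using the same hypothetical +1/0/-1 convention. A move counts as correct if it preserves the best achievable result, so any one of several winning moves is accepted. The function name `accuracy_by_move_number` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One graded position (assumed format):
# (move number, column chosen by the agent, {legal column: oracle result})
Record = Tuple[int, int, Dict[int, int]]


def accuracy_by_move_number(records: Iterable[Record]) -> Tuple[float, Dict[int, float]]:
    """Return (overall accuracy, {move number: accuracy})."""
    correct: Dict[int, int] = defaultdict(int)
    total: Dict[int, int] = defaultdict(int)
    for move_number, chosen, move_values in records:
        best = max(move_values.values())
        total[move_number] += 1
        # Correct = the chosen move keeps the best game-theoretic result.
        if move_values[chosen] == best:
            correct[move_number] += 1
    n = sum(total.values())
    overall = sum(correct.values()) / n if n else 0.0
    per_move = {m: correct[m] / total[m] for m in sorted(total)}
    return overall, per_move
```

To study MCTS parameters, one would collect a separate record list for each (iterations, threads) setting and each checkpoint, then compare the resulting per-move-number curves. If the expectation in the question holds, accuracy at late move numbers should approach 1.0 first.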

Answer 2 · 2023-10-06T19:46:11.000Z

The c4 experimentation loop is mature at this point, so I don't think we need tasks 1 and 2.