mlcommons/policies

Minigo: the real number of games played for each model should be logged and considered a tunable hyperparameter

delock opened this issue · 1 comments

Among the Minigo hyperparameters, min_games_per_iteration (=8192) only sets the minimum number of games that must be played before a new training iteration can sample from them.
https://github.com/mlperf/training/blob/master/reinforcement/tensorflow/minigo/ml_perf/flags/19/train_loop.flags#L9

However, the reference implementation allows selfplay to continue on the same model, creating more games until the training iteration finishes. So the real number of games played for each model is an open HP in [8192, +inf), which affects the variety of samples.
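To make the dependence on system speed concrete, here is a toy model, not the reference implementation. It assumes selfplay produces games at a constant rate and keeps running on the same model for the whole training iteration. The function name and its parameters are hypothetical and chosen only for illustration.

```python
import math

# Value of min_games_per_iteration from the flags file linked above.
MIN_GAMES_PER_ITERATION = 8192


def estimated_games_for_model(games_per_second: float,
                              train_seconds: float,
                              min_games: int = MIN_GAMES_PER_ITERATION) -> int:
    """Toy estimate of the games played for one model.

    Assumption (not taken from the reference code): min_games are played
    before training can start, and selfplay then continues at a constant
    rate for the duration of the training iteration.
    """
    if games_per_second < 0 or train_seconds < 0:
        raise ValueError("rate and duration must be non-negative")
    extra_games = math.floor(games_per_second * train_seconds)
    return min_games + extra_games
```

Under these assumptions, a system that plays 10 games per second during a 60-second training iteration ends up with 8792 games for that model, while a system twice as fast ends up with 9392, so two runs with identical flags can still see different sample sets.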

Thus the real number of games played should be printed in the log to make results reproducible, and it should be considered a tunable HP that falls under the HP stealing rule. The logging output could look something like:
"X games played for model 18"
"Y games played for model 19"
"Z games played for model 20"
...
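A minimal sketch of how such lines could be produced, assuming each finished selfplay game is tagged with the number of the model that played it. The helper name and its input format are hypothetical; this is not the logging API of the reference implementation.

```python
from collections import Counter
from typing import Iterable, List


def games_played_log_lines(game_model_ids: Iterable[int]) -> List[str]:
    """Return one log line per model with the number of games it played.

    game_model_ids holds one entry per finished selfplay game: the number
    of the model that generated it (a hypothetical input format).
    """
    counts = Counter(game_model_ids)
    return [
        f"{counts[model]} games played for model {model}"
        for model in sorted(counts)
    ]


if __name__ == "__main__":
    # Tiny illustrative input; real counts would be at least 8192 per model.
    for line in games_played_log_lines([18, 18, 19, 20, 19, 18]):
        print(line)
```

Counting from the finished games rather than from the configured minimum means the log reflects what actually happened on that system.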

We discussed this for Minigo, and it is now present in the HP table in the rules:

minigo | sgd | actual_selfplay_games_per_generation | integer >= 8192 (min_selfplay_games_per_generation) | "NOT A HYPERPARAMETER, CANNOT BE 'BORROWED' during review" Implicit (LOG ONLY) - total number of games played per epoch; many parameters can impact this, varies per iteration

For context, this was labeled as not an HP because actual characteristics of the system (such as the number of accelerators) affect it, and those characteristics cannot be stolen. We are happy to revisit this in future meetings and discussions.