Allow finer fidelity logging
bhancock8 opened this issue · 4 comments
Do not restrict reporting performance on the dev set (or calculating metrics) to once per epoch; create an intelligent logger whose trigger can be time-based (evaluate every 10s), example-based (evaluate every 1k examples), batch-based (every 10 batches), or epoch-based. Should be fairly lightweight but much nicer to work with.
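A minimal sketch of what that trigger could look like (class and argument names here are hypothetical, just to illustrate the idea of one class handling all four units):

```python
import time


class EvalTrigger:
    """Hypothetical sketch: fire an evaluation every `every` units of
    progress, where the unit is seconds, examples, batches, or epochs."""

    def __init__(self, every, unit="batches"):
        assert unit in ("seconds", "examples", "batches", "epochs")
        self.every = every
        self.unit = unit
        self.start_time = time.time()
        self.last_fired = 0.0  # progress value at the last evaluation

    def _progress(self, examples_seen, batches_seen, epochs_seen):
        # Map the configured unit onto the caller's progress counters.
        return {
            "seconds": time.time() - self.start_time,
            "examples": examples_seen,
            "batches": batches_seen,
            "epochs": epochs_seen,
        }[self.unit]

    def should_eval(self, examples_seen=0, batches_seen=0, epochs_seen=0):
        """Return True if at least `every` units elapsed since the last fire."""
        progress = self._progress(examples_seen, batches_seen, epochs_seen)
        if progress - self.last_fired >= self.every:
            self.last_fired = progress
            return True
        return False
```

The train loop would then just call `trigger.should_eval(...)` once per batch, which keeps the frequency logic out of the loop body itself.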
See train loop of ParlAI for inspiration: https://github.com/bhancock8/ParlAI/blob/a52e3ddcb711b957ff6ad67e86edea492f280f80/parlai/scripts/train_model.py#L327
We could have many of the same settings, but probably compartmentalize it a bit better into a single class.
At the same time, trim up/modularize the train loop. It's gotten too long.
Ideal abstraction: the user specifies some "frequency" (in seconds, examples, batches, or epochs) at which to evaluate on the dev set. Some arbitrary set of metrics (some built-in, some user-provided) is evaluated at that time (on the whole dev set or some max number of examples). If checkpointing is on, the user specifies which metric to use and whether to minimize or maximize it. If a new best is found, execute the checkpointer to save the model.
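The checkpointing half of that abstraction could be as simple as the following sketch (names like `Checkpointer` and `save_fn` are assumptions for illustration, not an existing API):

```python
class Checkpointer:
    """Hypothetical sketch: save the model whenever the monitored
    metric improves in the configured direction (min or max)."""

    def __init__(self, metric, mode="max", save_fn=None):
        assert mode in ("min", "max")
        self.metric = metric          # key into the metrics dict
        self.mode = mode              # whether higher or lower is better
        self.save_fn = save_fn or (lambda: None)  # e.g. serialize the model
        self.best = None

    def update(self, metrics):
        """Return True (and save) if metrics[self.metric] is a new best."""
        value = metrics[self.metric]
        improved = (
            self.best is None
            or (self.mode == "max" and value > self.best)
            or (self.mode == "min" and value < self.best)
        )
        if improved:
            self.best = value
            self.save_fn()
        return improved
```

After each dev-set evaluation the loop passes the metrics dict to `update()`; everything else about *when* to evaluate stays in the frequency logic, so the two concerns compose cleanly.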