caching objects in experiment runner
kireet opened this issue · 1 comments
Some read-only objects can take a while to load in experiments (embeddings, datasets, etc.). The current ExperimentRunner
always recreates the entire experiment. It would be nice if we could keep some objects in memory...
Proposal
Add a cache parameter to run_all:
def run_all(experiment: Union[str, Path, Dict],
            experiment_cache: Union[str, Path, Dict],
            experiment_config: Union[str, Path],
            report_dir: Union[str, Path],
            trainer_config_name: str = 'trainer',
            reporter_config_name: str = 'reporter',
            **env_vars) -> None:
The cache is just another experiment json. It would be loaded only once, at the very beginning, using only the env_vars. Any resulting objects would then be added to env_vars when running each experiment. Objects can optionally implement a Resettable class with a reset method that would be called once before each experiment.
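To make the flow concrete, here is a rough sketch of the load-once / reset-per-experiment idea. It is not the actual ExperimentRunner code: the cache is modelled as a dict of plain builder callables instead of an experiment json, and `load_cache`, `run_experiments`, `CountingEmbeddings` and `build_embeddings` are hypothetical names made up for illustration.

```python
from typing import Any, Callable, Dict, List


class Resettable:
    """Hypothetical base class: cached objects with per-run state override reset()."""

    def reset(self) -> None:
        raise NotImplementedError


class CountingEmbeddings(Resettable):
    """Stand-in for an expensive read-only object that also tracks per-run state."""

    def __init__(self) -> None:
        self.vectors = {"hello": [0.1, 0.2]}  # imagine a slow load here
        self.lookups = 0

    def lookup(self, token: str) -> List[float]:
        self.lookups += 1
        return self.vectors[token]

    def reset(self) -> None:
        self.lookups = 0


def load_cache(builders: Dict[str, Callable[..., Any]],
               env_vars: Dict[str, Any]) -> Dict[str, Any]:
    """Build every cached object exactly once, using only env_vars."""
    return {name: build(**env_vars) for name, build in builders.items()}


def run_experiments(experiments: List[Callable[..., Any]],
                    builders: Dict[str, Callable[..., Any]],
                    **env_vars: Any) -> List[Any]:
    cache = load_cache(builders, env_vars)  # loaded once, before any experiment
    results = []
    for experiment in experiments:
        # Give stateful cached objects a clean slate before each experiment.
        for obj in cache.values():
            if isinstance(obj, Resettable):
                obj.reset()
        # Cached objects are exposed to the experiment alongside env_vars.
        merged = {**env_vars, **cache}
        results.append(experiment(**merged))
    return results


build_calls = []


def build_embeddings(**env: Any) -> CountingEmbeddings:
    build_calls.append(env)  # record how often the expensive load happens
    return CountingEmbeddings()


def experiment(embeddings: CountingEmbeddings, **env: Any) -> int:
    embeddings.lookup("hello")
    return embeddings.lookups


results = run_experiments([experiment, experiment],
                          {"embeddings": build_embeddings},
                          seed=0)
```

In this sketch the builder runs once even though two experiments run, and each experiment sees a lookup count that starts from zero because reset is called first. Cached names override env_vars entries with the same key here; whether that should be an error instead is an open design choice.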
Incorrect usage of this feature could lead to non-reproducibility issues, but through docs we could make it clear that this should only be used for read-only objects. I think it would be worth doing...
The cache should be optional; I guess it should go towards the end of the parameter list with a default value of None.
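With the cache made optional, the signature might look roughly like this. It is only a sketch of the parameter order: the exact position of `experiment_cache` among the defaulted parameters is an assumption, and the body is elided.

```python
from pathlib import Path
from typing import Dict, Optional, Union


def run_all(experiment: Union[str, Path, Dict],
            experiment_config: Union[str, Path],
            report_dir: Union[str, Path],
            trainer_config_name: str = 'trainer',
            reporter_config_name: str = 'reporter',
            experiment_cache: Optional[Union[str, Path, Dict]] = None,
            **env_vars) -> None:
    # Body elided: only the parameter order is illustrated here.
    # With experiment_cache=None the runner would behave as it does today.
    ...
```

Existing positional calls that pass experiment, experiment_config and report_dir keep working, since the new parameter comes after them and defaults to None.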