feedly/transfer-nlp

caching objects in experiment runner

kireet opened this issue · 1 comment

Some read-only objects can take a while to load in experiments (embeddings, datasets, etc.). The current ExperimentRunner always recreates the entire experiment from scratch. It would be nice if we could keep some of these objects in memory across runs...

Proposal

Add an experiment_cache parameter to run_all:

    def run_all(experiment: Union[str, Path, Dict],
                experiment_cache: Union[str, Path, Dict],
                experiment_config: Union[str, Path],
                report_dir: Union[str, Path],
                trainer_config_name: str = 'trainer',
                reporter_config_name: str = 'reporter',
                **env_vars) -> None:

The cache is just another experiment JSON file. It would be loaded only once, at the very beginning, using only the env_vars. Any resulting objects would then be added to env_vars when running each experiment. Objects can optionally implement a Resettable interface with a reset method that would be called once before each experiment. A rough sketch follows below.
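
As a sketch of what this could look like (all names below are tentative; ExperimentConfig is assumed to be the same loader used for regular experiment files, and the way its built objects are read out is an assumption):

    from pathlib import Path
    from typing import Any, Dict, Union

    from transfer_nlp.plugins.config import ExperimentConfig  # assumed import path


    class Resettable:
        """Optional interface for cached objects that hold per-experiment state."""

        def reset(self) -> None:
            """Called once before each experiment to clear any accumulated state."""
            raise NotImplementedError


    def load_cache(experiment_cache: Union[str, Path, Dict, None],
                   **env_vars) -> Dict[str, Any]:
        """Build the cache experiment once, using only the env_vars."""
        if experiment_cache is None:
            return {}
        cache = ExperimentConfig(experiment_cache, **env_vars)
        # assumption: the built config can be read out as a name -> object mapping
        return dict(cache)


    # inside run_all, before iterating over experiments (pseudocode):
    #
    #     cached_objects = load_cache(experiment_cache, **env_vars)
    #     env_vars.update(cached_objects)
    #
    # and once before each individual experiment:
    #
    #     for obj in cached_objects.values():
    #         if isinstance(obj, Resettable):
    #             obj.reset()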

Incorrect usage of this feature could lead to reproducibility issues, but we could make it clear in the docs that it should only be used for read-only objects. I think it would be worth doing...

The cache should be optional, so it probably belongs toward the end of the parameter list with a default value of None (see the example below).
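
For illustration, a call could then look like this (the import path and file names are made up; existing calls without a cache would stay unchanged):

    from transfer_nlp.runner.experiment_runner import ExperimentRunner  # assumed import path

    # no cache: behaves exactly as today
    ExperimentRunner.run_all(experiment='experiments/news_classifier.json',
                             experiment_config='experiments/env.json',
                             report_dir='reports/news_classifier')

    # with a cache: heavy read-only objects (embeddings, datasets, ...) are built once and reused
    ExperimentRunner.run_all(experiment='experiments/news_classifier.json',
                             experiment_config='experiments/env.json',
                             report_dir='reports/news_classifier',
                             experiment_cache='experiments/cache.json')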