[ASoC 2022] Enable data caching across jobs to boost job performance with high memory efficiency

Question

Opened this issue 2 years ago · 0 comments

What would you like to be added:

Refactor the caching API to support inter-job caching, so that the lifecycle of a cached dataset is independent of any single training job.
Implement a caching policy that works with the distributed cache runtime to keep popular datasets in memory and maximize cache efficiency.
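
To make the two items above concrete, here is a minimal sketch of what a popularity-based, job-independent cache could look like. It is only an illustration under assumptions, not the project's API: the class `InterJobDatasetCache`, its methods `attach` and `detach`, the `"hit"`/`"miss"`/`"bypass"` statuses, and the "evict the least-used idle dataset first" rule are all hypothetical, and a real policy would talk to the distributed cache runtime instead of a local dict.

```python
class InterJobDatasetCache:
    """Toy cache: datasets stay cached after the jobs that used them finish."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.entries = {}  # dataset name -> {"size": GB, "hits": attach count}
        self.in_use = {}   # dataset name -> set of job names currently reading it

    def used_gb(self):
        return sum(e["size"] for e in self.entries.values())

    def attach(self, job, dataset, size_gb):
        """Return "hit", "miss" (loaded now), or "bypass" (could not be cached)."""
        if dataset in self.entries:
            status = "hit"
        elif self._make_room(size_gb):
            self.entries[dataset] = {"size": size_gb, "hits": 0}
            status = "miss"
        else:
            return "bypass"  # the job would read from remote storage instead
        self.entries[dataset]["hits"] += 1
        self.in_use.setdefault(dataset, set()).add(job)
        return status

    def detach(self, job, dataset):
        """A job finished: drop its reference but keep the dataset cached."""
        self.in_use.get(dataset, set()).discard(job)

    def _make_room(self, size_gb):
        """Evict the least popular idle datasets until size_gb fits."""
        free = self.capacity_gb - self.used_gb()
        # Only datasets with no active reader are eviction candidates.
        idle = sorted(
            (name for name in self.entries if not self.in_use.get(name)),
            key=lambda name: self.entries[name]["hits"],
        )
        victims = []
        for name in idle:
            if free >= size_gb:
                break
            free += self.entries[name]["size"]
            victims.append(name)
        if free < size_gb:
            return False  # not enough evictable space; evict nothing
        for name in victims:
            del self.entries[name]
            self.in_use.pop(name, None)
        return True


cache = InterJobDatasetCache(capacity_gb=10)
first = cache.attach("job-a", "imagenet", 6)   # loaded into the cache
cache.detach("job-a", "imagenet")              # job ends, dataset stays
second = cache.attach("job-b", "imagenet", 6)  # reused by a later job
cache.detach("job-b", "imagenet")
cache.attach("job-c", "cifar", 3)
cache.detach("job-c", "cifar")
third = cache.attach("job-d", "mnist", 3)      # evicts cifar (fewest hits)
```

In this sketch, `detach` never removes data, which is what decouples the dataset lifecycle from the job lifecycle; eviction happens only when a new dataset needs space, and it removes less popular datasets first.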

Why is this needed:
Caching datasets in the memory of the local cluster accelerates training jobs. Popular public datasets are often used by multiple jobs, so making cached datasets shareable across training jobs, under a well-designed caching policy, improves cache efficiency.