apache/datafusion-ballista

Need to clean up intermediate data in Ballista

Ted-Jiang opened this issue · 1 comment

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We need to check whether the state saved in sled is consumed by the UI.
If it is not consumed by the UI, we can clean up the job/task data as soon as the SQL query finishes.

If it is consumed by the UI, we can choose either an LRU-based policy, like Spark, or a time-based eviction policy.
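
To make the two eviction options concrete, here is a minimal sketch that combines them: a TTL pass followed by an LRU-style cap on retained jobs. `JobRetention`, `FinishedJob`, and all method names are hypothetical and are not existing Ballista types; the caller is assumed to delete the returned job ids from the state store.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical record of a finished job (not a Ballista type).
struct FinishedJob {
    finished_at: Instant,
    last_read: Instant,
}

/// Hypothetical retention policy: a TTL plus an LRU-style cap on kept jobs.
struct JobRetention {
    ttl: Duration,
    max_jobs: usize,
    jobs: HashMap<String, FinishedJob>,
}

impl JobRetention {
    fn new(ttl: Duration, max_jobs: usize) -> Self {
        Self { ttl, max_jobs, jobs: HashMap::new() }
    }

    /// Record that a job finished at `now`.
    fn job_finished(&mut self, job_id: &str, now: Instant) {
        self.jobs.insert(
            job_id.to_string(),
            FinishedJob { finished_at: now, last_read: now },
        );
    }

    /// Record that the UI read this job's state at `now`.
    fn touch(&mut self, job_id: &str, now: Instant) {
        if let Some(job) = self.jobs.get_mut(job_id) {
            job.last_read = now;
        }
    }

    /// Return the sorted job ids whose state should be deleted at `now`.
    fn evict(&mut self, now: Instant) -> Vec<String> {
        // Time-based pass: drop jobs that finished at least `ttl` ago.
        let ttl = self.ttl;
        let mut evicted: Vec<String> = self
            .jobs
            .iter()
            .filter(|(_, job)| now.duration_since(job.finished_at) >= ttl)
            .map(|(id, _)| id.clone())
            .collect();
        for id in &evicted {
            self.jobs.remove(id);
        }

        // LRU pass: while over capacity, drop the least recently read job.
        while self.jobs.len() > self.max_jobs {
            let oldest = self
                .jobs
                .iter()
                .min_by_key(|(_, job)| job.last_read)
                .map(|(id, _)| id.clone());
            match oldest {
                Some(id) => {
                    self.jobs.remove(&id);
                    evicted.push(id);
                }
                None => break,
            }
        }

        evicted.sort();
        evicted
    }
}
```

Passing `now` in explicitly keeps the policy deterministic and easy to test. The scheduler could call `evict` periodically or whenever a job finishes.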

We also need a way to clean up shuffle files. This is more complex because the files must be removed on every host that wrote them, so we might need to add new RPCs.
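
As a rough illustration of the executor side of such a cleanup, the sketch below removes one job's shuffle directory. It assumes each executor stores a job's shuffle files under `<work_dir>/<job_id>/`; that layout and the function name `remove_job_data` are assumptions for illustration only. The RPC that would trigger it from the scheduler is not shown.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical executor-side handler for a "remove job data" request.
/// Assumes shuffle files for a job live under `<work_dir>/<job_id>/`.
/// Returns Ok(true) if a directory was removed, Ok(false) if none existed.
fn remove_job_data(work_dir: &Path, job_id: &str) -> io::Result<bool> {
    // Reject ids that could escape the work directory.
    let has_separator = job_id.contains(|c: char| c == '/' || c == '\\');
    if job_id.is_empty() || has_separator || job_id == "." || job_id == ".." {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "invalid job id",
        ));
    }

    let job_dir = work_dir.join(job_id);
    if !job_dir.is_dir() {
        // Nothing to clean up, so a repeated request is harmless.
        return Ok(false);
    }

    // Remove the job directory and everything below it.
    fs::remove_dir_all(&job_dir)?;
    Ok(true)
}
```

Treating a missing directory as success makes the operation idempotent, so the scheduler can retry the request on every executor without tracking which hosts actually hold files for the job.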

related to #7