Using run distillation leads to high .cache usage
Gusreis7 opened this issue · 0 comments
Gusreis7 commented
When running training for a new language, the distillation step generates a large amount of cache data (about 70 GB in my case), which makes it hard to run the training effectively. Has anyone run into this same problem, or does anyone know how to solve it?
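In case it helps anyone hitting the same wall: below is a minimal sketch of redirecting the cache to a larger disk, under the assumption that the pipeline caches preprocessed data through Hugging Face `datasets` (the exact caching library is an assumption on my part; `HF_DATASETS_CACHE` and the `cache_dir` argument are standard `datasets` options, not something specific to this repo, and the paths and dataset name are placeholders):

```python
import os

# ASSUMPTION: the distillation step caches via Hugging Face `datasets`.
# Point the cache at a volume with more free space *before* importing `datasets`.
os.environ["HF_DATASETS_CACHE"] = "/mnt/big_disk/hf_datasets_cache"

from datasets import load_dataset

# Hypothetical dataset name and path; replace with your own.
# Passing cache_dir explicitly overrides the default ~/.cache/huggingface/datasets.
dataset = load_dataset(
    "my_dataset",
    cache_dir="/mnt/big_disk/hf_datasets_cache",
)

# After training finishes, the intermediate cache files can be
# deleted to reclaim the disk space.
dataset.cleanup_cache_files()
```

This only relocates or cleans the cache rather than shrinking it; if the 70 GB comes from the distillation outputs themselves, a larger scratch disk may still be needed.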