Using run distillation leads to high .cache usage
Gusreis7 opened this issue · 0 comments
Gusreis7 commented
When running training for a new language, the distillation step generates a large amount of cache data (about 70 GB in my case), which makes it hard to run the training effectively. Has anyone run into this same problem, or does anyone know how to solve it?
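In case it helps anyone hitting the same wall: below is a minimal sketch of redirecting the cache to a larger disk, under the assumption that the pipeline caches preprocessed data through Hugging Face `datasets` (the exact caching library is an assumption on my part; `HF_DATASETS_CACHE` and the `cache_dir` argument are standard `datasets` options, not something specific to this repo, and the paths and dataset name are placeholders):

```python
import os

# ASSUMPTION: the distillation step caches via Hugging Face `datasets`.
# Point the cache at a volume with more free space *before* importing `datasets`.
os.environ["HF_DATASETS_CACHE"] = "/mnt/big_disk/hf_datasets_cache"

from datasets import load_dataset

# Hypothetical dataset name and path; replace with your own.
# Passing cache_dir explicitly overrides the default ~/.cache/huggingface/datasets.
dataset = load_dataset(
    "my_dataset",
    cache_dir="/mnt/big_disk/hf_datasets_cache",
)

# After training finishes, the intermediate cache files can be
# deleted to reclaim the disk space.
dataset.cleanup_cache_files()
```

This only relocates or cleans the cache rather than shrinking it; if the 70 GB comes from the distillation outputs themselves, a larger scratch disk may still be needed.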