Ensemble Task Implementation
sdsawtelle opened this issue · 2 comments
@HobbitLong Thank you very much for taking the time to clean up and post your code for these benchmarks! I realize you probably don't have time to post code for the ensemble distillation task, but I am going to try to reproduce that benchmark. If you can recall any tricks or different hyperparameter settings for that particular task off the top of your head, perhaps we can document them in this issue.
For Figure 4 in the paper, I'm wondering exactly how a single point in those plots is generated. For example, for the point that is ResNet distillation from four teachers, is that an average over multiple trials? And if so, are four new teachers trained from scratch for each trial? Or was there a pool of, e.g., 8 teachers, with each 4-teacher trial randomly selecting 4 from among those 8, each 6-teacher trial randomly selecting 6 from among those 8, and so on?
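To make the two candidate protocols concrete, here is a rough Python sketch. It only illustrates the question and is not the procedure used in the paper: the function names, the pool size of 8, and the `run_trial` callback (standing in for "distill a student from these teachers and return its accuracy") are all hypothetical.

```python
import random
from statistics import mean


def pool_protocol(pool_size, num_teachers, num_trials, seed=0):
    """Hypothetical protocol A: a fixed pool of pre-trained teachers.

    Each trial draws `num_teachers` distinct teacher indices from the pool,
    so the same teacher may be reused across trials.
    """
    rng = random.Random(seed)
    return [
        sorted(rng.sample(range(pool_size), num_teachers))
        for _ in range(num_trials)
    ]


def fresh_protocol(num_teachers, num_trials):
    """Hypothetical protocol B: new teachers trained from scratch per trial.

    Every trial gets its own block of teacher ids; no teacher is shared.
    """
    return [
        list(range(t * num_teachers, (t + 1) * num_teachers))
        for t in range(num_trials)
    ]


def point_on_plot(trial_teachers, run_trial):
    """Average a per-trial score over all trials to get one plotted point.

    `run_trial` is an assumed callback: it takes a list of teacher ids and
    returns the student's score for that trial.
    """
    return mean(run_trial(ids) for ids in trial_teachers)
```

Under protocol A, the 4-teacher point would come from something like `pool_protocol(8, 4, num_trials)`; under protocol B, from `fresh_protocol(4, num_trials)`. In both cases the plotted value would be `point_on_plot(...)` over the resulting trials, if the points are in fact averages.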
Hi, were you able to reproduce the ensemble distillation task?