[BUG] get_dataset for flan2021_submix taking hours, while flan_zsnoopt takes 5m
TheExGenesis opened this issue · 1 comments
TheExGenesis commented
Not much to add, except that get_dataset returns flan_zsnoopt quite quickly, whereas flan2021_submix has been running for 3 hours and is using 22 GB of RAM, with no sign of stopping.
shayne-longpre commented
@TheExGenesis I suspect this is because the few-shot submixtures take much longer to generate.
@SirNeural is one person I know who has generated all the tasks in this pipeline. Maybe they have an estimate of the times? (I have only generated smaller versions with this external code version -- with a lot of the datasets commented out. So I don't have end-to-end wall-clock numbers.)
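In case it helps anyone collect those numbers, here is a minimal sketch for recording wall-clock time and peak Python-level memory per submixture. It is not part of this repo: `profile_generation` and `fake_generate` are hypothetical names, and `fake_generate` is only a stand-in for whatever call actually builds a submixture. Note that `tracemalloc` only sees allocations made through Python's allocator, so memory held by native libraries would not show up here.

```python
import time
import tracemalloc
from typing import Callable, Dict, Iterable, Tuple


def profile_generation(
    names: Iterable[str],
    generate: Callable[[str], object],
) -> Dict[str, Tuple[float, int]]:
    """Return {name: (seconds, peak_bytes)} for each call to generate(name)."""
    results: Dict[str, Tuple[float, int]] = {}
    for name in names:
        tracemalloc.start()
        start = time.perf_counter()
        try:
            generate(name)
        finally:
            # Read the counters before stopping, since stop() clears them.
            elapsed = time.perf_counter() - start
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
        results[name] = (elapsed, peak)
    return results


def fake_generate(name: str) -> list:
    """Hypothetical stand-in: allocates a list whose size depends on the name."""
    return [0] * (1_000 if name == "small" else 100_000)


if __name__ == "__main__":
    for name, (seconds, peak) in profile_generation(
        ["small", "large"], fake_generate
    ).items():
        print(f"{name}: {seconds:.4f}s, peak {peak / 1e6:.2f} MB")
```

Swapping `fake_generate` for a function that generates one submixture at a time would show which ones dominate the runtime.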