bghira/SimpleTuner

feature proposal: continue model pre-train

Closed this issue · 3 comments

When caption data is a strong bottleneck, i.e. I have a lot of similar-category image data without captions, it would be nice to offer continued pre-training of the model over images only.

what are you expecting to happen?

The hope is that Flux can then see a dataset it did not encounter during initial training and generalize over it: we would only need to provide a small amount of captioned, supervised image data on top of that to train a more robust model.

If you don't use captions during training for a large enough run, the model will just forget how to use them. It will not generalise that way.
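
For context on why fully uncaptioned training leads to the forgetting described above, the usual compromise in diffusion fine-tuning is caption dropout: most samples keep their caption and only a fraction are trained unconditionally. Below is a minimal, framework-agnostic sketch of that idea; the function name and probability value are illustrative and not SimpleTuner's API.

```python
import random

def apply_caption_dropout(captions, dropout_probability=0.1):
    """Replace a fraction of captions with the empty string.

    The model still sees mostly captioned examples (so it keeps its text
    conditioning), while the dropped samples act as unconditional training.
    """
    return [
        "" if random.random() < dropout_probability else caption
        for caption in captions
    ]

# Toy usage: most samples keep their caption, a few become unconditional.
batch = ["a red barn in a field", "close-up of a tabby cat", "studio portrait"]
print(apply_caption_dropout(batch, dropout_probability=0.3))
```

Training on images only is the degenerate case of this with a dropout probability of 1.0, which is why the text conditioning degrades rather than generalising.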