Some questions.
IdiotSandwichTheThird opened this issue · 2 comments
Looking at this project, a few questions come to mind that are unanswered in the README.
- Does this allow using >75 tokens as text input, or are the text/tags truncated after 75?
- How does limit_train_batches work? Are the images chosen at random? Do I have to change the value if the dataset is >100 images?
- Are input images resized automatically for the buckets, or is some pre-processing necessary? (Looking at the test datasets, for example, all images appear to be 1:1 aspect ratio; is this required?)
75 tokens as text input
This feature has been implemented in recent commits. Specify max_length in config.dataset to set your own limit, or use the default of 225 tokens.
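For intuition, long prompts are typically handled by splitting the token sequence into 75-token chunks (CLIP's usable window) that are encoded separately and concatenated. The sketch below only shows that chunking arithmetic; the function name and exact behavior are assumptions for illustration, not this repository's code:

```python
# Sketch: how a 225-token max_length maps onto CLIP's 75-token window.
# Illustrative only -- not this repository's implementation.

CLIP_CHUNK = 75  # usable tokens per CLIP pass (77 minus BOS/EOS)

def split_into_chunks(token_ids, max_length=225):
    """Truncate to max_length, then split into 75-token chunks;
    each chunk would be encoded by CLIP separately and the
    resulting embeddings concatenated along the sequence axis."""
    token_ids = token_ids[:max_length]
    return [token_ids[i:i + CLIP_CHUNK]
            for i in range(0, len(token_ids), CLIP_CHUNK)]

prompt = list(range(180))  # pretend these are 180 token ids
chunks = split_into_chunks(prompt)
print(len(chunks), [len(c) for c in chunks])  # 3 chunks: 75, 75, 30
```

So a 225-token limit corresponds to three full CLIP passes; anything beyond max_length is still truncated.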
How does limit_train_batches work?
trainer.limit_train_batches is passed directly to pl.Trainer and limits the number of batches used in a single epoch (maybe we should set this value to 1.0 by default?). As for the images, arb shuffles and generates a new list of image batches every epoch.
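For reference, Lightning interprets limit_train_batches as a fraction of the epoch when it is a float and as an absolute batch count when it is an int. A simplified sketch of that rule (not Lightning's actual code):

```python
def batches_per_epoch(num_batches, limit_train_batches):
    """Mimic pl.Trainer's limit_train_batches semantics (simplified):
    a float is a fraction of the available batches, an int is an
    absolute cap on how many batches run per epoch."""
    if isinstance(limit_train_batches, float):
        return int(num_batches * limit_train_batches)
    return min(num_batches, limit_train_batches)

# e.g. 500 images at batch size 4 -> 125 batches per epoch
print(batches_per_epoch(125, 1.0))  # 125 (the whole epoch)
print(batches_per_epoch(125, 0.5))  # 62  (half the epoch)
print(batches_per_epoch(125, 100))  # 100 (hard cap)
```

So with a dataset of more than 100 images, an int value of 100 silently drops the remaining batches each epoch, while 1.0 always uses everything.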
images resized automatically for the buckets...
Input images will be resized and cropped (a little) by arb. If needed, you can enable arb.debug and dataset.debug_arb to view those cropped images. It's handy, but doing some augmentation in advance will obviously give better results.
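The "resize and crop a little" step in aspect-ratio bucketing usually means: scale the image so it fully covers the target bucket, then center-crop the small overflow on one axis. A sketch of that geometry (illustrative, not this repo's arb code):

```python
def fit_to_bucket(img_w, img_h, bucket_w, bucket_h):
    """Sketch of an aspect-ratio-bucket style resize+crop
    (illustrative, not this repository's arb implementation):
    scale the image to cover the bucket, then center-crop the
    overflow on whichever axis is too long."""
    # Scale so both dimensions are at least the bucket size.
    scale = max(bucket_w / img_w, bucket_h / img_h)
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    # Center-crop the excess down to the bucket resolution.
    crop_x = (new_w - bucket_w) // 2
    crop_y = (new_h - bucket_h) // 2
    crop_box = (crop_x, crop_y, crop_x + bucket_w, crop_y + bucket_h)
    return (new_w, new_h), crop_box

# A 4:3 image into a slightly different-ratio bucket: only 32px of
# height is cropped away, which is the "a little" mentioned above.
print(fit_to_bucket(1024, 768, 896, 640))
```

Because the buckets cover many aspect ratios, the crop is small; non-square inputs are fine and 1:1 is not required.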
Nice! Thank you for your work. I sometimes saw it run out of VRAM when using limit_train_batches >100, though after switching to xformers that doesn't happen anymore.