Mikubill/naifu

Some questions.

IdiotSandwichTheThird opened this issue · 2 comments

Looking at this project, a few questions come to mind that the README doesn't answer.

  1. Does this allow more than 75 tokens of text input, or is the text/tag string truncated after 75 tokens?
  2. How does limit_train_batches work? Are the images chosen at random? Do I have to change the value if the dataset has more than 100 images?
  3. Are input images resized automatically for the buckets, or is some preprocessing necessary? (For example, all images in the test datasets appear to have a 1:1 aspect ratio. Is this required?)

More than 75 tokens as text input

This has been implemented in recent commits. Set max_length in config.dataset to choose your own limit, or keep the default limit of 225 tokens.
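A common way to get past CLIP's 77-position context (used by NovelAI-style trainers) is to split the prompt into 75-token chunks and wrap each chunk in its own BOS/EOS pair; 225 is exactly three such chunks. The sketch below assumes naifu does something similar, which this thread does not confirm. The token ids 49406/49407 are those of the standard CLIP tokenizer, and `chunk_token_ids` is a hypothetical helper, not a naifu function.

```python
from typing import List

# Assumption: standard CLIP tokenizer ids; EOS also serves as the pad token.
BOS, EOS = 49406, 49407
CHUNK = 75  # CLIP context is 77 positions: BOS + 75 tokens + EOS


def chunk_token_ids(ids: List[int], max_length: int = 225) -> List[List[int]]:
    """Hypothetical helper: split prompt token ids (without BOS/EOS) into
    77-position windows. Tokens beyond max_length are dropped, and the last
    window is padded with EOS."""
    ids = ids[:max_length]
    n_chunks = max(1, -(-len(ids) // CHUNK))  # ceil division, at least one window
    windows = []
    for i in range(n_chunks):
        body = ids[i * CHUNK:(i + 1) * CHUNK]
        body = body + [EOS] * (CHUNK - len(body))  # pad a short final chunk
        windows.append([BOS] + body + [EOS])
    return windows
```

Each window would then be encoded separately by the text encoder, with the resulting hidden states concatenated along the sequence axis, so a 100-token prompt yields two 77-position windows instead of being cut off at 75.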

How does limit_train_batches work?

trainer.limit_train_batches is passed directly to pl.Trainer and limits the number of batches used in a single epoch. (Maybe we should set it to 1.0 by default, so that every batch is used?) As for the images, arb (aspect ratio bucketing) reshuffles them every epoch and generates a new list of image batches.
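PyTorch Lightning reads `limit_train_batches` in two ways: an int caps each epoch at that many batches, while a float keeps that fraction of them (1.0 means all). So with an int value of 100, any batches beyond the first 100 are skipped in that epoch. The sketch below mirrors that rule and pairs it with a per-epoch reshuffle of bucketed batches. `effective_batches` and `epoch_batches` are hypothetical helpers, not naifu's code. They assume buckets are given as lists of image indices and that partial batches are kept; a real implementation may drop them instead.

```python
import random
from typing import Dict, List, Tuple, Union


def effective_batches(num_batches: int, limit: Union[int, float]) -> int:
    """Mirror Lightning's reading of limit_train_batches:
    an int is a batch count, a float is a fraction of all batches."""
    if isinstance(limit, int):
        return min(num_batches, limit)
    return int(num_batches * limit)


def epoch_batches(buckets: Dict[Tuple[int, int], List[int]], batch_size: int,
                  limit: Union[int, float], seed: int) -> List[List[int]]:
    """Hypothetical helper: build a freshly shuffled list of same-bucket
    batches for one epoch, then apply the batch limit."""
    rng = random.Random(seed)  # a different seed per epoch gives a new order
    batches = []
    for ids in buckets.values():
        ids = ids[:]        # copy so the caller's lists are not mutated
        rng.shuffle(ids)    # new image order inside each bucket
        batches += [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]
    rng.shuffle(batches)    # new batch order across buckets
    return batches[:effective_batches(len(batches), limit)]
```

Because every batch is drawn from a single bucket, all images in a batch share one resolution, and the random selection under an int limit comes from the reshuffle rather than from the limit itself.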

Are images resized automatically for the buckets?

Input images are resized and slightly cropped by arb. If needed, you can enable arb.debug and dataset.debug_arb to view the cropped images. That's handy, but doing some preprocessing or augmentation in advance will obviously give better results.
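A typical bucketing resize works like this. First pick the bucket whose aspect ratio is closest to the image's. Then scale the image until it covers that bucket, and crop the overflow on one axis. That is why non-square images only lose a few pixels instead of needing to be 1:1. The sketch below assumes a centre crop and uses a made-up bucket list (`BUCKETS`); naifu's actual buckets come from its arb config, and its crop placement may differ. The returned box follows the `(left, upper, right, lower)` order that Pillow's `Image.crop` expects.

```python
from typing import List, Tuple

# Hypothetical bucket list as (width, height); not naifu's real buckets.
BUCKETS: List[Tuple[int, int]] = [(512, 512), (576, 448), (448, 576), (640, 384), (384, 640)]


def closest_bucket(w: int, h: int) -> Tuple[int, int]:
    """Pick the bucket whose aspect ratio is closest to the image's."""
    ratio = w / h
    return min(BUCKETS, key=lambda b: abs(b[0] / b[1] - ratio))


def resize_and_crop_box(w: int, h: int) -> Tuple[Tuple[int, int], Tuple[int, int, int, int]]:
    """Return (resized size, centre-crop box) mapping a w x h image onto its bucket.

    The image is scaled so it fully covers the bucket, then the overflow
    on one axis is cropped roughly equally from both sides.
    """
    bw, bh = closest_bucket(w, h)
    scale = max(bw / w, bh / h)
    rw, rh = max(bw, round(w * scale)), max(bh, round(h * scale))  # never smaller than the bucket
    left, top = (rw - bw) // 2, (rh - bh) // 2
    return (rw, rh), (left, top, left + bw, top + bh)
```

For a 1000x800 image this picks the 576x448 bucket, resizes to 576x461 and crops 13 rows in total, which matches the "cropped a little" behaviour described above.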

Nice, thank you for your work! I sometimes ran out of VRAM when setting limit_train_batches above 100, but since switching to xformers it no longer happens.