yuval-alaluf/stylegan3-editing

Questions about generate_latents_and_attribute_scores.py

NoctisZ opened this issue · 4 comments

Great work and thank you so much for sharing the code!
I have a couple of questions regarding generate_latents_and_attribute_scores.py, which I'm using to generate training data for new boundaries:

  • What is the purpose of using save_interval in this function? Is there any reason that we don't want to save all generated data into a single folder (and a single scores.npy, ws.npy, etc.)?
  • I noticed the following description of save_interval in the README (for generate_latents_and_attribute_scores.py as well): "An npy file will be saved every save_interval samples". However, on Line 90 of the code, the condition if seed_idx % save_interval == 0 and seed > 0 only saves an npy file after (save_interval + 1) steps, since seed_idx starts from 0. For example, if we set n_images to 8 and save_interval to 4, we want two npy files containing 4 faces each, but we end up with only one npy file containing 5 faces. I think this may be a bug that needs to be fixed with something like if (seed_idx + 1) % save_interval == 0 and seed > 0?

Would you kindly let me know if I'm missing something here? I would really appreciate it!

From the README:

An npy file will be saved every save_interval samples.

In generate_latents_and_attribute_scores.py:

# How often to save sample latents/scores to `npy` files
save_interval: int = 10000

preds, ages, poses, ws = [], [], [], []
saving_batch_id = 0
for seed_idx, seed in enumerate(range(n_images)):

    if seed_idx % save_interval == 0 and seed > 0:
        save_latents_and_scores(preds, ws, ages, poses, saving_batch_id, output_path)
        saving_batch_id = saving_batch_id + 1
        preds, ages, poses, ws = [], [], [], []
        print(f'Generated {save_interval} images!')

From your post:

  • will only save a npy file at every (save_interval + 1) step since seed_idx starts from 0.

No, it will save every save_interval samples, though it is true that the condition seed_idx % save_interval == 0 also holds for the first seed, indexed by seed_idx=0, which is what the seed > 0 check is meant to handle.

I see your point. So with if seed_idx % save_interval == 0 and seed > 0, the first seed, indexed by seed_idx=0, will always be saved, and then starting from the second seed, we save every save_interval steps.

I tried with n_images set to 5 and save_interval set to 2, and found that the id_0 folder contains 3 faces, since the first seed is saved together with the next two, and the id_1 folder contains 2 faces. However, if n_images is set to 4 and save_interval is 2, I only get one folder (i.e. id_0) that contains 3 faces, and the data for the 4th face is lost. I guess I just need to be careful with this hyperparameter setting then.

Hmmm... Thinking a bit more about it...

  • seed iterates over range(n_images), so 0, 1, 2, ..., n_images-1.
  • seed_idx iterates over the enumeration of these, starting at 0.

In practice, seed and seed_idx are the same here, so I have trouble understanding why there is this call to enumerate(). 🤔

As you mentioned, you get batches of images of length save_interval, except for the first batch which has an additional image corresponding to seed_idx=0. For n_images=5 and save_interval=2, you get a first batch of 3 images, and a second batch of 2 images. For n_images=4 and save_interval=2, you get a first batch of 3 images, and the second batch never reaches 2 images, so it is not saved.
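The batch sizes described above can be reproduced with a small stand-alone simulation. This is only a sketch: it assumes each sample is appended to the buffers before the interval check runs, which is consistent with the numbers reported in this thread but is not copied from the script. The function name simulate_current_saving is made up for illustration.

```python
def simulate_current_saving(n_images, save_interval):
    """Return (sizes of saved batches, number of samples never saved).

    Assumption: the sample for the current seed is appended to the buffer
    *before* the interval check. The real script is not reproduced here.
    """
    batch_sizes = []
    buffer = []
    for seed_idx, seed in enumerate(range(n_images)):
        buffer.append(seed)  # stand-in for the generated latent and its scores
        if seed_idx % save_interval == 0 and seed > 0:
            batch_sizes.append(len(buffer))  # stand-in for save_latents_and_scores
            buffer = []
    return batch_sizes, len(buffer)


print(simulate_current_saving(5, 2))  # ([3, 2], 0)
print(simulate_current_saving(4, 2))  # ([3], 1)
print(simulate_current_saving(8, 4))  # ([5], 3)
```

The last call corresponds to the example in the original post: a single file with 5 faces, with the remaining 3 never written.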

In the scenario that this is actually a bug, then maybe one could replace this line:

for seed_idx, seed in enumerate(range(n_images)):

with:

for seed_idx, seed in enumerate(range(n_images), start=1):

This way, the following line:

if seed_idx % save_interval == 0 and seed > 0:

could be simplified to:

if seed_idx % save_interval == 0:

You would get batches of images of length save_interval. For n_images=5 and save_interval=2, you would get a first batch of 2 images, and a second batch of 2 images, while the third batch would never be completed. For n_images=4 and save_interval=2, you would get a first batch of 2 images, and a second batch of 2 images.
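Here is a quick sketch to check these numbers for the start=1 variant. As before, it assumes the sample is appended before the interval check, and simulate_proposed_saving is a hypothetical helper, not part of the repository.

```python
def simulate_proposed_saving(n_images, save_interval):
    """Return (sizes of saved batches, number of samples never saved).

    Assumption: the sample for the current seed is appended to the buffer
    before the interval check, as in the thread's description.
    """
    batch_sizes = []
    buffer = []
    # seed_idx now counts samples from 1, so no extra `seed > 0` guard is needed
    for seed_idx, seed in enumerate(range(n_images), start=1):
        buffer.append(seed)  # stand-in for the generated latent and its scores
        if seed_idx % save_interval == 0:
            batch_sizes.append(len(buffer))  # stand-in for save_latents_and_scores
            buffer = []
    return batch_sizes, len(buffer)


print(simulate_proposed_saving(5, 2))  # ([2, 2], 1)
print(simulate_proposed_saving(4, 2))  # ([2, 2], 0)
```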

The save_interval parameter would:

  • still be a bit tricky because you can lose some of the images if the last batch is never completed,
  • but more intuitive. And batches would all be of the requested size. So I believe it would be better.
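One possible way to avoid losing the incomplete last batch would be to flush whatever is left in the buffers once the loop ends. The sketch below is an assumption about how that could look, not the repository's code: generate_in_batches and the save_batch callback are hypothetical stand-ins for the real generation loop and save_latents_and_scores.

```python
def generate_in_batches(n_images, save_interval, save_batch):
    """Save a batch every `save_interval` samples, plus any final partial batch.

    `save_batch(batch, batch_id)` is a hypothetical callback standing in for
    the script's saving function. Returns the number of batches written.
    """
    buffer = []
    batch_id = 0
    for seed_idx, seed in enumerate(range(n_images), start=1):
        buffer.append(seed)  # stand-in for the generated latent and its scores
        if seed_idx % save_interval == 0:
            save_batch(buffer, batch_id)
            batch_id += 1
            buffer = []
    if buffer:  # flush the samples left over after the last full batch
        save_batch(buffer, batch_id)
        batch_id += 1
    return batch_id


saved = []
n_batches = generate_in_batches(
    5, 2, lambda batch, batch_id: saved.append((batch_id, list(batch)))
)
print(n_batches)  # 3
print(saved)      # [(0, [0, 1]), (1, [2, 3]), (2, [4])]
```

The trade-off is that the last file can hold fewer than save_interval samples, so anything that loads the batches afterwards should not assume a fixed size.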

Yes totally agree. Thanks for the response!