Questions about generate_latents_and_attribute_scores.py
NoctisZ opened this issue · 4 comments
Great work and thank you so much for sharing the code!
I have a couple of questions about generate_latents_and_attribute_scores.py, which I'm using to generate training data for new boundaries:
- What is the purpose of save_interval in this function? Is there any reason not to save all generated data into a single folder (with a single scores.npy, ws.npy, etc.)?
- I noticed the following description of save_interval in the README (for generate_latents_and_attribute_scores.py as well): "An npy file will be saved every save_interval samples". However, at Line 90 of the code, the if-statement if seed_idx % save_interval == 0 and seed > 0 will only save an npy file at every (save_interval + 1) step, since seed_idx starts from 0. For example, if we set n_images to 8 and save_interval to 4, we want two npy files each containing 4 faces, but we end up with only one npy file containing 5 faces. I think this may be a bug that should be fixed with something like if (seed_idx + 1) % save_interval == 0 and seed > 0?
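To make the example concrete, here is a minimal sketch that counts how many faces would end up in each saved file. It does not call the real script: it assumes each sample is buffered before the check and that the buffer is emptied after every save, which is my reading of the behaviour and not something confirmed here. The names batch_sizes, current, and shifted are made up for illustration.

```python
def batch_sizes(n_images, save_interval, should_save):
    """Return the size of each saved batch under a given save condition.

    Assumption: each sample is appended to a buffer *before* the check,
    and the buffer is emptied after every save.
    """
    sizes, buffered = [], 0
    for seed_idx, seed in enumerate(range(n_images)):
        buffered += 1  # stand-in for generating and buffering one face
        if should_save(seed_idx, seed, save_interval):
            sizes.append(buffered)
            buffered = 0
    return sizes


def current(seed_idx, seed, k):
    # Condition quoted from the script.
    return seed_idx % k == 0 and seed > 0


def shifted(seed_idx, seed, k):
    # Condition suggested in this post.
    return (seed_idx + 1) % k == 0 and seed > 0


print(batch_sizes(8, 4, current))  # [5]
print(batch_sizes(8, 4, shifted))  # [4, 4]
```

Under these assumptions, the current condition gives a single file of 5 faces and leaves the last 3 unsaved, while the shifted condition gives two files of 4.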
Would you kindly let me know if I'm missing something here? I would really appreciate it!
From the README:
"An npy file will be saved every save_interval samples."
In generate_latents_and_attribute_scores.py:
From your post:
- will only save an npy file at every (save_interval + 1) step since seed_idx starts from 0.
No, it will save every save_interval samples, though it is true that seed_idx=0 would always trigger a save if it were not for the seed>0 check.
I see your point. So basically, with if seed_idx % save_interval == 0 and seed > 0, the first seed, indexed by seed_idx=0, is always kept, and then starting from the second seed, we save every save_interval steps.
I tried with n_images set to 5 and save_interval set to 2, and found that the data in the id_0 folder contains 3 faces, since the first seed is saved together with the next two, and the data in the id_1 folder contains 2 faces. However, if n_images is set to 4 and save_interval is 2, I only get one folder (i.e. id_0) containing 3 faces, and the data for the 4th face is lost. I guess I just need to be careful with this hyperparameter setting then.
Hmmm... Thinking a bit more about it...
seed iterates over range(n_images), so 0, 1, 2, ..., n_images-1. seed_idx iterates over the enumeration of these, starting at 0.
In practice, seed and seed_idx are the same here, so I have trouble understanding why there is this call to enumerate(). 🤔
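A quick check of that claim, as a standalone snippet where n_images is just an illustrative value:

```python
n_images = 5  # illustrative value

# enumerate() over range(n_images) yields (index, value) pairs that are identical.
pairs = list(enumerate(range(n_images)))
print(pairs)  # [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]

# So `seed > 0` is equivalent to `seed_idx > 0` in this loop.
assert all(seed_idx == seed for seed_idx, seed in pairs)
```

The two would only differ if the seeds came from some other iterable, for example a range with a nonzero start.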
As you mentioned, you get batches of images of length save_interval, except for the first batch, which has an additional image corresponding to seed_idx=0. For n_images=5 and save_interval=2, you get a first batch of 3 images and a second batch of 2 images. For n_images=4 and save_interval=2, you get a first batch of 3 images, and the second batch never reaches 2 images, so it is not saved.
If this is actually a bug, then maybe one could change the loop header to:
for seed_idx, seed in enumerate(range(n_images), start=1):
This way, the save condition could be simplified to:
if seed_idx % save_interval == 0:
You would get batches of images of length save_interval. For n_images=5 and save_interval=2, you would get a first batch of 2 images and a second batch of 2 images, while the third batch would never be completed. For n_images=4 and save_interval=2, you would get a first batch of 2 images and a second batch of 2 images.
The save_interval parameter would:
- still be a bit tricky, because you can lose some of the images if the last batch is never completed,
- but be more intuitive, and batches would all be of the requested size. So I believe it would be better.
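The batch sizes above can be checked with a small simulation of the start=1 variant. This is only a sketch: it assumes each sample is buffered before the check and that the buffer is emptied after every save, and simulate is a hypothetical helper, not a function from the repository.

```python
def simulate(n_images, save_interval):
    """Return (sizes of saved batches, number of unsaved trailing images).

    Assumption: each sample is buffered before the check and the buffer
    is emptied after every save.
    """
    sizes, buffered = [], 0
    for seed_idx, seed in enumerate(range(n_images), start=1):
        buffered += 1  # stand-in for generating one image
        if seed_idx % save_interval == 0:
            sizes.append(buffered)
            buffered = 0
    return sizes, buffered


print(simulate(5, 2))  # ([2, 2], 1)
print(simulate(4, 2))  # ([2, 2], 0)
```

The second value is the number of images left in an incomplete final batch. Writing that remainder once after the loop would be one way to avoid dropping it, though that goes beyond the change suggested here.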
Yes totally agree. Thanks for the response!