About calculating the attribute accuracy
Sorry to open another issue to bother you, but I am not very clear about how you got the accuracy of the pretrained classifier on the test dataset.
I think there are two ways to calculate it:
- Calculate the 13 attributes' accuracy, then take the average of the 13 values.
- Treat each attribute's positive side and negative side separately, and then take the average of the 26 values.
The drawback of the first method is that it is influenced by unbalanced attribute frequencies.
So I want to ask which one you used to calculate it.
Thanks in advance!
The first one, you can refer to att_classification/test.py.
When calculating an average score over a dataset, we usually weight each part by its data amount (e.g., data amount / total data amount); with this weighting, the first and the second methods give the same result.
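A minimal sketch of the two averaging schemes, assuming binary 0/1 labels and predictions stored as plain Python lists (one row per image, one column per attribute). The function names are hypothetical and not taken from att_classification/test.py. Weighting each side's accuracy by its sample count reproduces the first method exactly, while the unweighted mean over the 2K sides does not.

```python
from typing import List, Tuple

Matrix = List[List[int]]  # one row per image, one 0/1 column per attribute (hypothetical layout)


def per_attribute_accuracy(labels: Matrix, preds: Matrix) -> List[float]:
    """Accuracy of each attribute over all test images, positives and negatives mixed."""
    n, k = len(labels), len(labels[0])
    return [sum(labels[i][a] == preds[i][a] for i in range(n)) / n for a in range(k)]


def mean_over_attributes(labels: Matrix, preds: Matrix) -> float:
    """First method: plain average of the K per-attribute accuracies (K = 13 in this thread)."""
    accs = per_attribute_accuracy(labels, preds)
    return sum(accs) / len(accs)


def side_accuracies(labels: Matrix, preds: Matrix) -> List[Tuple[float, int]]:
    """(accuracy, sample count) for the positive and the negative side of every attribute."""
    n, k = len(labels), len(labels[0])
    sides = []
    for a in range(k):
        for side in (1, 0):
            idx = [i for i in range(n) if labels[i][a] == side]
            if not idx:  # a side with no test images has no accuracy to report
                continue
            acc = sum(preds[i][a] == side for i in idx) / len(idx)
            sides.append((acc, len(idx)))
    return sides


def mean_over_sides(labels: Matrix, preds: Matrix, weighted: bool) -> float:
    """Second method: average of the 2K side accuracies, optionally weighted by side size."""
    sides = side_accuracies(labels, preds)
    if weighted:
        total = sum(count for _, count in sides)
        return sum(acc * count for acc, count in sides) / total
    return sum(acc for acc, _ in sides) / len(sides)


if __name__ == "__main__":
    labels = [[1, 1], [0, 1], [0, 0], [0, 0]]
    preds = [[0, 0] for _ in labels]  # a classifier that always predicts "negative"
    print(mean_over_attributes(labels, preds))             # 0.625
    print(mean_over_sides(labels, preds, weighted=True))   # 0.625
    print(mean_over_sides(labels, preds, weighted=False))  # 0.5
```

The weighted side average sums the correct predictions over every (image, attribute) pair and divides by the total number of pairs, which is exactly what the first method computes; the unweighted variant instead rewards doing well on the rare side.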
Thanks for your quick reply.
So it's a weighted average. But I am still a little confused.
For the first method, if you don't treat the positive and negative sides separately, how do you calculate the needed weight?
Let's say we have 1000 samples to test, and 100 of them have the Bald attribute.
Maybe you mean we should calculate the Bald accuracy on the 100 samples and the Not Bald accuracy on the remaining 900 samples, and then calculate the Bald attribute accuracy as weighted_bald_attr_acc = 0.1 * acc_bald + 0.9 * acc_not_bald?
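The Bald example can be checked numerically. This is a sketch under assumed, made-up predictions (60 of the 100 bald images detected, 18 false positives among the 900 non-bald ones); the function name is hypothetical. With weights 0.1 and 0.9 the decomposition equals the plain accuracy on all 1000 images, while the unweighted mean of the two sides (often called balanced accuracy) differs.

```python
from typing import Dict, List


def one_attribute_breakdown(labels: List[int], preds: List[int]) -> Dict[str, float]:
    """Split one attribute's accuracy into its positive and negative sides."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    acc_pos = sum(preds[i] == 1 for i in pos) / len(pos)  # acc_bald
    acc_neg = sum(preds[i] == 0 for i in neg) / len(neg)  # acc_not_bald
    w_pos = len(pos) / len(labels)                         # 100 / 1000 = 0.1 here
    return {
        "acc_pos": acc_pos,
        "acc_neg": acc_neg,
        "weighted": w_pos * acc_pos + (1 - w_pos) * acc_neg,
        "balanced": (acc_pos + acc_neg) / 2,  # unweighted mean of the two sides
        "plain": sum(p == y for p, y in zip(preds, labels)) / len(labels),
    }


if __name__ == "__main__":
    labels = [1] * 100 + [0] * 900
    # Made-up predictions: 60 hits + 40 misses on bald, 18 false alarms + 882 hits on not bald.
    preds = [1] * 60 + [0] * 40 + [1] * 18 + [0] * 882
    for name, value in one_attribute_breakdown(labels, preds).items():
        print(f"{name}: {value:.3f}")
```

Here "weighted" and "plain" both come out at 0.942, while "balanced" is 0.790.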
For the first method: There is no weight (or 1/N, where N is the number of attributes used) since the data amount is identical for different attributes.
For the second method: Yes.
Since you said you used the first one, when you calculated the accuracy of the pretrained classifier on the test dataset, you didn't take special care of unbalanced attributes (for example, the Bald attribute is very unbalanced)?
My last question is about calculating the attribute generation accuracy.
Still there are at least two ways to do it:
- Take the attribute vectors from the real data and use the generator to translate the images. Then calculate the accuracy for each attribute (without separating the positive and negative sides).
- For each attribute, sample n test images and use the generator to translate them to the positive side and to the negative side separately (so both sides have the same number). Then use the classifier to calculate the accuracy for each attribute.
- Yes. When evaluating the attribute classifier, I didn't take special care of the unbalanced attributes.
- The second way is somewhat improper. For example, if an image shows an old person:
  - changing it to old, i.e., reconstruction: STGAN (with difference vector) should be better than all previous methods, since they use the target attribute vector, which may cause inaccurate reconstruction.
  - changing it towards older: if the image quality is not too bad, the accuracy will approach 100% for all methods.

Selecting N positive examples and N negative examples for each attribute may be better, but N is limited by the number of examples of the rare attributes.
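A minimal sketch of the balanced selection suggested above, assuming per-attribute 0/1 label lists; the helper names are hypothetical. The rarer side of each attribute caps N, and a fixed seed keeps the selected images identical across the models being compared.

```python
import random
from typing import List, Optional, Tuple


def balanced_indices(labels: List[int], n: Optional[int] = None,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Pick N positive and N negative test images for one attribute."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    max_n = min(len(pos), len(neg))  # the rarer side caps N
    if n is None:
        n = max_n
    if n > max_n:
        raise ValueError(f"only {max_n} examples on the rarer side, asked for {n}")
    rng = random.Random(seed)  # fixed seed so every model is scored on the same images
    return rng.sample(pos, n), rng.sample(neg, n)


def common_n(columns: List[List[int]]) -> int:
    """Largest N usable for every attribute at once (set by the rarest attribute side)."""
    return min(min(sum(col), len(col) - sum(col)) for col in columns)


if __name__ == "__main__":
    bald = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # 3 positives, 7 negatives
    pos, neg = balanced_indices(bald)
    print(len(pos), len(neg))  # 3 3
    print(common_n([bald, [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]]))  # 3
```

Using one common N for all attributes keeps the per-attribute numbers comparable, at the cost of discarding many images of the frequent attributes.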
Thanks for your reply!
So you are using the first method? Thanks for making it clear to me!
For the second method, I guess it's okay to have some samples that already meet the editing requirement, as long as we use the same n samples for all models.
Yes, I'm using the first method. Specifically, taking Bald as an example: for each image, if it is bald, the target will be not bald; if it is not bald, the target will be bald. The accuracy is (# successful modifications) / (# total images).
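The flip-and-check rule above can be sketched for a single attribute as follows, assuming the classifier's 0/1 predictions on the edited images are already available; the function name is hypothetical and this is not the repository's code.

```python
from typing import List


def editing_accuracy(original: List[int], preds_on_edited: List[int]) -> float:
    """Success rate of flipping one attribute (e.g. Bald) on every test image.

    original: 0/1 labels of the test images for the attribute.
    preds_on_edited: the classifier's 0/1 predictions on the edited images.
    """
    if len(original) != len(preds_on_edited):
        raise ValueError("need one prediction per test image")
    targets = [1 - y for y in original]  # bald -> not bald, not bald -> bald
    successes = sum(p == t for p, t in zip(preds_on_edited, targets))
    return successes / len(targets)  # successful modifications / total images


if __name__ == "__main__":
    original = [1, 0, 0, 0]
    preds = [0, 1, 1, 0]  # the last edit failed to add baldness
    print(editing_accuracy(original, preds))  # 0.75
```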
Thanks very much, it's clear to me now!
Sorry to bother you again: did you generate the test samples for each attribute separately, or use the same generated sample set to calculate the accuracies for all the attributes?
I use all images in the official test set. Please refer to att_classification.
Thanks for your reply! I have checked this script. It looks like you only generated the test samples once and used them to calculate the accuracies for all the attributes. Have you tried generating the test samples for each attribute separately and then calculating the accuracy on the corresponding attribute?
Sorry for the previous misleading reply.
I mean, for each attribute, generate the modified images via test.py and calculate the accuracy with att_classification/test.py.
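The per-attribute procedure can be outlined as a loop with one generation pass per attribute. In this sketch `edit` and `classify` are hypothetical callables standing in for running test.py and att_classification/test.py; their signatures are assumptions, not the scripts' real interfaces.

```python
from typing import Callable, List

Matrix = List[List[int]]  # one row per image, one 0/1 column per attribute


def per_attribute_editing_accuracy(
    images: list,
    labels: Matrix,
    edit: Callable[[list, int, List[int]], list],  # hypothetical stand-in for test.py
    classify: Callable[[list], Matrix],           # hypothetical stand-in for the classifier
) -> List[float]:
    accs = []
    for a in range(len(labels[0])):
        targets = [1 - row[a] for row in labels]  # flip attribute a on every test image
        edited = edit(images, a, targets)         # separate generation pass for this attribute
        preds = classify(edited)                  # one 0/1 row per edited image
        hits = sum(preds[i][a] == targets[i] for i in range(len(targets)))
        accs.append(hits / len(targets))
    return accs


if __name__ == "__main__":
    labels = [[1, 0], [0, 0], [0, 1]]
    images = [list(row) for row in labels]  # toy "images" are just their label rows

    def perfect_edit(imgs, a, targets):
        out = []
        for img, t in zip(imgs, targets):
            row = list(img)
            row[a] = t
            out.append(row)
        return out

    print(per_attribute_editing_accuracy(images, labels, perfect_edit, lambda x: x))  # [1.0, 1.0]
```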
Got it. Thanks!