About the calculation of the attribute accuracy

Question

Closed this issue 3 years ago · 14 comments

Sorry to open another issue and bother you, but I am not clear on how you compute the accuracy of the pretrained classifier on the test dataset.

I think there are two ways to calculate it:

1. Calculate the accuracy of each of the 13 attributes, then take the average of the 13 values.
2. Treat each attribute's positive and negative sides separately, then take the average of the 26 values.

The drawback of the first method is that it is influenced by unbalanced attribute frequencies.
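To make the difference between the two averages concrete, here is a minimal sketch on toy data. The function names `per_attribute_accuracy` and `per_side_accuracy` and all labels and predictions are made up for illustration; this is not the code in att_classification/test.py.

```python
def per_attribute_accuracy(labels, preds):
    """Method 1: accuracy of each attribute over all samples, then the mean."""
    n_attrs = len(labels[0])
    accs = []
    for a in range(n_attrs):
        correct = sum(1 for y, p in zip(labels, preds) if y[a] == p[a])
        accs.append(correct / len(labels))
    return sum(accs) / n_attrs


def per_side_accuracy(labels, preds):
    """Method 2: accuracy on each side (negative, positive) of each attribute,
    then the mean over all sides."""
    n_attrs = len(labels[0])
    accs = []
    for a in range(n_attrs):
        for side in (0, 1):
            hits = [p[a] == side for y, p in zip(labels, preds) if y[a] == side]
            if hits:  # skip a side that has no samples
                accs.append(sum(hits) / len(hits))
    return sum(accs) / len(accs)


# Toy data (assumption): 2 attributes, 10 samples.
# Attribute 0 is rare: only 1 positive sample out of 10.
labels = [[1, 1]] + [[0, 1]] * 4 + [[0, 0]] * 5
# A classifier that always predicts "negative" for attribute 0
# and is perfect on attribute 1.
preds = [[0, y[1]] for y in labels]

method1 = per_attribute_accuracy(labels, preds)
method2 = per_side_accuracy(labels, preds)
print(f"method 1: {method1:.2f}, method 2: {method2:.2f}")
```

On this toy data the first average stays high even though the rare positive side is never predicted correctly, while the second average drops noticeably.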

So I would like to ask which one you use.

Thanks in advance!

Answer 1 · 2021-10-05T02:42:53.000Z

The first one; you can refer to att_classification/test.py.
When calculating an average score, we usually weight each subset by its share of the data (e.g., data amount / total data amount). With that weighting, the first and the second methods give the same result.

Answer 2 · 2021-10-05T02:59:19.000Z

Thanks for your quick reply.
So it's a weighted average. But I am still a little confused.

For the first method, if you don't treat the positive & negative side separately, how do you calculate the needed weight?

Let's say we have 1000 samples to test, and 100 of them have the Bald attribute.
Maybe you mean we should calculate the Bald accuracy on those 100 samples and the Not Bald accuracy on the remaining 900 samples, then compute the Bald attribute accuracy as weighted_bald_attr_acc = 0.1 * acc_bald + 0.9 * acc_not_bald?

Answer 3 · 2021-10-05T03:03:23.000Z

For the first method: there is no weight (or the weight is 1/N, where N is the number of attributes used), since the data amount is identical for different attributes.
For the second method: Yes.
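As a quick check of the weighting in the 1000-sample Bald example, the sketch below shows that weighting each side by its share of the data reproduces the plain accuracy over all samples, which is why the two methods coincide under that weighting. The counts of correct predictions are hypothetical values chosen for illustration.

```python
n_total = 1000
n_bald = 100
n_not_bald = n_total - n_bald

# Hypothetical numbers of correct predictions on each side.
correct_bald = 60
correct_not_bald = 855

acc_bald = correct_bald / n_bald              # accuracy on the Bald side
acc_not_bald = correct_not_bald / n_not_bald  # accuracy on the Not Bald side

# Weight each side by its share of the data: 0.1 and 0.9 here.
weighted = (n_bald / n_total) * acc_bald + (n_not_bald / n_total) * acc_not_bald

# Plain accuracy over all 1000 samples, without separating the sides.
overall = (correct_bald + correct_not_bald) / n_total

# Unweighted mean of the two sides, for comparison.
unweighted = (acc_bald + acc_not_bald) / 2

print(weighted, overall, unweighted)
```

The unweighted mean of the two sides is lower here because the weaker Bald side counts as much as the much larger Not Bald side.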

Answer 4 · 2021-10-05T03:23:38.000Z

Since you said you used the first one, does that mean that when calculating the accuracy of the pretrained classifier on the test dataset, you didn't take special care of the unbalanced attributes (for example, the bald attribute is very unbalanced)?

My last question is about calculating the attribute generation accuracy.

Again, there are at least two ways to do it:

1. Take the attribute vectors from the real data and use the generator to translate the images. Then calculate the accuracy for each attribute (without separating the positive and negative sides).
2. For each attribute, sample n test images and use the generator to translate them to the positive side and to the negative side separately (so both sides have the same number of images). Then use the classifier to calculate the accuracy for each attribute.

Answer 5 · 2021-10-05T03:45:02.000Z

Yes. When evaluating the attribute classifier, I didn't take special care of the unbalanced attributes.
The second way is somewhat improper.
For example, if an image shows an old person:
1. Changing it to old, i.e., reconstruction:
  STGAN (with the difference vector) should be better than all previous methods, since they use the target attribute vector, which may cause inaccurate reconstruction.
2. Changing it towards older:
  If the image quality is not too bad, the accuracy will be close to 100% for all methods.

Answer 6 · 2021-10-05T03:49:20.000Z

Selecting N positive examples and N negative examples for each attribute may be better, but N is limited by the number of samples available for rare attributes.
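A minimal sketch of such balanced selection, showing how a rare attribute caps N. The helper `balanced_indices` and the label list are hypothetical and not part of the repository.

```python
import random


def balanced_indices(attr_labels, n, seed=0):
    """Pick up to n positive and n negative sample indices for one attribute.

    n is reduced to the size of the smaller side, so both sides stay equal.
    """
    pos = [i for i, y in enumerate(attr_labels) if y == 1]
    neg = [i for i, y in enumerate(attr_labels) if y == 0]
    n = min(n, len(pos), len(neg))
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    return rng.sample(pos, n), rng.sample(neg, n)


# Toy labels (assumption): 3 positive samples and 17 negative samples.
bald_labels = [1] * 3 + [0] * 17
pos_idx, neg_idx = balanced_indices(bald_labels, n=10)
print(len(pos_idx), len(neg_idx))
```

Even though 10 samples per side were requested, only 3 per side can be drawn, because the attribute has just 3 positive samples.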

Answer 7 · 2021-10-05T03:56:10.000Z

Thanks for your reply!
So you are using the first method? Thanks for making that clear!

For the second method, I guess it's okay to have some samples that already meet the editing requirement, as long as we use the same n samples for all models.

Answer 8 · 2021-10-05T04:00:02.000Z

Yes, I'm using the first method. Specifically, taking bald as an example: for each image, if it is bald, the target will be not bald; if it is not bald, the target will be bald. The accuracy is (# successful modifications) / (# total images).
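The flip-the-target rule described above can be sketched as follows. The function `generation_accuracy` is a hypothetical helper, and the source labels and classifier predictions are made-up values; in practice the predictions would come from running the attribute classifier on the edited images.

```python
def generation_accuracy(source_labels, edited_predictions):
    """Share of images whose edited version is classified as the flipped label.

    source_labels: 0/1 label of one attribute for each original image.
    edited_predictions: 0/1 classifier output for each edited image.
    """
    targets = [1 - s for s in source_labels]  # bald -> not bald, and vice versa
    successes = sum(1 for t, p in zip(targets, edited_predictions) if t == p)
    return successes / len(source_labels)


# Toy example (assumption): 5 images, Bald attribute.
source_bald = [1, 0, 0, 0, 1]
predicted_after_edit = [0, 1, 1, 0, 0]  # the fourth edit failed

acc = generation_accuracy(source_bald, predicted_after_edit)
print(acc)
```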

Answer 9 · 2021-10-05T04:08:53.000Z

Thank you very much, it is all clear to me now!

Answer 10 · 2021-10-09T09:12:49.000Z

Sorry to bother you again. Did you generate the test samples for each attribute separately, or did you use the same set of generated samples to calculate the accuracies for all the attributes?

Answer 11 · 2021-10-09T10:01:54.000Z

I use all images in the official test set. Please refer to att_classification.

Answer 12 · 2021-10-09T10:09:26.000Z

Thanks for your reply! I have checked this script. It looks like you generated the test samples only once and used them to calculate the accuracies for all the attributes. Have you tried generating the test samples for each attribute separately and then calculating the accuracy on the corresponding attribute?

Answer 13 · 2021-10-09T10:13:10.000Z

Sorry for the previous misleading reply.
I mean: for each attribute, generate the modified images via test.py, then calculate the accuracy with att_classification/test.py.

Answer 14 · 2021-10-09T11:05:22.000Z

Got it. Thanks!