renzhenwang/bias-adaptive-classifier

During pseudo label generation, why add the adaptive bias when imb_ratio_l equals imb_ratio_u?

Closed this issue · 7 comments

Greetings,
This paper is really cool and it's very nice of authors to release the codes. I have one small question about the code. In Line 274, why should we add the bias res_outputs to logits for the pseudo label generation only when imb_ratio_l equals imb_ratio_u?

Hi, chengcheng, thank you for your interest in our work! Our main consideration is: when the class distributions of the labeled and unlabeled data are the same, theoretically the pseudo-labels of the unlabeled data are most accurate when estimated by the same classifier trained on the labeled data. So we use the whole bias adaptive classifier (the linear classifier + the bias attractor) to estimate the pseudo-labels of the unlabeled data, i.e., we add the bias res_outputs to the logits for pseudo label generation in Line 274.
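For anyone else reading this thread, the branching logic described above can be sketched roughly as follows. This is a simplified, dependency-free illustration of the idea, not the code from Line 274; the function name and signature are hypothetical.

```python
def generate_pseudo_labels(logits, res_outputs, imb_ratio_l, imb_ratio_u):
    """Pick a pseudo-label per sample from raw scores.

    Sketch of the idea discussed in this thread: when the labeled and
    unlabeled class distributions match (imb_ratio_l == imb_ratio_u),
    score each sample with the full bias adaptive classifier
    (linear logits + bias attractor output); otherwise use the linear
    classifier alone. All names here are illustrative.
    """
    if imb_ratio_l == imb_ratio_u:
        # Same distribution: the bias attractor's output is added in.
        scores = [[l + r for l, r in zip(row_l, row_r)]
                  for row_l, row_r in zip(logits, res_outputs)]
    else:
        # Different (or unknown) distribution: linear classifier only.
        scores = [list(row) for row in logits]
    # Argmax over classes for each sample.
    return [max(range(len(row)), key=row.__getitem__) for row in scores]
```

For example, with logits `[[2.0, 1.0], [0.0, 3.0]]` and bias `[[-3.0, 0.0], [0.0, 0.0]]`, the bias flips the first sample's prediction from class 0 to class 1 when the imbalance ratios match.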


Thanks for the reply. Then why not add the bias to improve the accuracy of pseudo labels when the class distributions are not the same? Does the bias only work sometimes?


This is just for the experimental settings: 1) Once we have the prior knowledge that the labeled and unlabeled data share the same class distribution, we take the better classifier (the linear classifier + the bias attractor) to estimate the pseudo-labels for this special setting; 2) A more general case is that we can't get the class distribution of the unlabeled data, and then we only use the linear classifier to estimate the pseudo-labels, since it has basically equal preference for both head and tail classes.

I hope this reply will address your concerns, thanks!

I see. Thanks for your explanation~ 😄

Sorry, but I have another simple question: what is new in MetaModule compared to the regular nn.Module? Do they make a difference when conducting regular first-order optimization?


There is no difference between MetaModule and nn.Module when conducting regular first-order optimization. However, nn.Module does not support calculating the second-order gradients needed in bi-level optimization, so we need MetaModule. This is just one implementation choice; there may be other ways to solve bi-level optimization problems without introducing MetaModule.
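To see why the second-order gradient matters, here is a toy scalar bi-level example worked out by hand, with no framework involved. The point is that the outer gradient must be differentiated *through* the inner-loop update w' = w - lr * dL_in/dw; a regular in-place optimizer step (as with nn.Module + optimizer.step()) detaches this dependency, which is what MetaModule-style functional parameter passing avoids. All function names here are illustrative, not from the repository.

```python
def hypergradient(w, lr, x_in, y_in, x_out, y_out):
    """Analytic outer gradient for a toy bi-level problem.

    Inner loss:  L_in(w)  = (w * x_in  - y_in)**2
    Inner step:  w' = w - lr * dL_in/dw
    Outer loss:  L_out(w') = (w' * x_out - y_out)**2

    The chain rule gives dL_out/dw = dL_out/dw' * dw'/dw, and the
    factor dw'/dw = 1 - lr * 2 * x_in**2 is exactly the second-order
    term that is lost if the inner update is not differentiable.
    """
    g_in = 2 * x_in * (w * x_in - y_in)        # dL_in/dw
    w_new = w - lr * g_in                       # differentiable inner step
    g_out = 2 * x_out * (w_new * x_out - y_out) # dL_out/dw'
    dwnew_dw = 1 - lr * 2 * x_in ** 2           # dw'/dw (second-order part)
    return g_out * dwnew_dw
```

If the inner step were detached, dw'/dw would be treated as 1 and the hypergradient would be wrong; you can check the analytic value against a finite-difference estimate of the composed objective.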

Got it, thank you