JosephKJ/OWOD

Clarification regarding the protocol

akshay-raj-dhamija opened this issue · 2 comments

Thanks for the nice paper. The paper talks about

"setting the classification logits of the unseen classes to a large negative value (v), thus making their contribution to softmax negligible"

While this explains the training methodology difference between knowns and unknowns on the classification head, I was wondering what the difference is for the bounding box regression head and for the RPN.
That is, when training the network initially for a 15+5 class problem setting, the labels for the 5 classes are ignored by fixing the logits for the classification problem, but how are the bounding box annotations for these 5 classes used at both the localization head and the RPN?

Hi Akshay,

Thank you for the nice question. Kindly allow me to explain it in detail.

'Setting the classification logits of the unseen classes to a large negative value (v)' was not done to explicitly help model the 'unknown' classes. It was just to limit the number of neurons in the final classification layer to the number of seen classes, without explicitly changing the architecture (the gradient will not propagate through the nodes whose activations are set to a large negative value). This is a standard setting followed in many class-incremental works.
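To make the masking concrete, here is a minimal pure-Python sketch, not the code from this repository. The helper names (`mask_unseen`, `ce_grad_wrt_logits`), the value `v = -1e11`, and the toy logits are all assumptions chosen for illustration. It shows that once the unseen-class logits are overwritten, their softmax probability and their cross-entropy gradient are both zero, so the layer behaves as if it had only `num_seen` outputs.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a plain list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mask_unseen(logits, num_seen, v=-1e11):
    # Overwrite the logits of classes not yet introduced with a large
    # negative constant v (the value here is an assumption).
    return [z if i < num_seen else v for i, z in enumerate(logits)]

def ce_grad_wrt_logits(probs, target):
    # Gradient of cross-entropy w.r.t. the logits: softmax(z) - one_hot(target).
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]

# Toy example: 5 output neurons, only the first 3 classes are "seen".
logits = [2.0, 0.5, -1.0, 3.0, 1.5]
num_seen = 3

masked = mask_unseen(logits, num_seen)
probs = softmax(masked)
grads = ce_grad_wrt_logits(probs, target=0)

print("probabilities:", probs)
print("gradients:    ", grads)
```

In an autograd framework the overwritten entries are constants, so nothing flows back through the corresponding neurons either; the sketch only checks the softmax side of that argument.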

Such a requirement (implicitly modifying the network architecture based on the number of classes) applies only to the classification head. Hence neither the RPN nor the bounding box regression head has any such modification.

For the 15 + 5 experiment, labels for only the first 15 classes are considered. Please note that the model does not have access to any label (class / bounding-box) for the next 5 classes while learning the first 15 classes. Hence both the classification head and the regression head are learned only for these 15 classes. Neither the RPN nor the regression head is learned in a class-specific manner.
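As a rough illustration of this protocol (a sketch under assumptions, not the repository's data loader), the annotations of the held-out classes can be thought of as being dropped before the first task is trained. The function name `filter_annotations`, the `category_id` / `bbox` field names, and the convention that class ids 0 to 14 form the first task are all hypothetical.

```python
def filter_annotations(annotations, num_seen):
    # Keep only the boxes whose class id belongs to the classes seen so far.
    # Boxes of later classes are removed entirely, so neither the RPN nor the
    # box regression head ever receives them as supervision.
    return [a for a in annotations if a["category_id"] < num_seen]

# Toy annotations for one image (values are made up for illustration).
image_annotations = [
    {"bbox": [10, 20, 50, 80], "category_id": 3},
    {"bbox": [60, 40, 120, 90], "category_id": 17},  # belongs to the later 5 classes
    {"bbox": [5, 5, 30, 30], "category_id": 14},
]

task1 = filter_annotations(image_annotations, num_seen=15)
print([a["category_id"] for a in task1])
```

With this filtering, the box for class 17 contributes neither a class label nor a bounding box while the first 15 classes are being learned.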

Please do let me know whether I was able to address the query clearly. If not, I would be happy to clarify further.

Thanks,
Joseph

Thanks for the prompt response. Yes, it addresses my question.