kawine/dataset_difficulty

Is there any difference between finetuning on null inputs and labels versus directly calculating the label distribution to obtain probabilities?


Hi, I've read your paper and I think it's insightful and well experimented. I have a few questions that may be naive, since I'm not familiar with the NLP area.

In classification tasks, training a model and applying the softmax function produces probabilities for the current data. After training on the null input, can we compute the null model's output just once to obtain H_V(Y), since the same null input is repeated for every example? Also, I'm wondering whether there is any difference between doing that and calculating the probability of each label directly from the dataset.
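To make the second half of the question concrete: since the null input is identical for every example, an idealized null model would simply recover the marginal label distribution, whose entropy can be computed directly from label counts with no model at all. A minimal sketch of that direct calculation (function name and the balanced-label example are my own, not from the paper's code):

```python
import math
from collections import Counter

def empirical_label_entropy(labels):
    """Entropy (in bits) of the empirical label distribution,
    computed directly from label counts -- no null model needed."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical balanced CIFAR10-style labels: 10 classes, 100 examples each.
labels = [c for c in range(10) for _ in range(100)]
print(empirical_label_entropy(labels))  # -> log2(10), about 3.32 bits
```

The question is whether the finetuned null model's entropy differs from this direct estimate in practice (e.g. due to optimization or capacity effects), or whether they should coincide.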

Secondly, is it possible to apply this method to image classification tasks? I'm trying to use an all-black / all-white / random-noise image as the null input in place of the empty sentence. It seems to work, since some of the hardest samples receive low scores, but given my first question, I'm not sure whether the results are meaningful. The dataset is CIFAR10, and an AlexNet model was trained from scratch: 3 epochs for the null model and 6 epochs for the normal model.
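For reference, here is how I understand the per-example score I am computing in the image setting, following the paper's pointwise V-information: PVI(x → y) = −log₂ g'[∅](y) + log₂ g[x](y), where g' is the model finetuned on the null image and g the model finetuned on real images. A tiny sketch with hypothetical probabilities (the 0.1 null probability assumes a balanced 10-class dataset; both numbers are illustrative, not measured):

```python
import math

def pvi(p_null_y, p_model_y):
    """Pointwise V-information (bits) for one example:
    PVI(x -> y) = -log2 p_null(y) + log2 p_model(y | x)."""
    return -math.log2(p_null_y) + math.log2(p_model_y)

# Hypothetical hard "cat" image: the null model assigns roughly the
# class prior (0.1 on balanced CIFAR10) and the trained model is
# barely better, so the PVI is small -- the example is "hard".
print(pvi(p_null_y=0.1, p_model_y=0.12))  # small positive value
```

Under this reading, an example is hard when the trained model barely beats the null model's prior, which is why the hardest samples score low.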

Thanks for reading this issue, and thanks for your excellent research work. Here's one of the hardest pictures, labeled "cat", by the way.

[Screenshot 2024-01-11 14:36:20]