jessevig/bertviz

Problem with padding - RoBERTa for sequence classification

ClonedOne opened this issue · 3 comments

Hello, I am trying to use this tool to visualize the attention heads of a RoBERTa model fine-tuned for sequence classification.

The inputs to the model are padded, and I am seeing strange behavior where I get both of the following errors:

ValueError: Attention has 512 positions, while number of tokens is 12 for tokens: 

if I pass only the non-padding tokens to model_view, and

ValueError: Attention has 12 positions, while number of tokens is 512 for tokens: 

if I pass the full padded tokens.

Screenshot:
[image]

As you can see in the screenshot, the attention object I am passing is exactly the same in both cases.
[image]

Do you know what could be causing the issue?

Thanks

Hi @ClonedOne, thanks for reporting. It looks like you are using a batch size of 16 here, while model_view expects a batch size of one. This causes the squeeze operation to misbehave and produce misleading error messages. In the next version, I will make this raise a sensible error when the batch size is not 1. Let me know if you have any questions. Thanks!

And this is probably obvious, as I think you were only using the padded version for testing, but the padded version likely won't render properly due to its size anyway.
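
For reference, a working setup looks something like the sketch below. This is a minimal example, not the code from the screenshot: it assumes a Hugging Face RoBERTa sequence-classification checkpoint, and the model name and input text are placeholders. The key point is to encode a single sequence (batch size 1) without padding, so the token list matches the attention dimensions.

```python
import torch
from transformers import RobertaTokenizer, RobertaForSequenceClassification
from bertviz import model_view

model_name = "roberta-base"  # placeholder: substitute your fine-tuned classification checkpoint
tokenizer = RobertaTokenizer.from_pretrained(model_name)
model = RobertaForSequenceClassification.from_pretrained(
    model_name, output_attentions=True
)

text = "An example sentence to visualize."
# Encode a single example (batch size 1) without padding, so the number of
# tokens matches the attention size expected by model_view.
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
# outputs.attentions is a tuple with one tensor per layer, each of shape
# (batch_size, num_heads, seq_len, seq_len); the batch dimension must be 1.
model_view(outputs.attentions, tokens)
```

If you are starting from an already padded batch, it may be easiest to re-run the model on one unpadded example, or to slice out a single sequence and trim both the token list and each attention tensor to its non-padding length before calling model_view.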

Thank you @jessevig, I was confused by the seemingly contradictory error messages and didn't notice that.