
Problem with padding - RoBERTa for sequence classification

ClonedOne opened this issue · 3 comments

Hello, I am trying to use this tool to visualize the attention heads of a RoBERTa for sequence classification model.

The inputs to the model are padded, and I am seeing strange behavior where I get two seemingly contradictory errors:

ValueError: Attention has 512 positions, while number of tokens is 12 for tokens: 

if I pass only the non-padding tokens to model_view and

ValueError: Attention has 12 positions, while number of tokens is 512 for tokens: 

if I pass the full padded tokens.


As you can see in the screenshot, the attention object I am passing is exactly the same in both cases.

Do you know what could be causing the issue?


Hi @ClonedOne, thanks for reporting. It looks like you are using a batch size of 16 here, but model_view expects a batch size of 1. As a result, the squeeze function does not work as intended, which produces the misleading error messages. In the next version, I will fix this so that it gives a sensible error when the batch size is not 1. Let me know if you have any questions. Thanks!

And this is probably obvious, as I think you were using the padded version for testing purposes, but the padded version probably won't render properly due to its size.
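For anyone hitting the same errors, here is a minimal sketch of one way to pull a single example out of a batched attention output and trim it to its non-padding tokens before visualizing. The helper name `select_example` is hypothetical (it is not part of bertviz), NumPy arrays stand in for the model's tensors, and the sketch assumes padding is on the right and that each layer's attention has shape (batch, heads, seq_len, seq_len). The sequence length is scaled down from 512 to keep the demo small.

```python
import numpy as np


def select_example(attention, attention_mask, index):
    """Return per-layer attention for one example, trimmed to its real tokens.

    attention: sequence of arrays shaped (batch, heads, seq_len, seq_len)
    attention_mask: array shaped (batch, seq_len), 1 for real tokens, 0 for padding
    (hypothetical helper, not part of bertviz)
    """
    # Count the non-padding tokens; assumes padding is on the right.
    n = int(attention_mask[index].sum())
    # Slice with index:index + 1 so a batch dimension of size 1 is kept.
    trimmed = tuple(layer[index:index + 1, :, :n, :n] for layer in attention)
    return trimmed, n


# Stand-in data: batch of 16, 12 heads, 2 layers, padded length 32.
batch, heads, seq_len, n_layers = 16, 12, 32, 2
attention = tuple(
    np.zeros((batch, heads, seq_len, seq_len)) for _ in range(n_layers)
)
attention_mask = np.zeros((batch, seq_len), dtype=int)
attention_mask[:, :12] = 1  # 12 real tokens per example, the rest is padding

attn_one, n = select_example(attention, attention_mask, index=0)
print(attn_one[0].shape)  # (1, 12, 12, 12)
```

The same slicing expression should also work on PyTorch tensors. With a real model, the matching token list would be the first `n` entries of `tokenizer.convert_ids_to_tokens(input_ids[index])`, and the pair could then be passed as `model_view(attn_one, tokens)`.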

Thank you @jessevig, I was confused by the seemingly contrasting error messages and didn't notice that.