Token Classification
zooeygirl opened this issue · 4 comments
Hi there,
Thank you very much for putting this together. It is very cool!
I would like to adapt it to a BERT model that has a linear classifier on top (a class is predicted for each token in the input). I was wondering if you have any tips on the easiest way to do this?
Thanks a lot.
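For readers with the same question, here is a minimal sketch of one way to set up token classification, assuming a HuggingFace-style BertModel; the class name and the linear head below are illustrative, not part of this repo:

```python
import torch.nn as nn
from transformers import BertModel

class BertTokenClassifier(nn.Module):
    """Illustrative BERT + per-token linear head (not this repo's API)."""
    def __init__(self, num_labels, model_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        out = self.bert(input_ids, attention_mask=attention_mask)
        hidden = out.last_hidden_state   # (batch, seq_len, hidden)
        return self.classifier(hidden)   # (batch, seq_len, num_labels)
```

Presumably the main change on the explanation side is to start the relevance (or gradient) from the logit of a chosen (token, class) pair, rather than from the single [CLS]-based classification logit.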
Never mind, I found a solution. But I have a related question: are you planning to extend this to other language models (GPT-2, for example)?
Hi again,
So I am trying to adapt your work to GPT-2, which is quite similar to BERT save for a few differences. For example, where BERT uses a linear layer to project the query, key, and value in its attention unit, GPT-2 uses a Conv1D layer. What is the best way to treat relprop for Conv1D?
Thanks a lot.
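For context: the Conv1D module in HuggingFace's GPT-2 computes `y = x @ W + b` with the weight stored as (in_features, out_features), so it is effectively a linear layer with a transposed weight. A rough epsilon-rule sketch along those lines (the wrapper class and its relprop signature are assumptions modeled on typical LRP implementations, not this repo's actual code):

```python
import torch
import torch.nn as nn

class Conv1D(nn.Module):
    """Epsilon-LRP sketch for a GPT-2-style Conv1D (y = x @ W + b,
    W stored as (in_features, out_features)). Hypothetical wrapper,
    not this repo's implementation."""
    def __init__(self, weight, bias, eps=1e-9):
        super().__init__()
        self.weight, self.bias, self.eps = weight, bias, eps

    def forward(self, x):
        self.x = x.detach()                 # cache input for relprop
        return x @ self.weight + self.bias

    def relprop(self, R):
        # epsilon rule, same as for Linear but with W untransposed:
        # R_j = x_j * sum_k W_jk * R_k / z_k
        z = self.x @ self.weight + self.bias + self.eps
        s = R / z                           # (..., out_features)
        c = s @ self.weight.t()             # back to (..., in_features)
        return self.x * c
```

So one option is to reuse whatever relprop already exists for Linear and simply skip the weight transpose, or equivalently convert each Conv1D into an `nn.Linear` with `weight.t()` before running relprop.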
Hi @zooeygirl, thanks for your interest in our work!
Currently, we are not planning to expand to GPT-2.
To avoid implementing LRP, you can consider using our second paper (from ICCV'21), where we eliminated the use of LRP.
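For reference, that ICCV'21 method replaces LRP with gradient-weighted attention maps aggregated by rollout, roughly along these lines (the function name and calling convention are illustrative, not the repo's API):

```python
import torch

def gradient_attention_rollout(attn_maps, attn_grads):
    """Sketch of the LRP-free rule from the ICCV'21 paper for a
    self-attention encoder. Inputs are per-layer attention maps and
    the gradients of the target class score w.r.t. those maps, each
    of shape (num_heads, seq_len, seq_len)."""
    seq_len = attn_maps[0].shape[-1]
    R = torch.eye(seq_len)                           # start from the identity
    for A, grad in zip(attn_maps, attn_grads):
        A_bar = (grad * A).clamp(min=0).mean(dim=0)  # head-average of positive grad*attn
        R = R + A_bar @ R                            # rollout update with residual term
    return R
```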
If you're still interested in using LRP, here's a nice resource with LRP implementations.
Feel free to ask for clarifications if needed.
Best,
Hila.
Thank you for the response and for the resources, Hila. I will check them out. :-)