akbar2habibullah/attention-based-retrieval
This work showcases the emergent ability of Transformer models to surface the most "important" information inside their attention layers. Ideally, the attention weights should be extracted directly from the inference process to identify the most influential tokens in a sequence.
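As a rough illustration of the idea, here is a minimal, self-contained sketch (not the repository's actual code): it computes scaled dot-product attention over toy query/key vectors and ranks tokens by the total attention they receive. All token strings, dimensions, and vectors are placeholders; in practice the Q/K projections would come from a trained Transformer's attention layer during inference.

```javascript
// Toy tokens standing in for a tokenized input sequence.
const tokens = ["the", "cat", "sat", "on", "the", "mat"];
const dim = 4;

// Random query/key vectors as placeholders for a model's learned projections.
const randVec = () => Array.from({ length: dim }, () => Math.random() - 0.5);
const Q = tokens.map(randVec);
const K = tokens.map(randVec);

const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);

const softmax = (xs) => {
  const max = Math.max(...xs);
  const exps = xs.map((x) => Math.exp(x - max));
  const total = exps.reduce((sum, x) => sum + x, 0);
  return exps.map((x) => x / total);
};

// Attention matrix: row i holds how much token i attends to every token j.
const attention = Q.map((q) =>
  softmax(K.map((k) => dot(q, k) / Math.sqrt(dim)))
);

// "Importance" of token j = total attention it receives across all queries.
const received = tokens.map((_, j) =>
  attention.reduce((sum, row) => sum + row[j], 0)
);

// Rank tokens by the attention they attract.
const ranked = tokens
  .map((tok, i) => ({ tok, score: received[i] }))
  .sort((a, b) => b.score - a.score);

console.log(ranked);
```

With real attention weights taken from an inference pass, the same aggregation (sum or average of attention received per token) can be used to pick the most influential tokens for retrieval.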