browsermt/bergamot-translator

Relax continuity constraints on Annotation

jerinphilip opened this issue · 7 comments

Related: #355 (comment), #298

I have proposed jelmervdl/translatelocally-web-ext#5 for the experimental extension. The next feature on my wishlist is an explanation view like the one below. It is a little far-fetched, but someday I'd like the extension to show the visualization usually used to depict attention, as an explanation of the translation.

image

(Screenshot taken from https://distill.pub/2016/augmented-rnns/, so we already have JS available under a permissive license, hopefully).

#298 indicates that we are editing the annotation to get HTML in, but the subword tokens now include tag information. This is not ideal when we want to build things like the above. One solution is to relax the continuity constraints, which were imposed to tie Annotation closely to SentencePiece, down to a single constraint that the byte ranges are monotonic.
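To make the difference between the two constraints concrete, here is a minimal sketch in Python. The `ByteRange` class and both checker functions are hypothetical names written for illustration, not the bergamot-translator API; ranges are assumed to be half-open `[begin, end)` byte offsets.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ByteRange:
    """Hypothetical half-open byte range [begin, end) into the text."""
    begin: int
    end: int


def is_contiguous(ranges: Sequence[ByteRange]) -> bool:
    """Strict constraint: each token starts exactly where the previous one ends."""
    well_formed = all(r.begin <= r.end for r in ranges)
    return well_formed and all(
        prev.end == cur.begin for prev, cur in zip(ranges, ranges[1:])
    )


def is_monotonic(ranges: Sequence[ByteRange]) -> bool:
    """Relaxed constraint: tokens are ordered and do not overlap; gaps are allowed."""
    well_formed = all(r.begin <= r.end for r in ranges)
    return well_formed and all(
        prev.end <= cur.begin for prev, cur in zip(ranges, ranges[1:])
    )


# "Hello world": tokens "Hello" and " world" cover the text without gaps.
plain = [ByteRange(0, 5), ByteRange(5, 11)]
# "<b>Hello</b> world": the same tokens, with the tags left in the gaps.
tagged = [ByteRange(3, 8), ByteRange(12, 18)]

print(is_contiguous(plain), is_monotonic(plain))    # True True
print(is_contiguous(tagged), is_monotonic(tagged))  # False True
```

Under the relaxed check, markup can live in the gaps between tokens, so the tokens themselves stay free of tag information.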

We could look at adding methods on Annotation that insert markup between tokens, rather than doing it externally, so that the whole data structure stays consistent. This would also make it simpler to support other markups when we get to building those.
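A rough sketch of what such a method might look like, in Python for brevity. `AnnotatedText`, `insert_markup` and `token` are hypothetical names for illustration only and do not mirror the existing Annotation class. The idea is that the structure shifts its own byte ranges when markup is inserted, so callers never patch offsets by hand.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ByteRange:
    """Hypothetical half-open byte range [begin, end) into the text."""
    begin: int
    end: int


class AnnotatedText:
    """Hypothetical container: raw bytes plus monotonic token ranges."""

    def __init__(self, text: bytes, tokens: List[ByteRange]):
        self.text = text
        self.tokens = list(tokens)

    def insert_markup(self, pos: int, markup: bytes) -> None:
        """Insert markup at byte offset pos, shifting every token at or after pos."""
        if not 0 <= pos <= len(self.text):
            raise ValueError("insert position is outside the text")
        shifted = []
        for r in self.tokens:
            if r.begin >= pos:
                # Token starts at or after the insertion point: move it right.
                shifted.append(ByteRange(r.begin + len(markup), r.end + len(markup)))
            elif r.end <= pos:
                # Token ends at or before the insertion point: unchanged.
                shifted.append(r)
            else:
                # This sketch refuses to split a token with markup.
                raise ValueError("insert position falls inside a token")
        self.text = self.text[:pos] + markup + self.text[pos:]
        self.tokens = shifted

    def token(self, i: int) -> bytes:
        r = self.tokens[i]
        return self.text[r.begin:r.end]


annotated = AnnotatedText(b"Hello world", [ByteRange(0, 5), ByteRange(5, 11)])
annotated.insert_markup(5, b"</b>")  # insert the closing tag first
annotated.insert_markup(0, b"<b>")
print(annotated.text)      # b'<b>Hello</b> world'
print(annotated.token(0))  # b'Hello'
print(annotated.token(1))  # b' world'
```

After both insertions the token ranges have gaps but remain monotonic, which is exactly the relaxed constraint proposed above.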

Opening this issue to discuss.

kpu commented

"Attention is not not Explanation" https://aclanthology.org/D19-1002.pdf
"Attention is not Explanation" https://aclanthology.org/N19-1357.pdf

One difficulty I noticed is that HTML is not just text with tags added in between. Some characters, like & and < need to be replaced with &amp; and &lt;.
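To illustrate why escaping complicates things: replacing `&` with `&amp;` changes byte lengths, so every token range after the replacement has to move. Below is a minimal Python sketch that escapes the text piece by piece and recomputes the ranges. `ByteRange` and `escape_with_ranges` are hypothetical names, and the sketch assumes token boundaries fall on valid UTF-8 boundaries.

```python
import html
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class ByteRange:
    """Hypothetical half-open byte range [begin, end) into the text."""
    begin: int
    end: int


def _escape(chunk: bytes) -> bytes:
    # html.escape with quote=False replaces only &, < and >.
    return html.escape(chunk.decode("utf-8"), quote=False).encode("utf-8")


def escape_with_ranges(
    text: bytes, ranges: Sequence[ByteRange]
) -> Tuple[bytes, List[ByteRange]]:
    """Escape text for HTML and return the token ranges in the escaped text."""
    out = bytearray()
    new_ranges = []
    cursor = 0
    for r in ranges:
        out += _escape(text[cursor:r.begin])  # any gap before this token
        begin = len(out)
        out += _escape(text[r.begin:r.end])   # the token itself
        new_ranges.append(ByteRange(begin, len(out)))
        cursor = r.end
    out += _escape(text[cursor:])             # trailing text after the last token
    return bytes(out), new_ranges


escaped, ranges = escape_with_ranges(b"R&D <3", [ByteRange(0, 3), ByteRange(3, 6)])
print(escaped)  # b'R&amp;D &lt;3'
print(ranges)   # [ByteRange(begin=0, end=7), ByteRange(begin=7, end=13)]
```

Escaping each piece separately keeps the token boundaries intact, at the cost of assuming that no entity ever needs to span two tokens.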

Good enough for HTML replacement, good enough for the visualization. Attention is all we need 🤗. Besides, we can build the UI etc. with the existing ByteRange-derived Annotation and replace attention with whatever future mechanism becomes "explanation" in a similar setting.

> Some characters, like & and < need to be replaced with &amp; and &lt;.

We don't need to relax the continuity constraints for this, but supporting such operations in the Annotation class itself could be useful for a wider range of applications. Could you point me to where these edits happen? That would help me study which operations the HTML code currently performs on Annotation, and which of them could be pushed down and reused across other markups.

No hurry though, we can slowly incubate this idea.

kpu commented

The alignments come from guided alignment trained from fast_align, not from attention. The alignments are what drive HTML alignment.

image

NB: Continuity constraints are not relaxed; I just got the screenshot thing shown in the first comment working. Looks pretty. There is some resizing ugliness.

Will you share the code of your improved version?

jerinphilip#88 (this is early experimental code; it will take a while to merge to main).