DDMAL/Rodan

[Text_alignment / OCR] Syllables not being picked up for MS73

UPDATE: This might be in part due to a user mistake (here we are again), so please hold!

I've started running some end-to-end (e2e) OMR workflows on MS73 folios, and the text part of the process doesn't seem to be working. Normally, the original image is separated into layers, and the text layer is sent to the Text_Alignment job, which uses OCR to roughly locate the syllables and then matches them with the correct text that we provide. In Neon, the syllables look like this (this is a Salzinnes folio):

[Screenshot: Salzinnes folio with good syllables]
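For anyone following along, here is a rough illustration of the "find the syllables, then match them to the provided text" idea. It is a toy sketch, not the Text_Alignment job's actual implementation: it assumes a hypothetical OCR result given as (character, x-position) pairs, uses Python's `difflib.SequenceMatcher` as a stand-in aligner, and `match_syllables` is a made-up name. The point it shows is that transcript syllables can only be placed where the OCR actually read something.

```python
from difflib import SequenceMatcher
from typing import Optional


def match_syllables(
    ocr_chars: list[tuple[str, int]], syllables: list[str]
) -> list[tuple[str, Optional[int]]]:
    """Hypothetical sketch: place each transcript syllable at an OCR x-position.

    ocr_chars: assumed OCR output as (character, x-position) pairs.
    syllables: the provided transcript, already split into syllables.
    Returns (syllable, x) pairs, with x = None when nothing matched.
    """
    ocr_text = "".join(c for c, _ in ocr_chars)
    transcript = "".join(syllables)
    # Character-level alignment of the known transcript against the OCR text.
    matcher = SequenceMatcher(None, transcript, ocr_text, autojunk=False)

    # For each transcript character, record the x-position of the OCR
    # character it was aligned to (None if it found no partner).
    char_x: list[Optional[int]] = [None] * len(transcript)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            char_x[block.a + k] = ocr_chars[block.b + k][1]

    # A syllable is placed at its leftmost matched character; if none of its
    # characters matched, it stays unplaced.
    placed = []
    pos = 0
    for syl in syllables:
        xs = [x for x in char_x[pos : pos + len(syl)] if x is not None]
        placed.append((syl, min(xs) if xs else None))
        pos += len(syl)
    return placed


if __name__ == "__main__":
    # The OCR picked up "Vobis" but missed "datum" entirely.
    ocr = [("V", 10), ("o", 18), ("b", 30), ("i", 36), ("s", 40)]
    print(match_syllables(ocr, ["Vo", "bis", "da", "tum"]))
    # [('Vo', 10), ('bis', 30), ('da', None), ('tum', None)]
```

In this toy setup, any text the OCR step fails to read has nothing to attach to, which is one plausible way to end up with too few syllables; whether that is what's happening with MS73 is still an open question.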

However, this process doesn't seem to be working for MS73. So far, @kyrieb-ekat got this (enjoy the numbers):
[Screenshot: Kyrie's text nuggets]

And I got no syllables at all:
[Screenshot: MS73 054, no syllables]

Because the syllable text is directly related to how the neumes are grouped into syllables, these errors result in the syllable groupings being completely wrong, which lengthens the correction time quite a bit.

I ran an e2e OMR workflow with an Einsie folio and the syllables were perfect, so this seems to be an MS73-specific problem. Could it simply be that the training model we've been using for Salzinnes and Einsie doesn't work for MS73? In the Text_Alignment job, the training model is built directly into the job, so I don't think this is something that I can change.

ANOTHER UPDATE: This was indeed in part due to user error (you can always count on me). I mistakenly assigned the wrong layer output to the input of the Text_Alignment job, which is why my syllables came up completely empty.

However! Kyrie did not make that mistake, so her result is a genuine reproduction of the problem. I ran a couple more workflows after fixing my mistake, and I'm getting something similar: there are syllables, but far too few, and the ones that are there are not correct. I'm not sure yet what's causing this; it's possible that as our glyph classification training data improves, the syllable problem will lessen. I'll put this issue on hold until we know more.

I'm also going to retrace some of the previous work done on this and test a few more pages of MS73. I also want to look into OCRopus and see what the reasoning was behind the OCR models used in text_alignment.

FURTHER UPDATE: I thought that this syllable problem was due to the general OMR being kind of bad, but I was wrong. My most recent e2e OMR runs have been producing very handsome folios and, though the syllables have indeed improved, they're still pretty bad. For example, this is folio 278/138r straight outta OMR:

[Screenshot 2024-11-18 at 15:39:04: folio 278/138r OMR output]

The text should be:
[...] ctum afferunt in pacientia | ~Mirabilia | euouae
Vobis datum est nosse misterium regni dei ceteris autem in parabolis dixit ihecuc discipulis suis | ~Magnificat | euouae
Semen est verbum dei sator autem xpictuc omnis qui audit eum manebit in eternum | ~Benedictus | euouae

As you can see, a lot of that text is missing. The grouping of syllables is also pretty sketchy, even in staves where the neumes are quite good, like staves 2, 4, and 5.