Move to ocrd-segment-repair?
Opened this issue · 8 comments
Yes, I think so too. It was unclear what exactly ocrd-segment-repair would do to my files other than my hypothetically added re-ordering operation. If ocrd-segment-repair is going down the "let the user choose a single operation" road, I'm happy to add this as one of those single operations.
To explain: I needed this to fix problems with some hundred ground truth files. Because I wanted to be careful with my ground truth files, I wanted to fix exactly this problem and nothing more. Therefore I wrote a separate script instead of adding the operation to ocrd-segment-repair.
Yes, there's definitely going to be fine-grained control over which checks and repair heuristics to use for ocrd-segment-repair. Let's delay this until we have baked ocrd-segment-evaluate (PRImA tools re-implementation) and found ourselves some useful module + data structures.
Agreed.
Shall we include this in ocrd_all or wait until you've decided whether/how to integrate with ocrd_segment?
I'd say now is as good a time as any for ocrd_all. (We want to give users the best possible processing options.)
Since this is very OCR-D specific stuff, I would actually prefer this moved to ocrd-segment-repair at some point.
Sure, but see above: nothing has changed from ocrd_segment's side so far. As soon as we have a good library structure there and self-explaining and orthogonal repair processors/parameters, I'll address having ocrd-repair-inconsistencies flow into it. Segment re-ordering is also connected to layout evaluation (projected in ocrd-segment-evaluate) and to validation auto-repair hooks (as currently planned for coordinates) or auto-repair instrumentation (also projected for coordinates), so we first have to shake everything else together.
As I've closed #8 (Find a better name) in favor of merging it into some other tool: I suggest a very specific operation name of reorder-segments-to-match-parent-text in the future.
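
To make the proposed operation name concrete, here is a minimal sketch of what "reorder segments to match parent text" could mean. It is not the actual ocrd-repair-inconsistencies code. It assumes the parent's text is simply the child texts joined by a separator (a newline here), and the function name `reorder_to_match_parent` and its parameters are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Optional, Sequence


def reorder_to_match_parent(
    parent_text: str,
    child_texts: Sequence[str],
    sep: str = "\n",
) -> Optional[List[int]]:
    """Return the child indices in the order their texts occur in parent_text.

    Hypothetical helper: assumes parent_text equals the child texts joined
    by `sep` in some order. Returns None if no such order exists, so the
    caller can leave the segment untouched instead of guessing.
    """
    expected = parent_text.split(sep)

    # Only repair when the children are an exact permutation of the parent's parts.
    if sorted(expected) != sorted(child_texts):
        return None

    # Map each text to the child indices carrying it, so that duplicate
    # texts are consumed in their original relative order.
    remaining: Dict[str, List[int]] = {}
    for index, text in enumerate(child_texts):
        remaining.setdefault(text, []).append(index)

    return [remaining[text].pop(0) for text in expected]


# Usage: the children are stored in the wrong order relative to the parent text.
order = reorder_to_match_parent(
    "first line\nsecond line\nthird line",
    ["second line", "first line", "third line"],
)
print(order)  # → [1, 0, 2]
```

Returning None on any mismatch keeps the operation conservative: it only re-orders, and never touches text or coordinates, which matches the "fix exactly this problem, nothing more" motivation given above.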