Text from a scanned document.

Question

ericvanderlinden opened this issue 3 years ago · 2 comments

Many research institutes in the humanities scan documents and books and extract the text via OCR. We can analyze that text, but is there a standard way to link the scan and the text when you know the coordinates of each word or phrase on the image?

Answer 1 · 2021-09-22T08:01:40.000Z

This is a usage question so I'm moving it to discussions.

Answer 2 · 2021-09-22T08:04:34.000Z

Wait, sorry, I thought this was on the main spaCy repo. I'm not sure why you're asking this here, since it doesn't seem to have anything to do with Streamlit. In any case, there is currently no standard way to do that in spaCy.

We are working on methods to pass in extra data, such as page coordinates, but we're still figuring out the right way to do this. For now, see the FAQ.
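Until page coordinates can be passed in directly, one common workaround is to keep the OCR word boxes outside the NLP pipeline and map character offsets back to them. The sketch below is a minimal, dependency-free illustration under stated assumptions: the OCR engine is assumed to return one pixel bounding box `(x0, y0, x1, y1)` per word, words are assumed to be joined with single spaces, and all names (`OcrWord`, `build_text`, `boxes_for_span`, `union_box`) are hypothetical, not part of spaCy. The character range for a match could come from, for example, spaCy's `Span.start_char` and `Span.end_char` after running a pipeline on the joined text.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (x0, y0, x1, y1) in image pixels


@dataclass
class OcrWord:
    """Hypothetical OCR output: one recognized word and its bounding box."""
    text: str
    bbox: Box


def build_text(words: List[OcrWord]) -> Tuple[str, List[Tuple[int, int]]]:
    """Join words with single spaces and record each word's [start, end) char offsets."""
    offsets = []
    pos = 0
    for word in words:
        start = pos
        end = start + len(word.text)
        offsets.append((start, end))
        pos = end + 1  # skip the joining space
    return " ".join(w.text for w in words), offsets


def boxes_for_span(words: List[OcrWord], offsets: List[Tuple[int, int]],
                   start_char: int, end_char: int) -> List[Box]:
    """Return the boxes of every word that overlaps the range [start_char, end_char)."""
    return [w.bbox for w, (s, e) in zip(words, offsets)
            if s < end_char and e > start_char]


def union_box(boxes: List[Box]) -> Box:
    """Smallest box enclosing all given boxes (assumes they are on the same page)."""
    if not boxes:
        raise ValueError("no boxes to merge")
    xs0, ys0, xs1, ys1 = zip(*boxes)
    return (min(xs0), min(ys0), max(xs1), max(ys1))


# Usage with made-up OCR output for one line of a scanned page.
words = [
    OcrWord("Leiden", (10, 20, 80, 40)),
    OcrWord("University", (90, 20, 200, 40)),
    OcrWord("library", (210, 20, 280, 40)),
]
text, offsets = build_text(words)
start = text.index("Leiden University")
end = start + len("Leiden University")
print(union_box(boxes_for_span(words, offsets, start, end)))  # → (10, 20, 200, 40)
```

Keeping the boxes in a side table like this means the text can go through any tokenizer unchanged; the trade-off is that the offsets must be recomputed if the text is normalized (for example, de-hyphenated) before analysis.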

Closing this since there's no action to be taken. If you want to discuss it further, open a Discussion on the main spaCy repo, though you may want to look at the existing threads on OCR first.