Hopding/pdf-lib

Reading text contents of page

matthopson opened this issue · 3 comments

Hi, thanks for working on this project. It has met my needs beautifully with one exception (and it's probably a lack of understanding on my part).

I couldn't find a very intuitive way to get the contents of a page and verify its text content.

Generating a new page and inserting it into a document was very straightforward, but I'd also like to test this functionality, including that the expected contents end up on the page (it's dynamically generated). So when writing a test, I'd like to create a page, insert several lines of text, and then read that page back in to verify that the expected lines of text exist on it.

Am I overlooking something obvious, or is there currently no straightforward way to do this?

Thanks!

Hello @matthopson. pdf-lib is primarily focused on creating and editing PDFs right now. It does not currently have functionality to extract text content from them, though this is something I've considered adding at some point in the future.

For your use case, I'd suggest using pdf.js to extract text from the documents you create/modify with pdf-lib. pdf.js is a library specifically designed to extract text, images, etc. from PDFs for rendering. Here's an example of using it in Node.
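As a rough illustration of that suggestion, here is a minimal sketch of how a test might turn pdf.js text output back into lines. It assumes the shape that pdf.js's `page.getTextContent()` returns: an `items` array whose entries carry a `str` and a `transform` array, with `transform[4]` and `transform[5]` holding the x and y position. The helper names `groupItemsIntoLines` and `extractPageLines`, the `yTolerance` parameter, and the `pdfjs-dist` import path in the comments are assumptions made for this example and are not part of pdf-lib or pdf.js.

```javascript
// Hypothetical helper: group pdf.js text items into lines of text.
// Items whose baseline y coordinates differ by at most `yTolerance`
// are treated as belonging to the same line (a heuristic, not a rule).
function groupItemsIntoLines(items, yTolerance = 2) {
  const lines = [];
  for (const item of items) {
    // Skip marked-content entries and whitespace-only items.
    if (typeof item.str !== 'string' || item.str.trim() === '') continue;
    const x = item.transform[4];
    const y = item.transform[5];
    let line = lines.find((l) => Math.abs(l.y - y) <= yTolerance);
    if (!line) {
      line = { y, parts: [] };
      lines.push(line);
    }
    line.parts.push({ x, str: item.str });
  }
  // PDF y coordinates grow upward, so sort descending for top-to-bottom order.
  lines.sort((a, b) => b.y - a.y);
  return lines.map((l) =>
    l.parts
      .sort((a, b) => a.x - b.x) // left to right within a line
      .map((p) => p.str)
      .join(' ')
  );
}

// Hypothetical wrapper: accepts a pdf.js page object (anything with an
// async getTextContent() method) and returns its lines of text.
async function extractPageLines(page) {
  const { items } = await page.getTextContent();
  return groupItemsIntoLines(items);
}

// Assumed wiring with pdf-lib and pdfjs-dist (the import path varies by version):
//   const pdfBytes = await pdfDoc.save();            // pdf-lib document
//   const pdfjsLib = require('pdfjs-dist/legacy/build/pdf.js');
//   const pdf = await pdfjsLib.getDocument({ data: pdfBytes }).promise;
//   const page = await pdf.getPage(1);               // pages are 1-indexed
//   const lines = await extractPageLines(page);
```

A test could then compare the returned array against the lines it drew. Joining items with a single space is a simplification, so the exact spacing may differ from the original text depending on how the font and text runs were written.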

Let me know if you have any further questions!

Thanks for the response. I had considered this, but was hoping not to have to use two separate PDF libraries. It sounds like that's my best bet for the time being.

Thanks!

Hi, thank you for working on this cool library, what a team.

I would like to find out if it's possible to use pdf-lib to get specific text from a PDF file using coordinates, i.e. where the specific text is on the page?

I'm working on a simple feature in a React OCR (optical character recognition, using Tesseract) app, with Node.js and Express as the server. My goal is for a user to simply upload a scanned PDF and have a specific number extracted from the document.

Looking forward to your response.

Kind regards