ffalt/pdf.js-extract

Get coordinates of each word


Hi,
Is it possible to get the coordinates of each word in the PDF? For "Hello, world!" the output is a single chunk of text. I want to extract each word and punctuation mark as a separate item, i.e. "Hello", ",", "world" and "!" all separate. Is that possible?

Damn, that is difficult to get. Where the text contains spaces, a program can use them to identify the words. Without spaces, it has no context for where a word starts and ends. You could use the OpenAI API to do this. I feel it is difficult to do with any library.
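
For the simpler case where the chunk does contain spaces and punctuation, a rough approximation may be enough. The sketch below is not part of pdf.js-extract. It assumes each content item looks like `{ str, x, y, width, height }` (as the library's output appears to), and `splitItemIntoTokens` is a hypothetical helper name. It also assumes every character in the chunk has the same width, which is only true for monospaced fonts, so the x positions are estimates.

```javascript
// Hypothetical helper, not part of pdf.js-extract.
// Splits one text item into word and punctuation tokens and estimates each
// token's box, ASSUMING every character in the item has the same width.
function splitItemIntoTokens(item) {
  const tokens = [];
  if (!item.str) return tokens;

  // Average width of one character in this chunk (an approximation).
  const charWidth = item.width / item.str.length;

  // A run of letters/digits is one token; any other non-space character
  // (for example "," or "!") is a token of its own.
  const pattern = /[\p{L}\p{N}]+|[^\p{L}\p{N}\s]/gu;

  let match;
  while ((match = pattern.exec(item.str)) !== null) {
    tokens.push({
      str: match[0],
      x: item.x + match.index * charWidth, // estimated left edge
      y: item.y,
      width: match[0].length * charWidth,  // estimated width
      height: item.height,
    });
  }
  return tokens;
}

// Example item with made-up coordinates.
const item = { str: "Hello, world!", x: 100, y: 50, width: 130, height: 12 };
const tokens = splitItemIntoTokens(item);
console.log(tokens.map((t) => `${t.str} @ x=${t.x}`));
```

With proportional fonts the estimated positions will drift, so exact word boxes would need per-glyph widths from the font, and text without any separators still has the problem described above.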