lebebr01/pdfsearch

heading_search is reporting the incorrect line_num

I have tested this with multiple PDFs that were loaded into R as character vectors. In particular, one PDF (as a character vector) has a "CONTENTS" page on page 6. When previewing the text with head(text), the 6th element (the 6th page) is the contents page. However, searching for it with

heading_search(text, "CONTENTS")

returns page 7 instead of page 6:

keyword   page_num
CONTENTS  7
Running the function directly on the source PDF gives the same result.

Thanks for submitting this. This is a holdover from an earlier modification to the code. I'll fix it in the dev version soon.

@lebebr01 Great, thanks for fixing it. To add some context, the issue seems to involve a blank page.
The blank page shows up as "" when viewing the document with head(document) in R. In the affected document, the first 3 pages have text, the 4th is blank, and the next 2 have text (the 6th page is the table of contents). heading_search reports the correct pages for headings before the blank page. Removing the blank page alone does not fix the error, but if I remove all pages up to and including the blank page, it works correctly. I suspect the blank page is being counted twice or is otherwise shifting the page numbering.
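A minimal, hypothetical reproduction of the page layout described above can be sketched in base R. The page strings are made up, and the check uses grepl() and which() rather than pdfsearch, so it only shows which page number a correct search should report for this layout.

```r
# Hypothetical six-page document mirroring the layout described above:
# pages 1-3 have text, page 4 is blank (""), pages 5-6 have text,
# and page 6 holds the table of contents. The strings are placeholders.
text <- c(
  "Page one text",
  "Page two text",
  "Page three text",
  "",
  "Page five text",
  "CONTENTS\nIntroduction ..... 1"
)

# Base R check (not pdfsearch): the index of the element containing
# the heading is the page number a correct search should report.
expected_page <- which(grepl("CONTENTS", text, fixed = TRUE))
print(expected_page)
# [1] 6
```

The report above describes heading_search returning 7 for the real document. Comparing its page_num against this base R result on a small vector like this one may help confirm whether the blank page alone triggers the offset, although that has not been verified here.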