morungos/node-word-extractor

Add way to iterate, fetch, and count pages

a1icja opened this issue · 1 comment

a1icja commented

Hello! I was wondering if it would be possible to add some paging functionality. This issue could serve as three related requests:

  1. A way to iterate through pages
  2. A method to get the text for a specific page
  3. A method to get the total page count for the document

I know the library currently replaces page breaks with newlines, but it would be great to be able to split the text at those breaks instead.

@braxton Ouch! That might be hard, or even impossible. In principle, Word can change the pagination dynamically depending on which printer you happen to be using. Page boundaries aren't normally exposed through the document data model, and the .docx format buries that kind of information even deeper.

What we can count, and possibly expose, are the forced page breaks. If that is useful, I'm happy to look into it.
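
To make the forced-break idea concrete, here is a rough sketch of what counting, fetching, and iterating could look like. It is not part of the library's current API: the function names `splitOnForcedBreaks`, `countForcedBreaks`, and `getSegment` are hypothetical, and it assumes the extracted text kept a form feed character (U+000C) at each forced page break instead of replacing it with a newline. The resulting segments are "text between forced breaks", not rendered pages.

```javascript
// Hypothetical helpers, not part of word-extractor's API.
// Assumption: `text` still contains U+000C (form feed) wherever the
// author inserted a forced page break.
const FORCED_BREAK = "\f";

// Split the text into the segments that lie between forced breaks.
function splitOnForcedBreaks(text) {
  return text.split(FORCED_BREAK);
}

// Number of forced breaks, i.e. one fewer than the number of segments.
function countForcedBreaks(text) {
  return splitOnForcedBreaks(text).length - 1;
}

// Fetch a single segment by zero-based index.
function getSegment(text, index) {
  const segments = splitOnForcedBreaks(text);
  if (!Number.isInteger(index) || index < 0 || index >= segments.length) {
    throw new RangeError(`segment index out of range: ${index}`);
  }
  return segments[index];
}

// Made-up sample text with two forced breaks.
const sample = "Title page\fChapter one\nmore text\fChapter two";

// Iterate over the segments in order.
for (const [i, segment] of splitOnForcedBreaks(sample).entries()) {
  console.log(i, JSON.stringify(segment));
}
```

A document with no forced breaks would come back as a single segment, however many pages Word renders for it, so this only partly covers the original three requests.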