Make PDF parser emit text element array ordered by position of elements on page
fcfort opened this issue · 2 comments
fcfort commented
The current PDF parser makes it nearly impossible to adapt to new PDF formats, because its output is not directly tied to how the PDF's text elements are shown on the screen.
Instead, we will gather all text elements by their x and y positions, sort them per page, and emit them as a 2-D array of text elements, one line per unique y position per page.
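A rough sketch of that grouping, in Python for illustration only. The `TextElement` type and `group_into_lines` function are hypothetical names, not part of the existing parser, and the sketch assumes each element already carries a page number and exact x/y coordinates, with y increasing down the page.

```python
from collections import defaultdict
from typing import NamedTuple


class TextElement(NamedTuple):
    """Hypothetical shape of one positioned piece of text from the PDF."""
    page: int
    x: float
    y: float
    text: str


def group_into_lines(elements):
    """Return {page: [[text, ...], ...]} with one inner list per unique y."""
    # Bucket elements by page, then by their exact y position.
    rows = defaultdict(lambda: defaultdict(list))
    for el in elements:
        rows[el.page][el.y].append(el)

    pages = {}
    for page in sorted(rows):
        lines = []
        # Assumption: smaller y means higher on the page. Native PDF
        # coordinates usually start at the bottom-left, in which case
        # this sort would need reverse=True.
        for y in sorted(rows[page]):
            # Within a line, order elements left to right by x.
            line = sorted(rows[page][y], key=lambda el: el.x)
            lines.append([el.text for el in line])
        pages[page] = lines
    return pages


elements = [
    TextElement(1, 50.0, 20.0, "Amount"),
    TextElement(1, 10.0, 20.0, "Date"),
    TextElement(1, 10.0, 10.0, "Statement"),
    TextElement(2, 5.0, 5.0, "Page2"),
]
result = group_into_lines(elements)
# result == {1: [["Statement"], ["Date", "Amount"]], 2: [["Page2"]]}
```

Grouping on exact y values is the simplest reading of "one line per unique y position"; text that sits on slightly different baselines would land on separate lines, so a tolerance might be needed in practice.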
fcfort commented
This proved to be untenable since we ended up in situations where it was impossible to tell which lines were txn descriptions since some data came after the txn quantities & amounts.