fcfort/betterment-csv-chrome

Make PDF parser emit text element array ordered by position of elements on page

fcfort opened this issue · 2 comments

The current PDF parser makes it nearly impossible to adapt to new PDF formats, because its output is not directly tied to how the PDF's text elements are laid out on the page.

Instead, we will gather all text elements along with their x and y positions, sort them per page, and emit them as a 2-D array of text elements, one line per unique y position per page.

This proved to be untenable: we ended up in situations where it was impossible to tell which lines were txn descriptions, because some data came after the txn quantities and amounts.
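
For illustration, the position-based grouping proposed in this issue (collect elements with their x and y positions, sort per page, one line per unique y) could be sketched roughly as below. This is only a sketch under assumptions, not the parser's actual code: the `TextElement` shape and the `groupByPosition` name are hypothetical, and it assumes y increases down the page. If the coordinates come straight from PDF user space, where the origin is at the bottom-left, the y comparison would need to be reversed.

```typescript
// Hypothetical shape of a positioned text element; the field names are
// assumptions for this sketch, not the extension's real data model.
interface TextElement {
  page: number; // page index
  x: number;    // horizontal position on the page
  y: number;    // vertical position on the page (assumed to grow downward)
  text: string;
}

// Groups elements by page, then by exact y position, and orders each line by x.
// Returns one 2-D array (lines of strings) per page, in page order.
function groupByPosition(elements: TextElement[]): string[][][] {
  const pages = new Map<number, Map<number, TextElement[]>>();

  for (const el of elements) {
    let lines = pages.get(el.page);
    if (!lines) {
      lines = new Map<number, TextElement[]>();
      pages.set(el.page, lines);
    }
    let line = lines.get(el.y);
    if (!line) {
      line = [];
      lines.set(el.y, line);
    }
    line.push(el);
  }

  return Array.from(pages.entries())
    .sort(([pageA], [pageB]) => pageA - pageB) // pages in ascending order
    .map(([, lines]) =>
      Array.from(lines.entries())
        .sort(([yA], [yB]) => yA - yB) // lines top to bottom (assumed)
        .map(([, line]) =>
          line
            .sort((a, b) => a.x - b.x) // elements left to right
            .map((el) => el.text)
        )
    );
}

// Usage: elements arrive in arbitrary order and come out in reading order.
const sample: TextElement[] = [
  { page: 1, x: 200, y: 10, text: "Amount" },
  { page: 1, x: 50, y: 10, text: "Description" },
  { page: 1, x: 50, y: 20, text: "Dividend" },
  { page: 2, x: 50, y: 5, text: "Page two" },
];
const grouped = groupByPosition(sample);
// grouped[0] is page 1: [["Description", "Amount"], ["Dividend"]]
```

Matching on exact y values mirrors the "one line per unique y position" wording. Real PDFs may place elements of the same visual line at slightly different y values, in which case rounding or a small tolerance would be needed.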