blalop/bbva2pandas

Update to pdftotext 2.2.2 might have broken the export

Closed this issue · 4 comments

Just noticed that bumping pdftotext to 2.2.2 seems to have changed the exported string. It was probably caused by this change: https://github.com/jalan/pdftotext/blob/master/CHANGES.md#220---2021-08-15
We might want to add a fake BBVA statement as a PDF so that we can have an integration test here. :)

I will have a look later this week.

Hi @neugartf !

I haven't run into this issue myself yet, but I'll take a closer look. If I understood correctly, all we have to do is set the physical flag when invoking pdftotext to keep the old layout, right?
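
For reference, a minimal sketch of what setting the flag might look like, assuming the `physical` keyword argument that the `pdftotext.PDF` constructor accepts since 2.2.0. The helper name `extract_pages` and its `pdf_factory` parameter are hypothetical and not part of bbva2pandas; the factory is only there so the call can be stubbed out without a real PDF.

```python
from typing import Callable, List, Optional


def extract_pages(path: str, pdf_factory: Optional[Callable] = None) -> List[str]:
    """Return the text of every page, requesting the physical layout."""
    if pdf_factory is None:
        # Third-party dependency, imported lazily so it can be stubbed in tests.
        import pdftotext
        pdf_factory = pdftotext.PDF

    with open(path, "rb") as f:
        # physical=True asks pdftotext to keep the text in its physical layout,
        # which is the flag discussed in this thread.
        pdf = pdf_factory(f, physical=True)
        return list(pdf)
```

Calling `extract_pages("statement.pdf")` (hypothetical file name) would then return one string per page.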

I do agree the lib is lacking in tests and in integration tests in particular :(

Yeah, that should work! I will open a PR with an integration test :) Sorry for the wait!
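
One possible shape for such an integration test, as a rough sketch only: extract the text from a synthetic statement and compare it against a stored expected output, so a layout change in pdftotext shows up as a readable diff. The helper name `assert_matches_golden` and the idea of a golden text file are assumptions for illustration, not something that exists in the repo.

```python
import difflib
from pathlib import Path


def assert_matches_golden(actual: str, golden_path: Path) -> None:
    """Fail with a unified diff if the extracted text differs from the stored one."""
    expected = golden_path.read_text(encoding="utf-8")
    if actual == expected:
        return
    diff = "\n".join(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile="expected",
            tofile="actual",
            lineterm="",
        )
    )
    raise AssertionError("extracted text changed:\n" + diff)
```

The test itself would feed the text extracted from the fake PDF into this helper, together with the path of the expected output checked in next to it.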

I finally ran into this issue after upgrading to Ubuntu 22. I'm still working on it, but the physical flag doesn't seem to fix it. I'll post updates.

I was wrong, the physical flag did fix it. Fixed in release 1.1.1. Thanks a lot for your help @neugartf!