pridiltal/staplr

get_fields leads to NAs and empty strings

Closed this issue · 6 comments

I would like to extract data from PDFs similar to testForm.pdf.
With testForm.pdf everything works fine.
When I try it with other PDFs (I tested two types), get_fields returns NAs for checkboxes and empty strings for text fields.
This is my R version:

platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          3
minor          5.2
year           2018
month          12
day            20
svn rev        75870
language       R
version.string R version 3.5.2 (2018-12-20)
nickname       Eggshell Igloo

I just installed the GitHub version of staplr, and it still does not work.
Do you have any ideas?
Thank you!

oganm commented

Would you be able to share the pdf? Not being able to replicate the issue makes it difficult to solve it on my part.

If you are interested, I could send it to you in a private message; I don't want to share it here. I have now written my own function (based on the beginning of yours) that gives me a data.frame as a result. For me the problem is solved.
Great package, by the way. It inspired me to write a function that sets passwords on PDFs.

oganm commented

Sure. Mail me at ogan.mancarci@gmail.com and I will look into the file. Hopefully I can create a test file with similar properties. If you have already identified the problem, that would be helpful too :)

Do you set passwords using pdftk as well? We would welcome the contribution here if you think it's generalizable, but no pressure. I did consider working on the encryption/decryption capabilities of pdftk a while ago but never got around to it.

I sent you an email :)

oganm commented

In this file, dump_data_fields returns two FieldValues, seemingly arbitrarily. This is not represented in the FDF file in any way. I need to understand why this happens before writing a fix. The questions are:

  • Can these two values be filled independently?
  • If yes, how does this affect the visible value, and how can I identify this effect?
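
To narrow down which fields are affected, one option is to count the FieldValue lines per field in the dump_data_fields output. The following is only a rough sketch: the sample lines are made up to mimic the layout pdftk prints (they are not taken from the reported file), and count_field_values is a hypothetical helper, not part of staplr.

```r
# Hypothetical sample, shaped like the text printed by
# `pdftk form.pdf dump_data_fields`; not the actual file from this issue.
dump <- c(
  "---",
  "FieldType: Text",
  "FieldName: city",
  "FieldFlags: 0",
  "FieldValue: Berlin",
  "FieldValue: ",
  "FieldJustification: Left",
  "---",
  "FieldType: Text",
  "FieldName: name",
  "FieldFlags: 0",
  "FieldValue: Ada",
  "FieldJustification: Left"
)

# Hypothetical helper (an assumption, not a staplr function):
# count how many FieldValue lines each field block contains.
count_field_values <- function(lines) {
  # Every "---" separator starts a new field block.
  block_id <- cumsum(lines == "---")
  blocks <- split(lines, block_id)
  out <- lapply(blocks, function(b) {
    name <- sub("^FieldName: ?", "", grep("^FieldName:", b, value = TRUE))
    data.frame(
      name = if (length(name)) name[1] else NA_character_,
      n_values = sum(grepl("^FieldValue:", b)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, out)
}

counts <- count_field_values(dump)
# Fields that report more than one FieldValue line.
counts[counts$n_values > 1, ]
```

Running the helper on the real dump would list the fields that need a closer look; it does not explain why the duplicate value appears.
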
oganm commented

I will close this for now, since the latest merge has a fix for this particular file and hopefully for any similar files.