Missing font
dcaud opened this issue
I have seen many errors of the following kind on various PDFs:
PDF error: Unknown font in field's DA string
PDF error: Missing 'Tf' operator in field's DA string
For example, this Alberta-tf-operator-error-CAV-2-FORMB.pdf file has text on the buttons on the second page (as viewed in Mac's Preview or Adobe Acrobat Pro DC). However, when I convert that page to PNG, the text is lost and the missing-font messages appear in the R console.
pdftools::pdf_convert("Alberta-tf-operator-error-CAV-2-FORMB.pdf", pages = 2)
This may be a PDF file that doesn't adhere to the PDF spec, but because many PDFs do not, I'd like this to work in some fashion.
Is there any way to get pdftools to render the button text in this example file? Maybe that would point to how this can be generalized to other PDFs with similar issues.
Hmm, I'm not sure. I don't think the buttons contain any text; they actually contain a small image. If we extract the text, it does not appear either:
cat(pdftools::pdf_text('Alberta-tf-operator-error-CAV-2-FORMB.pdf')[2])
But I am also not sure why the image does not appear in the output.
Oh, it actually seems to work with a later version of the poppler library. Maybe I should update it again.
Which operating system do you use?
I'm using both Mac and Linux. Here's a profile from the Mac. Thanks for looking into this!
sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.1
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] tools stats graphics grDevices utils datasets methods base
other attached packages:
[1] shinyBS_0.61 jsonlite_1.7.3 mongolite_2.4.1 ipc_0.1.3
[5] future_1.23.0 promises_1.2.0.1 googleAuthR_2.0.0 firebase_1.0.1
[9] RPostgres_1.4.3 pool_0.1.6 dplyr_1.0.8 shinyjs_2.1.0
[13] pdftools_3.0.1 shinybusy_0.2.2 shinyWidgets_0.6.4 magick_2.7.3
[17] colourpicker_1.1.1 shiny_1.7.1
loaded via a namespace (and not attached):
[1] Rcpp_1.0.8 lubridate_1.8.0 txtq_0.2.4 listenv_0.8.0
[5] assertthat_0.2.1 digest_0.6.29 utf8_1.2.2 parallelly_1.30.0
[9] mime_0.12 R6_2.5.1 backports_1.4.1 httr_1.4.2
[13] pillar_1.7.0 rlang_1.0.1 curl_4.3.2 rstudioapi_0.13
[17] fontawesome_0.2.2 miniUI_0.1.1.1 jquerylib_0.1.4 blob_1.2.2
[21] qpdf_1.1 htmlwidgets_1.5.4 bit_4.0.4 jose_1.2.0
[25] compiler_4.1.2 httpuv_1.6.5 pkgconfig_2.0.3 askpass_1.1
[29] base64enc_0.1-3 globals_0.14.0 htmltools_0.5.2 openssl_1.4.6
[33] tidyselect_1.1.1 tibble_3.1.6 codetools_0.2-18 fansi_1.0.2
[37] crayon_1.4.2 withr_2.4.3 later_1.3.0 xtable_1.8-4
[41] lifecycle_1.0.1 DBI_1.1.2 magrittr_2.0.2 cli_3.1.1
[45] cachem_1.0.6 fs_1.5.2 bslib_0.3.1 filelock_1.0.2
[49] ellipsis_0.3.2 generics_0.1.2 vctrs_0.3.8 bit64_4.0.5
[53] glue_1.6.1 purrr_0.3.4 hms_1.1.1 parallel_4.1.2
[57] fastmap_1.1.0 gargle_1.2.0 base64url_1.4 memoise_2.0.1
[61] sass_0.4.0
I have released a new version, pdftools 3.1.0, that includes a more recent version of libpoppler for Windows and macOS. You can test it from here:
install.packages("pdftools", repos = "https://ropensci.r-universe.dev")
On Linux it is a bit trickier, because we use the libpoppler that ships with your Linux distribution. I think the problem should be fixed at least in Ubuntu 22.04, which will be released in April, because it includes poppler 22.02: https://packages.ubuntu.com/jammy/libpoppler-dev
I'm not sure about other distros; it really depends on which OS you use.
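Since the Linux build links against whatever libpoppler the distribution provides, it may help to check which poppler-cpp version `pkg-config` reports before reinstalling. This is a hedged sketch, not part of pdftools: it assumes `pkg-config` and GNU `sort -V` are available, `version_ge` is a hypothetical helper name, and 22.02 is used only because it is the version mentioned above for Ubuntu 22.04. Whether every version at or above it renders this particular PDF correctly is not confirmed here.

```shell
#!/bin/sh
# Sketch only: report the poppler-cpp version visible to pkg-config and compare
# it with 22.02 (the Ubuntu 22.04 version mentioned in the thread; an assumed
# threshold, not a confirmed minimum).

# Hypothetical helper: succeeds if version $1 >= version $2 (GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="22.02"
if installed=$(pkg-config --modversion poppler-cpp 2>/dev/null); then
  if version_ge "$installed" "$required"; then
    echo "poppler-cpp $installed: at least $required"
  else
    echo "poppler-cpp $installed: older than $required"
  fi
else
  echo "poppler-cpp not found by pkg-config"
fi
```

The comparison delegates ordering to `sort -V`, so dotted versions such as `0.86.1` and `22.02.0` are compared numerically per component rather than as plain strings.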
I have the same issue, but I cannot update to pdftools 3.1.0. The installation fails:
** R
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for ‘pdftools’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/kenneth/R/x86_64-pc-linux-gnu-library/4.1/00LOCK-pdftools/00new/pdftools/libs/pdftools.so':
/home/kenneth/R/x86_64-pc-linux-gnu-library/4.1/00LOCK-pdftools/00new/pdftools/libs/pdftools.so: undefined symbol: _ZNK7poppler8text_box13has_font_infoEv
Error: loading failed
Execution halted
ERROR: loading failed
- removing ‘/home/kenneth/R/x86_64-pc-linux-gnu-library/4.1/pdftools’
- restoring previous ‘/home/kenneth/R/x86_64-pc-linux-gnu-library/4.1/pdftools’
The downloaded source packages are in
‘/tmp/Rtmp8BrZD7/downloaded_packages’
Warning message:
In install.packages(c("pdftools")) :
installation of package ‘pdftools’ had non-zero exit status
Any workaround?
This is my platform:
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
locale:
[1] LC_CTYPE=es_CO.UTF-8 LC_NUMERIC=C
[3] LC_TIME=es_CO.UTF-8 LC_COLLATE=es_CO.UTF-8
[5] LC_MONETARY=es_CO.UTF-8 LC_MESSAGES=es_CO.UTF-8
[7] LC_PAPER=es_CO.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_CO.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.1.2
Thank you very much for your help.
Kenneth
@krcabrer it works for me on Ubuntu 20.04. Can you please show the full output of your installation log? You probably have multiple, conflicting versions of poppler installed on your machine.
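One way to test the "multiple, conflicting versions" hypothesis is to list the libpoppler-cpp copies on the system and check whether each one defines the symbol named in the error above (`_ZNK7poppler8text_box13has_font_infoEv`, which demangles to `poppler::text_box::has_font_info() const`). A minimal sketch, assuming GNU/Linux with binutils (`nm`, optionally `c++filt`) and `ldconfig` on the PATH; `/usr/local/lib` is only a guess at where a self-built poppler might live, and `exports_symbol` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: list libpoppler-cpp copies and report whether each defines the
# symbol from the load error. Assumes GNU/Linux with binutils and ldconfig;
# /usr/local/lib is a guessed location for a self-built poppler.

# Mangled symbol taken verbatim from the "undefined symbol" error.
symbol="_ZNK7poppler8text_box13has_font_infoEv"

# Hypothetical helper: succeeds if shared library $1 defines dynamic symbol $2.
exports_symbol() {
  [ -r "$1" ] || return 1
  nm -D --defined-only "$1" 2>/dev/null | grep -qw -- "$2"
}

# Print the human-readable C++ name when c++filt is installed.
if command -v c++filt >/dev/null 2>&1; then
  echo "missing symbol: $(echo "$symbol" | c++filt)"
fi

# Candidates: libraries known to the dynamic loader, plus /usr/local/lib.
for lib in $(ldconfig -p 2>/dev/null | awk '/libpoppler-cpp\.so/ {print $NF}') \
  /usr/local/lib/libpoppler-cpp.so*; do
  [ -e "$lib" ] || continue
  if exports_symbol "$lib" "$symbol"; then
    echo "$lib: defines has_font_info"
  else
    echo "$lib: does NOT define has_font_info"
  fi
done
```

If one copy defines the symbol and another does not, pdftools may compile against the newer headers but pick up the older library at load time, which would match the "undefined symbol" failure above. Running `ldd` on the installed `pdftools.so` shows which copy the loader actually resolves.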
Dear @jeroen: the complete log of the procedure follows. I also uninstalled and purged the poppler libraries and then installed them again, so only one version is present. The issue persists...
- installing source package ‘pdftools’ ...
** package ‘pdftools’ successfully unpacked and MD5 sums checked
** using staged installation
Found pkg-config cflags and libs!
Using PKG_CFLAGS=-I/usr/local/include/poppler/cpp -I/usr/local/include/poppler
Using PKG_LIBS=-L/usr/local/lib -lpoppler-cpp
** libs
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I/usr/local/include/poppler/cpp -I/usr/local/include/poppler -I'/home/kenneth/R/x86_64-pc-linux-gnu-library/4.1/Rcpp/include' -fvisibility=hidden -fpic -g -O2 -fdebug-prefix-map=/build/r-base-i2PIHO/r-base-4.1.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c RcppExports.cpp -o RcppExports.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -I/usr/local/include/poppler/cpp -I/usr/local/include/poppler -I'/home/kenneth/R/x86_64-pc-linux-gnu-library/4.1/Rcpp/include' -fvisibility=hidden -fpic -g -O2 -fdebug-prefix-map=/build/r-base-i2PIHO/r-base-4.1.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c bindings.cpp -o bindings.o
g++ -std=gnu++11 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o pdftools.so RcppExports.o bindings.o -L/usr/local/lib -lpoppler-cpp -L/usr/lib/R/lib -lR
installing to /home/kenneth/R/x86_64-pc-linux-gnu-library/4.1/00LOCK-pdftools/00new/pdftools/libs
** R
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for ‘pdftools’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/kenneth/R/x86_64-pc-linux-gnu-library/4.1/00LOCK-pdftools/00new/pdftools/libs/pdftools.so':
/home/kenneth/R/x86_64-pc-linux-gnu-library/4.1/00LOCK-pdftools/00new/pdftools/libs/pdftools.so: undefined symbol: _ZNK7poppler8text_box13has_font_infoEv
Error: loading failed
Execution halted
ERROR: loading failed
- removing ‘/home/kenneth/R/x86_64-pc-linux-gnu-library/4.1/pdftools’
- restoring previous ‘/home/kenneth/R/x86_64-pc-linux-gnu-library/4.1/pdftools’
Warning in install.packages :
installation of package ‘pdftools’ had non-zero exit status
Thank you for your help.
Kenneth
Dear @jeroen, I found the solution. I used this PPA for poppler:
sudo add-apt-repository ppa:bzamecnik/poppler
Then I updated, and now the package compiles fine.
It seems the problem was the default poppler version installed on the system.
Greetings from Medellín, Colombia, South America.
Kenneth
Thanks for releasing pdftools 3.1.0, which seems likely to fix the issue I posted on Mac and Windows.
However, I'd like to use this on Linux. Waiting until April and then upgrading to a newer Linux release would be quite difficult for me; I'm several distro releases behind 22.04.
If that's the way to go, I'll try when that happens. If there is any way to make pdftools not depend on the Linux version for this fix, that would be great, but ultimately this isn't a dealbreaker for me. Thanks!
We could create a PPA with a newer version of poppler. What distro are you using?
Updated.
Hi Jeroen. Thanks for looking into this. I'm using this distro:
Distributor ID: Debian
Description: Debian GNU/Linux 11 (bullseye)
Release: 11
Codename: bullseye
I imagine a PPA isn't really a long-term solution. If I wait until April, should the fix you suggested earlier work?
Hello again. I updated pdftools on Mac and the PDF mentioned in the first post of this thread now renders as expected on my Mac.
However, it doesn't render as expected on shinyapps.io. Any idea how to make it work there? @jeroen mentioned above that updating poppler may be tricky on Ubuntu (which I think is what shinyapps.io uses).