Is it possible to convert to SVG but keep text as text?
Dingo64 opened this issue · 12 comments
Is it possible to convert to SVG but keep text as text?
I think pdf2svg can't do anything about that; it depends on the Poppler or Cairo library.
@RonanKER, do you have any code or configuration that shows this?
I've been searching Google for a week for a way to make pdf2svg keep text as text, but I haven't found anything useful. Can you help me?
If you want to keep text in the SVG then your best bet is to use Inkscape. I'm fairly sure it can be used from the command line to automate the conversion with text (though I've never used it for automated PDF -> SVG, only manually). Be aware that text often moves around a bit (the kerning is often a little off) when converting from a PDF.
See https://inkscape.org/doc/inkscape-man.html for details on the Inkscape command line.
I've been learning Inkscape for a week. As far as I know, Inkscape can only convert the first page of a PDF to SVG. Is that true?
That's bad news for me. @dawbarton
It can open any page when you open the PDF in the GUI. If you want everything via the command line, you can simply use qpdf or pdftk to extract the page you want from the PDF as a single-page file and then use Inkscape. (Inkscape might be able to do page selection from the command line, I just don't know how.)
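For anyone who wants to script the extract-then-convert approach, here is a minimal sketch in Python. It assumes qpdf and Inkscape are on the PATH, and it assumes Inkscape 1.x option names (`--export-type`, `--export-filename`); older 0.9x releases used different flags. The function names and the `out` folder are made up for illustration, and whether text survives as text still depends on the PDF and on Inkscape's import settings.

```python
import subprocess
from pathlib import Path


def qpdf_extract_cmd(src, page, dst):
    """Build a qpdf command that copies a single page of src into dst."""
    # "--empty" starts from a blank document; "--pages SRC N --" selects page N.
    return ["qpdf", "--empty", "--pages", src, str(page), "--", dst]


def inkscape_svg_cmd(src, dst):
    """Build an Inkscape command that exports src as SVG (1.x flags assumed)."""
    return ["inkscape", src, "--export-type=svg", f"--export-filename={dst}"]


def pdf_page_to_svg(pdf, page, out_dir="out"):
    """Extract one page with qpdf, then convert that one-page PDF with Inkscape."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    single = out / f"page-{page}.pdf"
    svg = out / f"page-{page}.svg"
    # check=True raises CalledProcessError if either tool exits with an error.
    subprocess.run(qpdf_extract_cmd(pdf, page, str(single)), check=True)
    subprocess.run(inkscape_svg_cmd(str(single), str(svg)), check=True)
    return svg
```

Calling `pdf_page_to_svg("input.pdf", 2)` would write `out/page-2.pdf` and `out/page-2.svg`. With pdftk instead of qpdf, the extraction step would be `pdftk input.pdf cat 2 output page-2.pdf`.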
I have an old batch script from 2015, when I tried this (with pdftk and Inkscape):
test_inkscape.txt
In the folder 'in' I put several example/test PDF files, then launched several similar batch files to try different solutions (Inkscape, pdf2svg, PDFTron, Poppler, ...) and compared the results.
If you can afford it, I think PDFTron was the best, but I'm not sure it would preserve text as you wish.
Could anyone point me in the right direction to understand why neither Cairo nor Poppler preserves text during PDF to SVG conversion (so I can find some workaround to force them to keep it)? Does this procedure have a name? Is it "text vectorization" by any chance?
By the way, I've tried Inkscape as well, but no luck. LibreOffice seemed to work, but it was extremely slow and created a large .svg file that is very hard to open.
I'm not sure what the name is ("preserve text" would have been my guess). Inkscape is usually the best in recent years - I've not had any problems with the PDFs that I've given it recently. It might be worth running pdftotext on your PDF to see if it does actually contain any text.
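A quick way to script that check, as a rough sketch: it assumes Poppler's `pdftotext` is on the PATH, and the function names are made up for illustration. Passing `-` as the output file makes `pdftotext` write to stdout. An empty result only says that `pdftotext` found nothing to extract, not why.

```python
import subprocess


def pdftotext_cmd(pdf):
    """Build a pdftotext command that prints the extracted text to stdout."""
    return ["pdftotext", pdf, "-"]


def has_text(extracted):
    """True if the output has anything besides whitespace and form feeds."""
    # str.strip() also removes the form feed that pdftotext emits between pages.
    return bool(extracted.strip())


def pdf_contains_text(pdf):
    """Run pdftotext on the file and report whether any text came out."""
    result = subprocess.run(
        pdftotext_cmd(pdf), capture_output=True, text=True, check=True
    )
    return has_text(result.stdout)
```

If `pdf_contains_text("input.pdf")` returns `False`, the visible text is probably stored as outlines, as an image, or in some object that the extractor skips.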
After some research on PDFs in general, I've realized that the problem was that the text was not "regular text" but part of "annotation/comments" objects. These often get ignored on import, and I believe that Inkscape excluded them as well.