abenori/TeX2img

Issues with viewbox when generating EMF, and scaling issue when converting PDF to EMF on some machines

Jonathan-LeRoux opened this issue · 17 comments

I am using TeX2img as part of a PowerPoint add-in I am developing, IguanaTex, to enable vector graphics support.
I have been running into a couple of issues.

  1. Generated EMF has wrong viewbox and is slightly warped compared to PNG generated from same LaTeX source
    I tried generating EMF and PNG files from the same source. When inserting the EMF file in PowerPoint (or viewing with Inkscape), the file is cut at the bottom and on the right. The EMF also needs to be rescaled to match the PNG file, scaling by about .86 in the height direction and .98 in the width direction.

  2. When converting a PDF to EMF using pdfiumdraw, the size of the EMF depends, on some machines, on the resolution of the display
    I'm having a hard time understanding this one.
    On my desktop, regardless of the resolution, the EMF has the correct size, the same as the original PDF.
    On my laptop (which has a high-dpi display, and whose reported dpi is 192, in case that has some influence), the size of the EMF, as reported by either PowerPoint or Inkscape, depends on the resolution in an inverse linear relation. According to some experiments, the original PDF width can be obtained by multiplying the EMF width by ResolutionX / 836, and the height by ResolutionY / 476.
    For example, for a 3200x1600 resolution, I have to multiply by 3200/836 and 1600/476. I have no idea where these numbers come from.
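
The empirical relation above can be written as a short Python sketch. The function name is hypothetical, and the constants 836 and 476 are only the values measured on this one laptop; nothing here is taken from pdfiumdraw itself.

```python
def pdf_size_from_emf(emf_width, emf_height, res_x, res_y,
                      ref_x=836.0, ref_y=476.0):
    """Estimate the original PDF size from the EMF size reported by
    PowerPoint or Inkscape, using the empirical relation
    pdf = emf * resolution / reference.

    ref_x and ref_y are measured constants from one machine (assumption),
    not documented values.
    """
    return emf_width * res_x / ref_x, emf_height * res_y / ref_y


# Correction factors for the 3200x1600 example.
factor_x = 3200 / 836
factor_y = 1600 / 476
print(round(factor_x, 3), round(factor_y, 3))  # 3.828 3.361
```

The two factors differ, so the correction changes the aspect ratio as well as the size.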

I'm attaching a couple test files.
TeX2img_tests.zip

Thanks in advance for your help!
Jonathan

PS: Feel free to reply in Japanese ;)

Thanks for the report.

  1. Generated EMF has wrong viewbox and is slightly warped compared to PNG generated from same LaTeX source
  2. When converting a PDF to EMF using pdfiumdraw, the size of the EMF depends, on some machines, on the resolution of the display

TeX2img simply calls pdfiumdraw to create EMF file from PDF by

"C:\Path\to\pdfiumdraw.exe" --extent=50 --emf --transparent --pages=1 [TEMPNAME].pdf

so both problems should lie in pdfiumdraw.

But... this is a little complicated. I don't understand it in detail, but I can offer some comments:

When inserting the EMF file in PowerPoint (or viewing with Inkscape), the file is cut at the bottom and on the right.

I also noticed this behavior a year ago; it came up in an off-topic conversation, in Japanese, in an issue for TeX2img for Mac (the main subject of that issue is completely different). The viewbox of an EMF seems to be software-dependent and machine-dependent. In my environment, the viewbox in PowerPoint is much smaller than that in Inkscape, so I always set the right/bottom margins to "5" as a workaround. If I remember correctly, adjusting parameters of pdfiumdraw (using options such as --extent=<value> or --scale=<value>) "sometimes" helped.

The EMF also needs to be rescaled to match the PNG file, scaling by about .86 in the height direction and .98 in the width direction.

That could be another issue arising from the same root.

Thanks a lot for your answer, that helps a lot. I realized that I was missing some arguments when calling pdfiumdraw; I'll add those in the next version.
Regarding the bounding box, you are right that using --scale seems to help. I got good results with --scale=20 (the default is 4, right?), the emf file wasn't cut, and the aspect ratio looked better.
Is there a way to specify pdfiumdraw's extent and scale arguments when calling tex2img.exe? If not, I may need to stop using tex2img.exe, compile the source as I usually do, and use pdfiumdraw only at the end.
Also, what is the conversion route when using latex (->DVI) in TeX2img with EMF output?

what is the conversion route when using latex (->DVI) in TeX2img with EMF output?

The conversion route inside TeX2img can be understood by separating it into two sections:

  • First, generate PDF from LaTeX source using whatever chain of engines (e.g. "pdflatex", "latex + dvipdfmx", "latex + dvips + ps2pdf" or "lualatex")
  • Then, generate some IMG from PDF. This routine is highly optimized to allow various PDF input (not only math formula but also complicated pictures using TikZ or PSTricks).

The details (command-line calls and logs) are shown in the output window of TeX2img, so please have a look if you are interested.

Is there a way to specify pdfiumdraw's extent and scale arguments when calling tex2img.exe?

Currently there is no way. Instead, you can use tex2img.exe to create PDF, and then run pdfiumdraw.exe by yourself to create EMF. This way should work well in most cases, but the conversion routine will be a little different from the "official" implementation in tex2img.exe.
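
The second step of that workaround can be sketched in Python as follows. This is a minimal sketch: the helper names are hypothetical, the flags are only the ones quoted in this thread (--extent, --emf, --transparent, --pages, --scale), and where pdfiumdraw writes its output file is not covered here.

```python
import subprocess


def build_pdfiumdraw_command(exe, pdf, scale=None, extent=50):
    """Build the argument list for a pdfiumdraw PDF-to-EMF conversion.

    Mirrors the call TeX2img is said to make in this thread, optionally
    adding --scale=<value>. Only flags quoted in the thread are used.
    """
    cmd = [str(exe), f"--extent={extent}", "--emf", "--transparent", "--pages=1"]
    if scale is not None:
        cmd.append(f"--scale={scale}")
    cmd.append(str(pdf))
    return cmd


def convert_pdf_to_emf(exe, pdf, scale=None):
    """Run pdfiumdraw and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_pdfiumdraw_command(exe, pdf, scale), check=True)


# Show the command that would be run; the paths are placeholders.
print(build_pdfiumdraw_command(r"C:\Path\to\pdfiumdraw.exe", "equation.pdf", scale=20))
```

Passing the arguments as a list avoids quoting problems when the path to pdfiumdraw.exe contains spaces.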

Consider the following example using TikZ:

%#! pdflatex
\documentclass{article}
\pagestyle{empty}
\usepackage{tikz}
\begin{document}
dashed line!
\begin{tikzpicture}
\draw[dashed](0,0)--(1,0);
\end{tikzpicture}
\end{document}

When the final output format is PDF, the output will be outlined text and a "pure" dashed line. However, when the final output format is EMF, the intermediate PDF will be outlined text and a "chopped" dashed line. The EMF format cannot draw dashed lines well, so we chop the path using the .dashpath Ghostscript operator (for details, see doraTeX/TeX2img#68, in Japanese) before passing the PDF to pdfiumdraw.exe. (This is merely one part of what I meant above by "This routine is highly optimized".)

I found some strange code in pdfiumdraw (I don't understand why I wrote it...). With some modifications, the situation seems to be better. The current build is: https://1drv.ms/u/s!AnAe-sROy6tPhso-ifYqsl-Y_bN8VA (I'll make more modifications later.)
I haven't checked it with Office. In fact, I suspect that this is (also) caused by Office. Terada-san (the original developer of TeX2img) found the phenomenon many years ago. At that time, EMF was generated by pstoedit.exe, not by pdfiumdraw.exe, which suggests that the phenomenon is not caused by pdfiumdraw. He also recommended using the margin options of TeX2img to work around this problem.

I got good results with --scale=20 (the default is 4, right?),

The default is 1.

Also, what is the conversion route when using latex (->DVI) in TeX2img with EMF output?

The attached file may be useful if you can read Japanese.

tex2img_generate.pdf

Could you try pdfiumdraw.exe at https://1drv.ms/u/s!AnAe-sROy6tPhso-ifYqsl-Y_bN8VA ?

I removed some mysterious code. I also fixed another bug, which probably caused the second problem. I also added a 2pt margin at the bottom and on the right.

Could you try pdfiumdraw.exe at https://1drv.ms/u/s!AnAe-sROy6tPhso-ifYqsl-Y_bN8VA ?

The emf output is cut on the right/bottom in my environment (confirmed with PowerPoint 2010). The input pdf: here, and the output emf: here.

(In my environment, the current 2.0.1 pdfiumdraw.exe seems to cut off less...)

I changed: +2pt -> +4pt (the same URL). That may be better... I couldn't find a rule for how much Word cuts off, so this does not address the root of the problem...

(On my machine the amount cut off is reduced, as shown here. The top is the current build and the bottom is 2.0.1.)
a

I changed: +2pt -> +4pt (the same URL). That may be better...

Seems better.

pdfiumdraw-00

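(This is what it looks like when EMFs created from the same PDF with the three versions of pdfiumdraw.exe are placed on a black background. The top is the current release, the middle is the one that was cut off earlier, and the bottom is the +4pt one. The current release seems to work fine for me... By the way, my laptop's screen resolution is 1280x800.)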

Thanks for looking into this. I've tried the new version, but it does not really improve my situation.
My problem is not actually the cutting of the .emf file in itself: I "ungroup" the .emf file into a PPT shape, which uncovers the cut parts anyway. But it looks like the cutting is linked to the different rescaling in X and Y. So adding margins may not solve my problem, and may actually make it worse, because it could impact the scaling. I just tested by inserting into PPT the .emf files created with the two versions of pdfiumdraw, and the old (cut) one has the correct aspect ratio, while the new one with margin is off.
In my add-in, if I use --scale=100 and then scale down, it should not make a big difference. But I don't think that's a good behavior for people who want to insert .emf files in PPT (without using my add-in). I agree that's probably Office's fault though :)

One thing I don't understand is the structure of the EMF file: there seem to be a few extra rectangular paths that can be seen in Inkscape and/or PowerPoint. Some of them seem to correspond to various page boxes. The first (in the XML order) is a very large invisible rectangle (I can only see it in Inkscape); it's unclear where it comes from. The second is a rectangle that is visible in Inkscape (and can be made visible in PPT by setting its outline). There are also two more shapes that appear when ungrouping in PPT: an AutoShape, which seems to correspond to the crop box (and what we see after cutting), and a Rectangle that extends a bit further to the right and bottom than the AutoShape.
What are these for?

I also tried using pdfiumdraw while varying the screen resolution, and the size of the .emf file was still varying (on my laptop only, for whatever reason).
I also noticed that the behavior of pdfiumdraw was different between my desktop and laptop in terms of bounding box as well. With the official version of TeX2img and --scale=200, on the example .pdf I sent the other day, on my desktop I get the final character to touch exactly the bounding box on the right, while on my laptop the bounding box is a hair too short.

My problem is not actually the cutting of the .emf file in itself: I "ungroup" the .emf file into a PPT shape, which uncovers the cut parts anyway. But it looks like the cutting is linked to the different rescaling in X and Y. So adding margins may not solve my problem, and may actually make it worse, because it could impact the scaling. I just tested by inserting into PPT the .emf files created with the two versions of pdfiumdraw, and the old (cut) one has the correct aspect ratio, while the new one with margin is off.

Sorry, but I don't yet know the reason for strange aspect ratio.

if I use --scale=100 and then scale down

That is not safe enough as a default setting. Consider the following example.

%#! pdflatex test.tex
\documentclass[a5paper]{article}
\usepackage[margin=5pt]{geometry}
\usepackage{lipsum}
\begin{document}
\lipsum[1-7]
\end{document}

With pdfiumdraw --scale=1 test.pdf, the output is OK. However, with pdfiumdraw --scale=3 test.pdf, the bottom of the image corresponding to the first page is not drawn (only white margin appears). With --scale=5, the proportion of white margin increases. This is why we don't use the --scale option by default (the default is --scale=1). This behavior comes from PDFium: it discards objects outside the display area for efficiency, which is very reasonable for a PDF renderer but problematic for a PDF-to-EMF converter.

Actually, pdfiumdraw addresses this issue to some extent (by setting an abnormally large canvas first), but when the --scale option is used, the upper limit is easily exceeded.

I don't understand is the structure of the EMF file: there seems to be a few extra rectangular paths that can be seen in Inkscape and/or PowerPoint. Some of them seem to correspond to various page boxes.

It should come from the image conversion pathway inside TeX2img.exe. An easy-to-understand example is the "pdfcrop"-like process. It calculates the bounding box of the original PDF using Ghostscript and embeds the original PDF in a new PDF (with the margins cropped). This means that the old PDF is actually included in the new PDF, so the original rectangular path is retained inside the nested structure. The other one, the "very large invisible rectangle", may come from pdfiumdraw (the "abnormally large canvas" I mentioned above), I guess.

I also tried using pdfiumdraw while varying the screen resolution, and the size of the .emf file was still varying (on my laptop only, for whatever reason).
I also noticed that the behavior of pdfiumdraw was different between my desktop and laptop in terms of bounding box as well.

The resolution-dependent behavior is already known, but I don't know the reason.

if I use --scale=100 and then scale down

That is not safe enough as a default setting.

Of course, if your add-in is meant only for small equations, you are free to use --scale=100 or something similar as your own default.

Thanks for your comments. I hadn't thought about the potential issue with large images being cut off, good point.
I think my current solution, which is to give the user manual scaling factors to get the output of TeX2img and pdfiumdraw back to the proper scaling for their own setup, is the best one at this point.
If we can figure out the rescaling and cropping issues in future versions, I can always revisit.
Thanks again for your help!

Sorry for not replying sooner. I don't have much time at the moment...

I just tested by inserting into PPT the .emf files created with the two versions of pdfiumdraw, and the old (cut) one has the correct aspect ratio, while the new one with margin is off.

Actually, I don't understand this point. As far as I can see (just a visual check), the aspect ratio of the EMF file seems to be correct. Is your EMF file too wide or too tall?

One thing I don't understand is the structure of the EMF file

I guess the biggest rectangle is the size of the reference device and the second one is Bounds. See https://msdn.microsoft.com/en-us/library/cc230725.aspx. With pdfiumdraw, the reference device is the display of the machine that creates the EMF file. I don't know about the other rectangles...
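
The header fields mentioned here can be inspected with a short script. Below is a minimal Python sketch, assuming the fixed 88-byte layout of the EMR_HEADER record in the MS-EMF specification (Bounds in device units, Frame in 0.01 mm units, then the reference device size in pixels and in millimeters). The function and tuple names are hypothetical.

```python
import struct
from collections import namedtuple

EmfHeader = namedtuple("EmfHeader", "bounds frame device_px device_mm")

# Type, Size, Bounds(4), Frame(4), Signature, Version, Bytes, Records,
# Handles, Reserved, nDescription, offDescription, nPalEntries,
# Device(2), Millimeters(2): 88 bytes, little-endian.
_HEADER = struct.Struct("<II4i4iIIIIHHIII2I2I")
_EMF_SIGNATURE = 0x464D4520  # the bytes " EMF" read as a little-endian DWORD


def parse_emf_header(data):
    """Extract Bounds, Frame and reference-device sizes from EMF bytes."""
    if len(data) < _HEADER.size:
        raise ValueError("data is too short to contain an EMF header")
    fields = _HEADER.unpack_from(data)
    if fields[0] != 1 or fields[10] != _EMF_SIGNATURE:
        raise ValueError("not an EMF file")
    return EmfHeader(
        bounds=fields[2:6],        # left, top, right, bottom in device units
        frame=fields[6:10],        # left, top, right, bottom in 0.01 mm
        device_px=fields[19:21],   # reference device size in pixels
        device_mm=fields[21:23],   # reference device size in millimeters
    )


def frame_size_mm(header):
    """Width and height of the Frame rectangle in millimeters."""
    left, top, right, bottom = header.frame
    return (right - left) / 100, (bottom - top) / 100
```

To use it, pass the raw bytes of a file, for example `parse_emf_header(open("a.emf", "rb").read())`. Comparing `device_px` and `device_mm` between EMFs generated on different machines, or at different screen resolutions, shows which reference device was recorded in each file.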

I also tried using pdfiumdraw while varying the screen resolution, and the size of the .emf file was still varying (on my laptop only, for whatever reason).

I haven't done anything about it yet. Could you send me two such files?

Hi,

I have a new related issue that was reported to me by an IguanaTex user. He is asking why the aspect ratio changes when generating "a" vs "aaa". This seems linked to pdfiumdraw as well. I'm attaching an example that can easily be reproduced with TeX2img. Note that the same kind of issue occurs with the version using more margin mentioned above, but the discrepancy is reversed: a lone "a" is narrower than an "a" in "aaa" in the version without extra margin (a.emf, aaa.emf), but wider in the one with extra margin (a_2.emf, aaa_2.emf).

a_vs_aaa.zip

I found the issue in your example, thank you. I could also reproduce it on my computer, but I couldn't find the cause of the problem. I'll try to find a solution. Please be patient.

It's been a very long time, but I'm still having some issues when inserting EMF files generated by TeX2img in PowerPoint.
The issue seems to occur (sometimes only, which makes debugging that much harder) when "ungrouping" the inserted EMF into a PowerPoint shape.
Here is my post on StackOverflow explaining the issue.
Some comments/responses suggest that the issue may be related to the structure of the EMF file.
Does that ring a bell, by any chance?

Thank you. That's a very detailed report. I'll read it and go back to our pdfiumdraw.