coolwanglu/pdf2htmlEX

Replacing text in the HTML with characters not used in the original PDF renders them in Times New Roman by default

mortenmoulder opened this issue · 0 comments

Problem

I understand why this is happening. When I convert one of my PDFs to HTML and then change the text in the document, every character I add that was not used in the original document is automatically rendered in Times New Roman.

Example

As you can see, the W is Times New Roman, whereas the rest is Verdana. This happens because I haven't used a W in my document, so the font that pdf2htmlEX generates for the page doesn't contain that character, and the browser presumably falls back to its default font for it.

Possible fix

If I put something like ABCDEFGHIJKLMNOPQRSTUVWXYZ into my document and hide it against the white background, it actually works pretty well. The only issue is that other people can still see this text if they select it.
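
To keep the hidden text as short as possible, the workaround could add only the characters the document does not already use. The following is a minimal Python sketch of that idea. It assumes the document text is already available as a plain string; the function name `missing_characters` is made up for illustration and is not part of pdf2htmlEX.

```python
import string


def missing_characters(document_text: str, wanted: str = string.ascii_uppercase) -> str:
    """Return the characters from `wanted` that never occur in `document_text`.

    Hypothetical helper: these are the characters that would have to be
    added to the source document so that the converted font contains them.
    """
    used = set(document_text)
    # Keep the order of `wanted` so the result is stable and easy to read.
    return "".join(ch for ch in wanted if ch not in used)


if __name__ == "__main__":
    sample = "THE QUICK BROWN FOX"
    # Prints the uppercase letters that the sample text never uses.
    print(missing_characters(sample))
```

Passing a larger `wanted` string (digits, lowercase letters, punctuation) would cover more replacement text, at the cost of a longer hidden line in the document.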

Is there a way to fix this?