Fonts missing
Closed this issue · 13 comments
So the conversion is super quick and accurate for all images as far as I can tell. However, none of the text is readable. Instead it is a bunch of blocky characters. So it seems that the fonts are missing.
Any plans to support fonts properly?
Apologies for the delay on this - I've already noted this on #3 and I haven't yet found an appropriate configuration for pdf.js to recognise more fonts - feel free to dig around in their docs and let me know / submit a pull request if you find a solution.
I am thinking it is an issue with "canvas" not seeing the fonts. If the fonts are embedded in the PDF, they probably need to be loaded into canvas.
I was getting the same issue. I changed line 100 in pdf-img-convert.js to set the disableFontFace argument to true, and I get a much better result when rendering text.
var loadingTask = pdfjs.getDocument({data: pdfData, disableFontFace:true});
Might be worthwhile breaking this argument out into a config option.
Reading a bit more, it looks like the whole parameter can be removed, as pdfjs should now handle it automatically in Node.js. mozilla/pdf.js#9778
var loadingTask = pdfjs.getDocument({data: pdfData});
That seems to work for me however I have not done extensive testing.
Once the project is updated, I can run a few tests and if that works push it out to a large audience doing conversions to see if it has an issue in real life.
Thanks for the help - I'll try a combination of removing the disableFontFace arg and maybe trying a different version of node-canvas which copes with fonts better.
So I've done some research and some playing around on this - I've found that, indeed, it recognises more characters with disableFontFace removed. However, pdf.js pre-processing struggles with fonts in certain circumstances, resulting in the characters not being rendered at all (talked about extensively in this issue) - so it seems like I can't make conversion completely airtight without them fixing this.
I'll remove disableFontFace: false from the code and I'll probably set it as an optional parameter on conversion.
I've released the proposed fix as a beta version so if you'd like to test it use:
npm install pdf-img-convert@1.0.4-beta.0
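For anyone wondering what the optional parameter mentioned above could look like, here is a minimal sketch. It is not the code shipped in the beta. disableFontFace is a real pdf.js getDocument option, but the helper name buildDocumentParams and the shape of conversionConfig are assumptions made up for illustration.

```javascript
// Hypothetical helper: builds the parameter object passed to pdfjs.getDocument().
// `buildDocumentParams` and `conversionConfig` are illustrative names only;
// they are not taken from pdf-img-convert.
function buildDocumentParams(pdfData, conversionConfig = {}) {
  const params = { data: pdfData };

  // Forward the flag only when the caller set it explicitly to a boolean,
  // so that pdf.js otherwise applies its own default for the environment.
  if (typeof conversionConfig.disableFontFace === "boolean") {
    params.disableFontFace = conversionConfig.disableFontFace;
  }
  return params;
}

// Usage (assumes `pdfjs` and `pdfData` are already in scope):
// var loadingTask = pdfjs.getDocument(
//   buildDocumentParams(pdfData, { disableFontFace: true })
// );
```

Leaving the key out when nothing is configured matches the suggestion earlier in the thread to drop the argument entirely, while still letting people who got better results with disableFontFace: true opt in.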
As amazing as this project is, unless fonts are reliable, it cannot be used in production. How about looking at PDF Lib instead?
I investigated using PDF-Lib before pdf.js - unfortunately it has even more limited rendering capabilities. I'll investigate making a pdf.js fork for this use-case because what I ideally need is font rendering which falls back on system fonts when they're not embedded properly in the PDF but it'll probably take quite a while.
I installed the pdf-img-convert@1.0.4-beta.0 version and it worked for me! Thanks!
Thanks, pdf-img-convert@1.0.4-beta.0 works for me as well! Will the fix be included in a stable release?