ol-th/pdf-img-convert.js

Fonts missing

Closed this issue · 13 comments

So the conversion is super quick and, as far as I can tell, accurate for all images. However, none of the text is readable; it renders as a bunch of blocky characters instead, so it seems that the fonts are missing.

Any plans to support fonts properly?

ol-th commented

Apologies for the delay on this - I've already noted this in #3, and I haven't yet found an appropriate configuration for pdf.js to recognise more fonts. Feel free to dig around in their docs and let me know, or submit a pull request if you find a solution.

I think it is an issue with "canvas" not seeing the fonts. If the fonts are embedded in the PDF, they probably need to be loaded into canvas.

Using Fonts with Canvas
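A minimal sketch of that idea, assuming node-canvas's documented `registerFont(path, { family })`, which has to be called before any canvas is created. The helper name `registerFontsFromDir`, the fonts directory, and the convention of using the file name as the family name are all hypothetical. Note that this only makes font files available to node-canvas; it does not by itself make pdf.js load fonts embedded in the PDF.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: register every .ttf/.otf file in `dir` with node-canvas.
// `registerFont` is passed in so this works with require('canvas').registerFont
// (or with a stub when testing without node-canvas installed).
// Per the node-canvas docs, this must run BEFORE any canvas is created.
function registerFontsFromDir(registerFont, dir) {
  const registered = [];
  for (const file of fs.readdirSync(dir).sort()) {
    const ext = path.extname(file).toLowerCase();
    if (ext !== '.ttf' && ext !== '.otf') continue; // skip non-font files
    // Assumption: the file name without its extension is the family name.
    const family = path.basename(file, path.extname(file));
    registerFont(path.join(dir, file), { family });
    registered.push(family);
  }
  return registered;
}
```

Usage would be something like `registerFontsFromDir(require('canvas').registerFont, './fonts')` once at startup, before pdf.js creates any canvas.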

I was getting the same issue. I changed line 100 in pdf-img-convert.js to set the disableFontFace argument to true, and I get a much better result when rendering text.

var loadingTask = pdfjs.getDocument({data: pdfData, disableFontFace:true});

It might be worthwhile to expose this argument as a config option.

Reading a bit more, it looks like the parameter can be removed entirely, as pdf.js should now handle it automatically in Node.js (mozilla/pdf.js#9778).

var loadingTask = pdfjs.getDocument({data: pdfData});

That seems to work for me; however, I have not done extensive testing.

Once the project is updated, I can run a few tests and, if they pass, push it out to a large audience doing conversions to see whether any issues come up in real-world use.

ol-th commented

Thanks for the help - I'll try removing the disableFontFace argument, possibly combined with a different version of node-canvas that copes better with fonts.

ol-th commented

So I've done some research and some playing around on this, and I've found that it does indeed recognise more characters with disableFontFace removed. However, pdf.js pre-processing struggles with fonts in certain circumstances, resulting in characters not being rendered at all (discussed extensively in this issue), so it seems I can't make conversion completely airtight until they fix this.

I'll remove disableFontFace: false from the code, and I'll probably expose it as an optional parameter on conversion.
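A minimal sketch of what an optional conversion parameter could look like. `buildDocumentParams` and the `config.disableFontFace` key are hypothetical names, not the released pdf-img-convert API. The idea is to forward the flag to `pdfjs.getDocument` only when the caller sets it, so that otherwise pdf.js applies its own default (recent pdf.js versions document `disableFontFace` as defaulting to `true` in Node.js and `false` in browsers).

```javascript
// Hypothetical helper: build the options object for pdfjs.getDocument().
// disableFontFace is forwarded ONLY when the caller sets it to a boolean;
// otherwise the key is omitted so pdf.js falls back to its own default.
function buildDocumentParams(pdfData, config = {}) {
  const params = { data: pdfData };
  if (typeof config.disableFontFace === 'boolean') {
    params.disableFontFace = config.disableFontFace;
  }
  return params;
}
```

It would replace the hard-coded object on line 100, e.g. `pdfjs.getDocument(buildDocumentParams(pdfData, config))`. Checking `typeof ... === 'boolean'` rather than truthiness means an explicit `false` from the caller is still passed through.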

ol-th commented

I've released the proposed fix as a beta version, so if you'd like to test it, use:

npm install pdf-img-convert@1.0.4-beta.0

As amazing as this project is, it cannot be used in production unless fonts render reliably. How about looking at PDF-Lib instead?

ol-th commented

I investigated using PDF-Lib before pdf.js - unfortunately it has even more limited rendering capabilities. I'll look into making a pdf.js fork for this use case, because what I ideally need is font rendering that falls back on system fonts when fonts aren't properly embedded in the PDF, but it'll probably take quite a while.
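To illustrate the kind of fallback described here, a hypothetical sketch that picks a generic system font family for a font that isn't embedded. The function name and the keyword table are assumptions for illustration, not pdf.js behaviour. It relies on two PDF conventions: subset fonts carry a six-uppercase-letter tag plus `+` before the real name (e.g. `ABCDEF+Helvetica`), and the standard 14 fonts include the Times, Helvetica and Courier families.

```javascript
// Hypothetical fallback table: name keyword -> generic (CSS-style) family.
// Order matters: the first keyword found in the font name wins.
const FALLBACKS = [
  ['courier', 'monospace'],
  ['mono', 'monospace'],
  ['times', 'serif'],
  ['georgia', 'serif'],
  ['helvetica', 'sans-serif'],
  ['arial', 'sans-serif'],
];

function fallbackFamily(pdfFontName) {
  // Strip a subset tag such as "ABCDEF+" (six uppercase letters and a plus).
  const name = pdfFontName.replace(/^[A-Z]{6}\+/, '').toLowerCase();
  for (const [keyword, family] of FALLBACKS) {
    if (name.includes(keyword)) return family;
  }
  return 'sans-serif'; // assumed default when nothing matches
}
```

A real fork would also need to decide when a font counts as "not embedded properly", which this sketch does not attempt.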

I installed the pdf-img-convert@1.0.4-beta.0 version and it worked for me! Thanks!

Thanks, pdf-img-convert@1.0.4-beta.0 works for me as well! Will the fix be included in a stable release?

ol-th commented

Closing this as it's a duplicate of #3. The fix above has been included in a recent release.