wrseward/pdf-parser

Encoding error

roomoraaes opened this issue · 1 comment

Hey!
First of all, thank you for this repository; it has helped me a lot.
I have a small bug: after I convert a PDF to text, the output contains garbled symbols. I think it's an encoding problem. I have tried to fix it, but I couldn't.

Can you help me?


Hello!

I think you might be able to solve this by using the pdftotext -enc option, as outlined in this answer.

You can do that quickly by including the option in the command string you pass to the constructor of the main class, like so:

$parser = new \Wrseward\PdfParser\Pdf\PdfToTextParser('pdftotext -enc ASCII7');

Replace ASCII7 with the output encoding you need (for example UTF-8 or Latin1). Running pdftotext -listenc lists the encodings your installation supports.
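If you want to confirm whether the extracted text is actually well-formed UTF-8 (stray symbols often appear when text in one encoding is displayed as another), a small check like the one below may help. This is a minimal sketch and not part of pdf-parser; the helper name isValidUtf8 is hypothetical. It relies on the fact that preg_match() with the /u modifier fails on malformed UTF-8 input, so it does not need the mbstring extension.

```php
<?php

// Hypothetical helper (not part of pdf-parser): returns true when $text is
// well-formed UTF-8. preg_match() with the /u modifier returns false for
// malformed UTF-8 subjects and 1 when the empty pattern matches.
function isValidUtf8(string $text): bool
{
    return preg_match('//u', $text) === 1;
}

// "\u{e1}" is "á" encoded as UTF-8 (two bytes), so this is valid.
var_dump(isValidUtf8("Ol\u{e1}, mundo")); // bool(true)

// "\xE1" is "á" as a single Latin1 byte, which is malformed UTF-8.
var_dump(isValidUtf8("Ol\xE1, mundo"));   // bool(false)
```

For example, you could run the text returned by a parser constructed with 'pdftotext -enc UTF-8' through this check; a false result suggests the text is in a different encoding than your application expects.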