Can't read PDF-file
FredrikBrandt opened this issue · 13 comments
Hi,
I have a PDF-file (version 1.7) which is working correctly.
I have another PDF file (version 1.3, Producer: Amyuni PDF Converter version 5.0.0.3) that cannot be read at all (the error is: Error, this is not a valid PDF: ...), even though it is in fact a readable PDF file.
What seems to be the problem?
I am using this call:
// Check the PDF-file for information
$uri = $dir.'/'.$fileName;
$pdf = new Pdf();
$pdfdata = $pdf->getPdfInfo($uri);
In my class:
...
public function getPdfInfo($uri = "") {
    // Reset the error property that the catch block below writes to
    $this->error = '';
    try {
        $pdf = new PdfToText($uri);
    } catch (Exception $e) {
        $this->error = 'Caught exception: ' . $e->getMessage() . "\n";
        return false;
    }
    $pdftext = $pdf->Text;
(calling class/PdfToText.phpclass).
The error log will show:
[Text] => �����
�������
...
Please, can you help me out?
It is working with other PDF-files.
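A quick sanity check that may help narrow down a "not a valid PDF" error is to look at the file header yourself. The sketch below is only an illustration: `readPdfVersion()` is a hypothetical helper, not part of PdfToText, and it assumes the `%PDF-x.y` marker sits within the first kilobyte of the file.

```php
<?php
// Hypothetical helper (not part of PdfToText): extracts the version from
// the "%PDF-x.y" header found at the start of a PDF file's contents.
function readPdfVersion(string $bytes): ?string
{
    // Assumption: the header appears within the first 1024 bytes.
    $head = substr($bytes, 0, 1024);
    if (preg_match('/%PDF-(\d+\.\d+)/', $head, $m) === 1) {
        return $m[1];
    }
    return null; // no header found, so probably not a PDF
}

// Usage: only the first kilobyte of the file is needed.
// $version = readPdfVersion((string) file_get_contents($uri, false, null, 0, 1024));
```

If this returns a version but the parser still rejects the file, the problem is more likely in how the rest of the file is structured than in the header.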
Hi,
I have another file that is not working either. It starts with:
%PDF-1.5
%���
3 0 obj
<< /Length 4 0 R
/Filter /FlateDecode
stream
[compressed binary stream data omitted]
...
Hi,
I tried to send a PDF file to Christian.vigh@wuthering-bytes.com, but it bounced back.
Where can I send it?
Regards,
/Fredrik.
I'm happy to take a look at it if you can send it over.
Thanks, the problem is the CID fonts that use the Identity-H encoding.
Using only the Unicode map on the font object gets about a third of the text out, but the rest isn't mapped to characters properly.
I'm working on a change that will read the CID fonts' CMaps, which will hopefully make reading international PDFs much better.
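For context: with Identity-H, the strings in a page's content stream are 2-byte character codes rather than Unicode, so readable text can only be recovered through a mapping such as the font's ToUnicode CMap. The following is a minimal sketch of that idea, not the change being worked on. `parseBfChar()` and `decodeHexString()` are hypothetical names, only `bfchar` entries are handled (no `bfrange`), codes are assumed to be exactly 2 bytes, and the `mbstring` extension is assumed to be available.

```php
<?php
// Illustrative only: these functions are hypothetical, not part of PdfToText.

/** Build a map from character codes to UTF-8 strings from bfchar entries. */
function parseBfChar(string $cmap): array
{
    $map = [];
    // A bfchar section holds lines such as: <0024> <0046>
    if (preg_match_all('/beginbfchar(.*?)endbfchar/s', $cmap, $sections)) {
        foreach ($sections[1] as $section) {
            preg_match_all(
                '/<([0-9A-Fa-f]+)>\s*<((?:[0-9A-Fa-f]{4})+)>/',
                $section,
                $pairs,
                PREG_SET_ORDER
            );
            foreach ($pairs as $p) {
                // The destination value is UTF-16BE; convert it to UTF-8.
                $map[hexdec($p[1])] = mb_convert_encoding(hex2bin($p[2]), 'UTF-8', 'UTF-16BE');
            }
        }
    }
    return $map;
}

/** Decode a hex string from a content stream, 2 bytes (4 hex digits) at a time. */
function decodeHexString(string $hex, array $map): string
{
    $out = '';
    foreach (str_split($hex, 4) as $chunk) {
        // Codes without a mapping become U+FFFD, the replacement character.
        $out .= $map[hexdec($chunk)] ?? "\u{FFFD}";
    }
    return $out;
}

// Usage with a tiny hand-written CMap fragment:
$map = parseBfChar("2 beginbfchar\n<0024> <0046>\n<0025> <00E5>\nendbfchar");
$text = decodeHexString('00240025', $map);
```

Codes that have no entry in the map come out as replacement characters, which would be consistent with only part of the text being readable.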
Hi,
Faktura-1587.pdf
This is not working either.
When do you think the change will be done?
Regards,
/Fredrik.
Hi,
How is it going?
When do you think a solution can be available?
This file is not possible to read at all.
Faktura20541.pdf
I use this syntax.
The first part is printed, but if I add another printout after the function call,
it does not show.
// Check the PDF-file for information
$uri = $dir.'/'.$fileName;
$pdf = new Pdf();
//
error_log(print_r(array(
'uri' => $uri,
'pdf' => $pdf,
'' => ''
), true));
$pdfdata = $pdf->getPdfInfo($uri); // nothing is printed after this call
Otherwise the tool is great.
Regards,
/Fredrik
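One possible reason that nothing prints after the call: in PHP 7 and later, many runtime failures are thrown as `Error`, which is not an `Exception`, so a `catch (Exception $e)` block does not see them. This is only a guess at the cause. The sketch below shows the general pattern of catching `\Throwable` instead; `safeExtract()` is a hypothetical wrapper, not part of PdfToText.

```php
<?php
// Hypothetical wrapper: $extract is any callable that builds the parser and
// returns the text, e.g. function () use ($uri) { return (new PdfToText($uri))->Text; }
function safeExtract(callable $extract): array
{
    try {
        return ['ok' => true, 'text' => $extract(), 'error' => null];
    } catch (\Throwable $e) {
        // Throwable covers both Exception and Error (PHP 7+).
        return [
            'ok'    => false,
            'text'  => null,
            'error' => get_class($e) . ': ' . $e->getMessage(),
        ];
    }
}
```

Note that memory exhaustion and execution timeouts are fatal errors that cannot be caught this way, so the script can still stop without reaching the catch block.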
Hi,
Please, I need this urgently.
Can I at least get an answer as to when the change is expected?
It is much appreciated :).
Regards,
/Fredrik.
Hi,
Maybe I am not using the complete files?
I am using:
class/PdfToText.phpclass
class/Maps/adobe-charsets.map
class/Maps/unicode-to-ansi.map
Do I also need the CIDTables directory (class/CIDTables/)?
Btw:
I tried adding these directories to the class library, without any effect:
class/CIDTables
class/contributions
class/FontMetrics
class/FormTemplates
Hey Fredrik,
We are still working out the best way to resolve the issues with CID fonts.
We've made a few changes to the fork on our GitHub. If you check that out, you should get some information out of the PDF from the Unicode map we process, even with CID fonts.
There is no time frame currently, as this is very much a side project.
Hi, and thanks a lot,
It almost suits my purpose.
Can this be adjusted a little more?
I seem to get part of the invoice, but not the part that I want.
Great otherwise.
I actually only need 2 parameters from the PDF files.
One is the number of pages, and the second is whether the text in the PDF contains
Invoice (Faktura) or Credit invoice (Kreditfaktura).
Can this be achieved somehow?
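Once some text is coming out of the file, both values could be derived along these lines. This is a sketch under assumptions: `classifyInvoice()` and `summarizePdf()` are hypothetical helpers, and it assumes the per-page text is available as an array with one entry per page (how you obtain that array from the library is not shown here).

```php
<?php
// Hypothetical helper: classifies extracted text by keyword.
function classifyInvoice(string $text): ?string
{
    // Check "Kreditfaktura" first, because it also contains "faktura".
    if (stripos($text, 'kreditfaktura') !== false) {
        return 'credit';
    }
    if (stripos($text, 'faktura') !== false) {
        return 'invoice';
    }
    return null; // neither keyword was found
}

// Hypothetical helper: $pages is assumed to hold one string per page.
function summarizePdf(array $pages): array
{
    return [
        'pages' => count($pages),
        'type'  => classifyInvoice(implode("\n", $pages)),
    ];
}

// Usage with made-up page text:
$summary = summarizePdf([1 => 'Kreditfaktura nr 12', 2 => 'Sida 2']);
```

The keyword check only works if the word itself survives extraction, so for files where the fonts are not mapped properly it can still return null.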
Yes, this is solving my problems for now.
Thank you very much :).
Hi again,
I am having a problem with this type of invoice. Is it because of the QR code?
It doesn't load anything at all.
This line of code does not run correctly:
$pdf = new PdfToText($uri);
The dropzone will respond with:
Server responded with 0 code.
Can this be fixed?
Here is the invoice.
Faktura20541.pdf
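A response code of 0 in the uploader usually means the browser received no usable HTTP response, which can happen when the PHP script dies with a fatal error. That is an assumption about this case, not a diagnosis. A shutdown handler like the sketch below would at least get the reason into the error log; `describeFatal()` is a hypothetical helper.

```php
<?php
// Hypothetical helper: formats the array returned by error_get_last(),
// or returns null when the last error was not fatal.
function describeFatal(?array $err): ?string
{
    $fatal = [E_ERROR, E_PARSE, E_CORE_ERROR, E_COMPILE_ERROR];
    if ($err === null || !in_array($err['type'], $fatal, true)) {
        return null;
    }
    return sprintf('%s in %s:%d', $err['message'], $err['file'], $err['line']);
}

// Runs when the script ends, including after a fatal error such as
// memory exhaustion or an execution timeout.
register_shutdown_function(function (): void {
    $msg = describeFatal(error_get_last());
    if ($msg !== null) {
        error_log('PDF processing failed: ' . $msg);
    }
});
```

Register the handler before creating the PdfToText object, so that it is already in place if the constructor is where the script stops.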