codeshell/fpdm

Checkboxes Failing to Detect State for Valid PDFs

TristanHammat-AgilisIT opened this issue · 4 comments

The code seems to assume that the checkbox "on" definition always comes first in the /AP (appearance) dictionary. As a result, the checkbox code sets the wrong state for any PDF in which the "off" definition comes before the "on" definition.

For example, a PDF with this checkbox appearance dictionary

    /AP
    <<
      /D
      <<
        /Off 14 0 R
        /Yes 15 0 R
      >>
      /N
      <<
        /Off 16 0 R
        /Yes 17 0 R
      >>
    >>

generates the following incorrect values in fpdm's [infos] array:

    [checkbox_yes] => Off
    [checkbox_no] => Yes

I have checked ISO 32000 and the older PDF 1.4 Reference, and neither seems to require the "on" definition to appear before the "off" definition, which would explain why so many PDFs use the reverse order. Both references do, however, state that the off-state appearance must be named "Off", so that name may be a more reliable way to tell which definition is which.
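A minimal PHP sketch of the name-based rule, assuming the state names have already been pulled out of the appearance dictionary in the order they appear. Both functions are hypothetical helpers written for illustration, not part of fpdm; `classify_by_position()` is only a simplified model of the positional assumption described above.

```php
<?php
// Hypothetical helpers for illustration only; not part of fpdm.

// Name-based rule: the off state must be named "Off", so any other
// name is treated as the on state, regardless of where it appears.
function classify_checkbox_states(array $stateNames): array
{
    $result = ['checkbox_yes' => '', 'checkbox_no' => ''];
    foreach ($stateNames as $name) {
        if ($name === 'Off') {
            $result['checkbox_no'] = $name;
        } elseif ($result['checkbox_yes'] === '') {
            // First non-"Off" name wins if a malformed PDF lists several.
            $result['checkbox_yes'] = $name;
        }
    }
    return $result;
}

// Simplified model of the positional assumption: first = on, second = off.
function classify_by_position(array $stateNames): array
{
    return [
        'checkbox_yes' => $stateNames[0] ?? '',
        'checkbox_no'  => $stateNames[1] ?? '',
    ];
}

$names = ['Off', 'Yes']; // order as in the example /AP dictionary

print_r(classify_by_position($names));     // checkbox_yes => Off, checkbox_no => Yes (wrong)
print_r(classify_checkbox_states($names)); // checkbox_yes => Yes, checkbox_no => Off
```

The name-based version gives the same answer for `['Yes', 'Off']` and `['Off', 'Yes']`, which is the point: dictionary order stops mattering.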

Here is my horrible, rushed, hacky, redundancy-filled solution.
In fpdm.php, starting at line 1859, I replaced this:

    } elseif (($ap_line == $Counter-4) && ($ap_d_line == $Counter-2) && ($ap_d_yes == '') && $this->extract_pdf_definition_value("name", $CurLine, $match)) {
        $ap_d_yes = $match[1];
        if ($verbose_parsing) {
            echo("<br>Object's checkbox_yes is '<i>$ap_d_yes</i>'");
        }
        $object["infos"]["checkbox_yes"] = $ap_d_yes;
    } elseif (($ap_line == $Counter-5) && ($ap_d_line == $Counter-3) && ($ap_d_no == '') && $this->extract_pdf_definition_value("name", $CurLine, $match)) {
        $ap_d_no = $match[1];
        if ($verbose_parsing) {
            echo("<br>Object's checkbox_no is '<i>$ap_d_no</i>'");
        }
        $object["infos"]["checkbox_no"] = $ap_d_no;

With this:

    } elseif (($ap_line == $Counter-4) && ($ap_d_line == $Counter-2) && $this->extract_pdf_definition_value("name", $CurLine, $match)) {
        $ap_d_first = $match[1];
        if ($ap_d_first !== "Off") {
            if ($verbose_parsing) {
                echo("<br>Object's checkbox_yes is '<i>$ap_d_first</i>'");
            }
            $ap_d_yes = $ap_d_first;
            $object["infos"]["checkbox_yes"] = $ap_d_first;
        } else {
            if ($verbose_parsing) {
                echo("<br>Object's checkbox_no is '<i>$ap_d_first</i>'");
            }
            $ap_d_no = $ap_d_first;
            $object["infos"]["checkbox_no"] = $ap_d_first;
        }
    } elseif (($ap_line == $Counter-5) && ($ap_d_line == $Counter-3) && $this->extract_pdf_definition_value("name", $CurLine, $match)) {
        $ap_d_second = $match[1];
        if ($ap_d_second !== "Off") {
            if ($verbose_parsing) {
                echo("<br>Object's checkbox_yes is '<i>$ap_d_second</i>'");
            }
            $ap_d_yes = $ap_d_second;
            $object["infos"]["checkbox_yes"] = $ap_d_second;
        } else {
            if ($verbose_parsing) {
                echo("<br>Object's checkbox_no is '<i>$ap_d_second</i>'");
            }
            $ap_d_no = $ap_d_second;
            $object["infos"]["checkbox_no"] = $ap_d_second;
        }
    }

Another, possibly minor, issue: the code seems to look for the state names in the appearance dictionary's optional down appearance (/D) instead of the required normal appearance (/N). This could break PDFs that include the /N definitions but omit the optional /D.
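As a rough sketch of reading the state names from /N and falling back to /D only when /N is missing, the PHP below works on the raw /AP dictionary text with regular expressions. The helpers are hypothetical, not fpdm code, and they assume the /N and /D subdictionaries are written inline rather than as indirect references; a fix inside fpdm would instead need to hook into the line-based parsing in parsePDFEntries().

```php
<?php
// Hypothetical helpers for illustration only; not part of fpdm.
// Assumes /N and /D are inline subdictionaries ("/N << ... >>"),
// not indirect references, and contain no nested dictionaries.

function extract_state_names(string $ap, string $key): array
{
    // Match "/N << ... >>" (or "/D << ... >>") and capture its body.
    $pattern = '#/' . preg_quote($key, '#') . '\s*<<(.*?)>>#s';
    if (!preg_match($pattern, $ap, $m)) {
        return [];
    }
    // Each entry looks like "/Name 16 0 R"; keep only the names.
    preg_match_all('#/([^\s/<>\[\]()]+)\s+\d+\s+\d+\s+R#', $m[1], $entries);
    return $entries[1];
}

// Prefer the required normal appearance; use /D only as a fallback.
function appearance_state_names(string $ap): array
{
    $names = extract_state_names($ap, 'N');
    return $names !== [] ? $names : extract_state_names($ap, 'D');
}

$ap = "/AP\n<<\n/D\n<<\n/Off 14 0 R\n/Yes 15 0 R\n>>\n/N\n<<\n/Off 16 0 R\n/Yes 17 0 R\n>>\n>>";
print_r(appearance_state_names($ap)); // [0] => Off, [1] => Yes (taken from /N)
```

Because the example dictionary has both subdictionaries, the names come from /N; a dictionary that only has /D still yields its names through the fallback.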

For anyone else with this issue: I have now forked this repo and added this rough fix, along with a couple of other quick-and-dirty fixes. I have done some limited testing, and so far everything works.
https://github.com/TristanHammat-AgilisIT/fpdm/

Hi, I still have the same problem.

Even when using my fork?

Hello, here is a solution that works for me.
At line 1878, I initialize the checkbox entries in $object["infos"] like this:

    elseif (($as == '') && $this->extract_pdf_definition_value("/AS", $CurLine, $match)) {
        $as = $match[1];
        $object["infos"]["checkbox_yes"] = "";
        $object["infos"]["checkbox_no"] = "";
        if ($verbose_parsing) {
            echo("<br>Object's AS is '<i>$as</i>'");
        }
        $object["infos"]["checkbox_state"] = $as;
        $object["infos"]["checkbox_state_line"] = $Counter;
    }

I added these lines:

    $object["infos"]["checkbox_yes"] = "";
    $object["infos"]["checkbox_no"] = "";

To apply the change cleanly, without editing the library itself, you can extend the FPDM class, copy parsePDFEntries() into your subclass, and make the change there.
Then use your new class instead of \FPDM.

Example:

    class FPDMupdate extends \FPDM
    {
        function parsePDFEntries(&$lines) {
            [...]
            elseif (($as == '') && $this->extract_pdf_definition_value("/AS", $CurLine, $match)) {
                $as = $match[1];
                $object["infos"]["checkbox_yes"] = "";
                $object["infos"]["checkbox_no"] = "";
                if ($verbose_parsing) {
                    echo("<br>Object's AS is '<i>$as</i>'");
                }
                $object["infos"]["checkbox_state"] = $as;
                $object["infos"]["checkbox_state_line"] = $Counter;
            }
            [...]
        }
    }

    $pdf = new FPDMupdate("template.pdf");