PDF to txt. PHP façade for Apache's PDFBox

#PDFBox-php

Parse text from PDF files using PDFBox, the Java utility from Apache.

##Usage

###Basic

This is a basic use case, found in demos/demo.php.

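    <?php

    // Load the facade and the ExtractText wrapper (paths are relative to demos/).
    require_once '..'.DIRECTORY_SEPARATOR.'PDFBox.php';
    require_once '..'.DIRECTORY_SEPARATOR.'PDFBox'.DIRECTORY_SEPARATOR.'ExtractText.php';

    // Path to the PDFBox application jar.
    $jar = "pdfbox-app-1.4.0.jar";
    $pdf_box = new PDFBox($jar);
    $extract_text = new PDFBox\ExtractText($pdf_box);

    // Extract the text of regular.pdf; the result is written next to the PDF.
    $extract_text->parse('regular.pdf');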

Input:

  • Path to jar
  • Path to pdf

Output:

  • The text content of "regular.pdf", written to "regular.txt" in the same folder
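
The README does not show how the façade invokes the jar, so the following is only a sketch of what an equivalent call might look like, assuming the standard `java -jar <jar> ExtractText <pdf>` form of the PDFBox command-line app. The helpers `build_extract_command()` and `expected_text_path()` are hypothetical and not part of PDFBox-php; the second one assumes the output name is the PDF name with its extension swapped for ".txt", as the example above suggests.

```php
<?php
// Hypothetical helpers for illustration; they are NOT part of PDFBox-php.

// Build the shell command for PDFBox's ExtractText tool (assumed CLI form).
function build_extract_command($jar, $pdf)
{
    return 'java -jar ' . escapeshellarg($jar)
        . ' ExtractText ' . escapeshellarg($pdf);
}

// Assumption: the text lands next to the PDF, with ".pdf" replaced by ".txt".
function expected_text_path($pdf)
{
    return preg_replace('/\.pdf$/i', '.txt', $pdf);
}

$pdf = 'regular.pdf';
echo build_extract_command('pdfbox-app-1.4.0.jar', $pdf), PHP_EOL;

// After an extraction has run, the result can be read like any other file.
$txt = expected_text_path($pdf);
if (is_readable($txt)) {
    echo file_get_contents($txt);
} else {
    echo "No output at {$txt} yet; run the extraction first.", PHP_EOL;
}
```

Quoting both paths with `escapeshellarg()` keeps file names containing spaces or shell metacharacters from breaking the command.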

###Full API

Read through the interface (public methods) of PDFBox\ExtractText. There are corresponding methods for every option available in the .jar.
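
Since the method names are not listed here, one way to browse them without opening the source is PHP's reflection API. The sketch below uses a hypothetical helper, `list_public_methods()`, and assumes the two `require_once` lines from the basic example have already loaded the class; otherwise it just prints a notice.

```php
<?php
// Hypothetical helper: names of the public, non-constructor methods of a class.
function list_public_methods($class_name)
{
    $reflection = new ReflectionClass($class_name);
    $names = array();
    foreach ($reflection->getMethods(ReflectionMethod::IS_PUBLIC) as $method) {
        if (!$method->isConstructor()) {
            $names[] = $method->getName();
        }
    }
    sort($names);
    return $names;
}

// Assumes PDFBox.php and PDFBox/ExtractText.php were required beforehand.
if (class_exists('PDFBox\ExtractText')) {
    foreach (list_public_methods('PDFBox\ExtractText') as $name) {
        echo $name, PHP_EOL;
    }
} else {
    echo 'PDFBox\ExtractText is not loaded; require the library first.', PHP_EOL;
}
```

The list also includes inherited public methods, so it may show more than the option setters alone.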

##Requirements

  • Java
  • PDFBox: a single .jar file is all that is needed, and it must be executable
  • PHP 5.3 (for namespaces)
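
These requirements can be checked up front. The following is a minimal sketch, not something shipped with PDFBox-php: `check_php_and_jar()` and `java_is_available()` are hypothetical names, the jar path is the one from the basic example, and the Java check assumes `exec()` is enabled and that `java` is expected on the PATH.

```php
<?php
// Hypothetical pre-flight checks; none of these functions ship with PDFBox-php.

// Return a list of human-readable problems with the PHP version or the jar.
function check_php_and_jar($jar)
{
    $problems = array();
    if (version_compare(PHP_VERSION, '5.3.0', '<')) {
        $problems[] = 'PHP 5.3 or newer is required for namespaces.';
    }
    if (!is_file($jar)) {
        $problems[] = "Jar not found: {$jar}";
    } elseif (!is_executable($jar)) {
        $problems[] = "Jar is not executable: {$jar}";
    }
    return $problems;
}

// True when "java -version" runs and exits with status 0.
function java_is_available()
{
    $output = array();
    $exit_code = 1;
    // "java -version" writes to stderr, so fold it into stdout.
    exec('java -version 2>&1', $output, $exit_code);
    return $exit_code === 0;
}

$problems = check_php_and_jar('pdfbox-app-1.4.0.jar');
if (!java_is_available()) {
    $problems[] = 'The "java" command could not be run.';
}
echo $problems ? implode(PHP_EOL, $problems) : 'All requirements look satisfied.', PHP_EOL;
```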

##License

The bundled .jar is licensed under the Apache License, Version 2.0. The same goes for PDFBox-php:

Copyright 2011 Carl Helmertz

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.