Primary language: Java. License: MIT.

pdfreader

PDF reader using PDFBox.

TextDrawExtractor

Extract text and draw operators from a PDF.

java -classpath pdfreader.jar TextDrawExtractor <pdf | directory>

Each output line contains the following fields:

  1. Character or draw operator
  2. Page number
  3. Misc...

Example:

[MOVE_TO]	1   575.9997	210.439
[LINE_TO]	1   199.9998	210.439
[LINE_TO]	1   199.9998	210.939
[LINE_TO]	1   575.9997	210.939
[FILL_PATH] 1
[MOVE_TO]	1   206.1354	509.0457
[LINE_TO]	1   274.9039	509.0457
[CLOSE_PATH]    1
[FILL_PATH] 1
R	1   200.0125	101.367004	5.743927	3.5999548	Helvetica	1.0	2.247972
E	1   205.71643	101.367004	5.3279266	3.5999548	Helvetica	1.0	2.247972
S	1   211.00436	101.367004	5.3279266	3.5999548	Helvetica	1.0	2.247972
E	1   216.2923	101.367004	5.3279266	3.5999548	Helvetica	1.0	2.247972
A	1   221.58023	101.367004	5.3279266	3.5999548	Helvetica	1.0	2.247972
R	1   226.77136	101.367004	5.743927	3.5999548	Helvetica	1.0	2.247972
C	1   232.47528	101.367004	5.7439423	3.5999548	Helvetica	1.0	2.247972
H	1   238.17921	101.367004	5.743927	3.5999548	Helvetica	1.0	2.247972
A	1   245.64952	101.367004	5.327942	3.5999548	Helvetica	1.0	2.247972
R	1   250.93745	101.367004	5.7439423	3.5999548	Helvetica	1.0	2.247972
T	1   256.6414	101.367004	4.9119263	3.5999548	Helvetica	1.0	2.247972
I	1   261.51334	101.367004	2.2479553	3.5999548	Helvetica	1.0	2.247972
C	1   263.7213	101.367004	5.7438965	3.5999548	Helvetica	1.0	2.247972
L	1   269.42523	101.367004	4.4159546	3.5999548	Helvetica	1.0	2.247972
E	1   273.71237	101.367004	5.327942	3.5999548	Helvetica	1.0	2.247972
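
The output above can be post-processed line by line. The following is a minimal sketch of such a parser, not part of pdfreader itself: the class name TextDrawLine is hypothetical, and it assumes that fields are separated by runs of tabs or spaces, as they appear in the example. Only the first two fields (character or draw operator, page number) are interpreted; the remaining fields are kept as raw strings because their meaning is not documented here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of pdfreader: parses one line of
// TextDrawExtractor output into its documented fields.
final class TextDrawLine {
    final String token;       // a character such as "R", or an operator such as "[MOVE_TO]"
    final int page;           // page number (second field)
    final List<String> rest;  // remaining fields, kept as raw strings

    TextDrawLine(String token, int page, List<String> rest) {
        this.token = token;
        this.page = page;
        this.rest = rest;
    }

    // In the example output, draw operators are written in square brackets.
    boolean isDrawOperator() {
        return token.length() > 2 && token.startsWith("[") && token.endsWith("]");
    }

    static TextDrawLine parse(String line) {
        // Assumption: fields are separated by runs of tabs or spaces.
        // A font name containing spaces, or a space character as the first
        // field, would need stricter tab-only splitting.
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 2) {
            throw new IllegalArgumentException("Expected at least 2 fields: " + line);
        }
        int page = Integer.parseInt(fields[1]);
        List<String> rest = Arrays.asList(fields).subList(2, fields.length);
        return new TextDrawLine(fields[0], page, rest);
    }

    public static void main(String[] args) {
        String[] lines = {
            "[MOVE_TO]\t1   575.9997\t210.439",
            "[FILL_PATH] 1",
            "R\t1   200.0125\t101.367004\t5.743927\t3.5999548\tHelvetica\t1.0\t2.247972"
        };
        for (String line : lines) {
            TextDrawLine parsed = parse(line);
            String kind = parsed.isDrawOperator() ? "operator" : "character";
            System.out.println(kind + " " + parsed.token + " page=" + parsed.page + " rest=" + parsed.rest);
        }
    }
}
```

Keeping the trailing fields as strings avoids guessing at what each number means; a caller that knows the layout can convert them as needed.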

ImageExtractor

Extract images from a PDF.

java -classpath pdfreader.jar ImageExtractor <pdf | directory>

Each output line contains the following fields:

  1. Page number
  2. x
  3. y
  4. width
  5. height

Example:

3	106.02108	594.7168	94.08959	105.8508
3	254.82501	594.7168	93.23424	136.21608
3	402.77362	594.7168	100.5048	136.21608
4	147.50803	613.83466	93.875755	85.535995
4	253.21825	613.83466	105.636955	84.4668
4	370.68967	613.83466	105.636955	84.4668
6	313.12714	424.67242	236.412	141.8472
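
Since every ImageExtractor line has the same five fields, the output is easy to load into a small value type. The sketch below is one possible way to do that and is not part of pdfreader: the class name ImageRegion is hypothetical, and it assumes whitespace-separated fields in the documented order (page number, x, y, width, height).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of pdfreader: one image entry
// from ImageExtractor output.
final class ImageRegion {
    final int page;
    final double x;
    final double y;
    final double width;
    final double height;

    ImageRegion(int page, double x, double y, double width, double height) {
        this.page = page;
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    static ImageRegion parse(String line) {
        // Assumption: the five fields are separated by tabs or spaces.
        String[] f = line.trim().split("\\s+");
        if (f.length != 5) {
            throw new IllegalArgumentException("Expected 5 fields: " + line);
        }
        return new ImageRegion(
                Integer.parseInt(f[0]),
                Double.parseDouble(f[1]),
                Double.parseDouble(f[2]),
                Double.parseDouble(f[3]),
                Double.parseDouble(f[4]));
    }

    public static void main(String[] args) {
        String[] lines = {
            "3\t106.02108\t594.7168\t94.08959\t105.8508",
            "3\t254.82501\t594.7168\t93.23424\t136.21608",
            "4\t147.50803\t613.83466\t93.875755\t85.535995"
        };
        // Count how many images were reported on each page.
        Map<Integer, Integer> imagesPerPage = new TreeMap<>();
        for (String line : lines) {
            ImageRegion region = parse(line);
            imagesPerPage.merge(region.page, 1, Integer::sum);
        }
        System.out.println(imagesPerPage);
    }
}
```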