PDF reader using PDFBox.
Extract text characters and draw operators from a PDF file or a directory of PDFs.
java -classpath pdfreader.jar TextDrawExtractor <pdf | directory>
Each output line contains these space-separated fields:
- Character or draw operator
- Page number
- Misc...
Example output:
[MOVE_TO] 1 575.9997 210.439
[LINE_TO] 1 199.9998 210.439
[LINE_TO] 1 199.9998 210.939
[LINE_TO] 1 575.9997 210.939
[FILL_PATH] 1
[MOVE_TO] 1 206.1354 509.0457
[LINE_TO] 1 274.9039 509.0457
[CLOSE_PATH] 1
[FILL_PATH] 1
R 1 200.0125 101.367004 5.743927 3.5999548 Helvetica 1.0 2.247972
E 1 205.71643 101.367004 5.3279266 3.5999548 Helvetica 1.0 2.247972
S 1 211.00436 101.367004 5.3279266 3.5999548 Helvetica 1.0 2.247972
E 1 216.2923 101.367004 5.3279266 3.5999548 Helvetica 1.0 2.247972
A 1 221.58023 101.367004 5.3279266 3.5999548 Helvetica 1.0 2.247972
R 1 226.77136 101.367004 5.743927 3.5999548 Helvetica 1.0 2.247972
C 1 232.47528 101.367004 5.7439423 3.5999548 Helvetica 1.0 2.247972
H 1 238.17921 101.367004 5.743927 3.5999548 Helvetica 1.0 2.247972
A 1 245.64952 101.367004 5.327942 3.5999548 Helvetica 1.0 2.247972
R 1 250.93745 101.367004 5.7439423 3.5999548 Helvetica 1.0 2.247972
T 1 256.6414 101.367004 4.9119263 3.5999548 Helvetica 1.0 2.247972
I 1 261.51334 101.367004 2.2479553 3.5999548 Helvetica 1.0 2.247972
C 1 263.7213 101.367004 5.7438965 3.5999548 Helvetica 1.0 2.247972
L 1 269.42523 101.367004 4.4159546 3.5999548 Helvetica 1.0 2.247972
E 1 273.71237 101.367004 5.327942 3.5999548 Helvetica 1.0 2.247972
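The output above can be read back line by line. The following is a minimal sketch, not part of pdfreader.jar: the class name ExtractorLine is hypothetical, and it assumes that fields are whitespace-separated, that draw operators are the tokens wrapped in square brackets, and that the character itself is never whitespace. Only the first two fields (character or operator, page number) are interpreted; the remaining fields are kept as raw strings because their meaning is not specified here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of pdfreader.jar.
final class ExtractorLine {
    final String token;       // a character such as "R", or a draw operator such as "[MOVE_TO]"
    final int page;           // page number (second field)
    final List<String> rest;  // remaining fields, kept as raw strings

    ExtractorLine(String token, int page, List<String> rest) {
        this.token = token;
        this.page = page;
        this.rest = rest;
    }

    // Assumption: draw operators are the tokens wrapped in square brackets.
    boolean isDrawOperator() {
        return token.length() > 2 && token.startsWith("[") && token.endsWith("]");
    }

    // Assumption: fields are whitespace-separated and the character is not whitespace.
    static ExtractorLine parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected at least 2 fields: " + line);
        }
        int page = Integer.parseInt(parts[1]);
        List<String> rest = new ArrayList<>(Arrays.asList(parts).subList(2, parts.length));
        return new ExtractorLine(parts[0], page, rest);
    }

    public static void main(String[] args) {
        String[] sample = {
            "[MOVE_TO] 1 575.9997 210.439",
            "[FILL_PATH] 1",
            "R 1 200.0125 101.367004 5.743927 3.5999548 Helvetica 1.0 2.247972"
        };
        for (String line : sample) {
            ExtractorLine parsed = parse(line);
            String kind = parsed.isDrawOperator() ? "operator" : "character";
            System.out.println(kind + " " + parsed.token + " page=" + parsed.page + " fields=" + parsed.rest);
        }
    }
}
```

Keeping the trailing fields as strings avoids guessing at their types. A font name containing spaces would break the whitespace split, so treat this as a starting point only.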
Extract images from a PDF file or a directory of PDFs.
java -classpath pdfreader.jar ImageExtractor <pdf | directory>
Each output line contains these space-separated fields:
- Page number
- x
- y
- width
- height
Example output:
3 106.02108 594.7168 94.08959 105.8508
3 254.82501 594.7168 93.23424 136.21608
3 402.77362 594.7168 100.5048 136.21608
4 147.50803 613.83466 93.875755 85.535995
4 253.21825 613.83466 105.636955 84.4668
4 370.68967 613.83466 105.636955 84.4668
6 313.12714 424.67242 236.412 141.8472
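The five fields listed above map directly onto a small value class. The following is a minimal sketch, not part of pdfreader.jar: the class name ImageBox is hypothetical, and it assumes every line has exactly five whitespace-separated fields in the order page, x, y, width, height. The main method counts images per page for a few of the sample lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of pdfreader.jar.
final class ImageBox {
    final int page;
    final double x;
    final double y;
    final double width;
    final double height;

    ImageBox(int page, double x, double y, double width, double height) {
        this.page = page;
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    // Assumption: exactly five whitespace-separated fields per line.
    static ImageBox parse(String line) {
        String[] p = line.trim().split("\\s+");
        if (p.length != 5) {
            throw new IllegalArgumentException("Expected 5 fields: " + line);
        }
        return new ImageBox(
            Integer.parseInt(p[0]),
            Double.parseDouble(p[1]),
            Double.parseDouble(p[2]),
            Double.parseDouble(p[3]),
            Double.parseDouble(p[4]));
    }

    public static void main(String[] args) {
        String[] sample = {
            "3 106.02108 594.7168 94.08959 105.8508",
            "3 254.82501 594.7168 93.23424 136.21608",
            "4 147.50803 613.83466 93.875755 85.535995"
        };
        // Count images per page, preserving the order in which pages first appear.
        Map<Integer, Integer> imagesPerPage = new LinkedHashMap<>();
        for (String line : sample) {
            imagesPerPage.merge(parse(line).page, 1, Integer::sum);
        }
        System.out.println(imagesPerPage);
    }
}
```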