This benchmark is about reading pure PDF files, not scanned documents and not documents that have had OCR applied.
Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz

| Name | Last PyPI Release | License | Version | Dependencies |
| --- | --- | --- | --- | --- |
| pypdfium2 | 2024-12-19 | Apache-2.0 or BSD-3-Clause | 4.30.1 | PDFium (Foxit/Google) |
| pdfminer.six | 2025-05-06 | MIT/X | 20250506 | |
| pdfplumber | 2025-06-12 | MIT | 0.11.7 | pdfminer.six |
| pdfrw | 2017-09-18 | MIT | 0.4 | |
| pdftotext | - | GPL | 0.86.1 | build-essential libpoppler-cpp-dev pkg-config python3-dev |
| PyMuPDF | 2025-06-12 | GNU AFFERO GPL 3.0 / Commercial | 1.26.1 | MuPDF |
| pypdf | 2025-06-29 | BSD 3-Clause | 5.7.0 | |
| Tika | 2025-03-26 | Apache v2 | 3.1.0 | Apache Tika |

Text Extraction Speed

| # | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | PyMuPDF | 0.1s | 0.4s | 0.3s | 0.2s | 0.2s | 0.0s | 0.1s | 0.0s | 0.1s | 0.0s | 0.1s | 0.0s | 0.1s | 0.0s | 0.0s |
| 2 | pypdfium2 | 0.1s | 0.5s | 0.3s | 0.2s | 0.2s | 0.0s | 0.1s | 0.0s | 0.0s | 0.0s | 0.1s | 0.0s | 0.0s | 0.0s | 0.0s |
| 3 | Tika | 0.2s | 0.8s | 0.5s | 0.3s | 0.3s | 0.1s | 0.2s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.0s | 0.0s |
| 4 | pdftotext | 0.3s | 0.7s | 0.9s | 0.2s | 0.8s | 0.1s | 0.3s | 0.4s | 0.1s | 0.1s | 0.2s | 0.1s | 0.1s | 0.0s | 0.0s |
| 5 | pypdf | 3.5s | 26.2s | 6.4s | 6.8s | 3.3s | 0.9s | 1.6s | 0.6s | 0.6s | 0.5s | 0.8s | 0.6s | 0.6s | 0.5s | 0.3s |
| 6 | pdfminer.six | 5.8s | 35.1s | 16.6s | 10.2s | 5.5s | 1.5s | 2.5s | 1.1s | 1.6s | 1.1s | 2.0s | 1.5s | 1.4s | 0.7s | 0.6s |
| 7 | pdfplumber | 9.5s | 60.9s | 16.6s | 17.0s | 10.7s | 3.1s | 5.3s | 2.6s | 2.5s | 2.3s | 3.8s | 2.5s | 2.7s | 1.4s | 1.3s |

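
The timing harness behind these numbers is not shown here. As a rough illustration only, a per-document extraction time could be measured along the following lines. The helper `time_extraction` and the wrapper `extract_with_pypdf` are hypothetical names introduced for this sketch, and it assumes a single wall-clock run per document rather than whatever repetition or warm-up the benchmark may use.

```python
import time
from typing import Callable


def time_extraction(extract: Callable[[str], str], pdf_path: str) -> tuple[str, float]:
    """Run one extraction call and return (extracted text, elapsed seconds)."""
    start = time.perf_counter()
    text = extract(pdf_path)
    elapsed = time.perf_counter() - start
    return text, elapsed


def extract_with_pypdf(pdf_path: str) -> str:
    """Example extractor (assumption: pypdf is installed)."""
    # Imported lazily so that time_extraction itself has no third-party dependency.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() for page in reader.pages)
```

With a PDF at a placeholder path such as `sample.pdf`, calling `time_extraction(extract_with_pypdf, "sample.pdf")` would return the text and the elapsed seconds. Any other library can be plugged in by passing a different callable.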
Image Extraction Speed

| # | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | PyMuPDF | 0.5s | 0.3s | 0.5s | 0.0s | 1.6s | 0.4s | 0.0s | 2.9s | 0.4s | 0.4s | 0.1s | 0.0s | 0.3s | 0.2s | 0.0s |
| 2 | pypdfium2 | 1.1s | 1.2s | 1.8s | 0.0s | 3.3s | 0.9s | 0.2s | 5.1s | 0.7s | 0.6s | 0.4s | 0.0s | 0.5s | 0.2s | 0.0s |
| 3 | pypdf | 4.2s | 21.6s | 6.1s | 5.7s | 11.8s | 1.3s | 0.6s | 6.5s | 1.2s | 1.2s | 0.8s | 0.2s | 0.9s | 0.2s | 0.2s |
| 4 | pdfminer.six | 7.4s | 43.9s | 17.5s | 12.7s | 15.4s | 1.6s | 2.5s | 1.6s | 1.5s | 1.0s | 1.8s | 1.2s | 1.3s | 0.7s | 0.6s |

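
The Average column appears to be the arithmetic mean of the 14 per-document values. The short check below recomputes it for the first two rows of the Image Extraction Speed table. The function name `average_seconds` is made up for this sketch, and because the per-document values are already rounded to one decimal, a recomputed mean may differ slightly from a published average in other rows.

```python
from statistics import mean

# Per-document timings in seconds, copied from the Image Extraction Speed table.
pymupdf_seconds = [0.3, 0.5, 0.0, 1.6, 0.4, 0.0, 2.9, 0.4, 0.4, 0.1, 0.0, 0.3, 0.2, 0.0]
pypdfium2_seconds = [1.2, 1.8, 0.0, 3.3, 0.9, 0.2, 5.1, 0.7, 0.6, 0.4, 0.0, 0.5, 0.2, 0.0]


def average_seconds(timings: list[float]) -> float:
    """Arithmetic mean of the timings, rounded to one decimal like the table."""
    return round(mean(timings), 1)


print(average_seconds(pymupdf_seconds))    # 0.5
print(average_seconds(pypdfium2_seconds))  # 1.1
```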

| # | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | pdfrw | 0.1s | 0.1s | 0.5s | 0.0s | 0.3s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.0s | 0.1s | 0.0s | 0.0s |
| 2 | PyMuPDF | 0.2s | 0.4s | 0.6s | 0.2s | 0.4s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.1s | 0.0s | 0.1s | 0.0s | 0.0s |
| 3 | pypdf | 0.5s | 0.6s | 2.0s | 0.4s | 1.1s | 0.2s | 0.3s | 0.3s | 0.3s | 0.2s | 0.3s | 0.1s | 0.6s | 0.1s | 0.1s |


| # | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | pypdf | 3.4MB | 2.5MB | 5.6MB | 1.6MB | 7.2MB | 2.7MB | 3.1MB | 15.4MB | 2.4MB | 1.3MB | 3.0MB | 0.3MB | 1.2MB | 0.8MB | 1.0MB |
| 2 | pdfrw | 3.5MB | 2.5MB | 5.7MB | 1.6MB | 7.3MB | 2.7MB | 3.1MB | 15.4MB | 2.4MB | 1.3MB | 3.0MB | 0.3MB | 1.2MB | 0.8MB | 1.0MB |
| 3 | PyMuPDF | 3.7MB | 2.7MB | 6.9MB | 1.7MB | 8.5MB | 2.8MB | 3.4MB | 15.5MB | 2.5MB | 1.4MB | 3.2MB | 0.3MB | 1.3MB | 0.9MB | 1.1MB |

Text Extraction Quality

| # | Library | Average | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | pypdfium2 | 97% | 99% | 97% | 94% | 99% | 98% | 96% | 99% | 99% | 99% | 99% | 98% | 78% | 99% | 99% |
| 2 | pypdf | 96% | 99% | 95% | 93% | 98% | 99% | 96% | 97% | 99% | 99% | 99% | 99% | 78% | 100% | 99% |
| 3 | PyMuPDF | 96% | 98% | 96% | 93% | 97% | 98% | 95% | 99% | 98% | 98% | 98% | 97% | 77% | 98% | 99% |
| 4 | Tika | 95% | 99% | 98% | 92% | 97% | 98% | 96% | 93% | 97% | 98% | 93% | 98% | 73% | 98% | 96% |
| 5 | pdftotext | 91% | 96% | 93% | 91% | 94% | 92% | 96% | 96% | 96% | 97% | 83% | 94% | 77% | 96% | 79% |
| 6 | pdfminer.six | 89% | 95% | 79% | 86% | 92% | 86% | 93% | 95% | 93% | 92% | 92% | 93% | 71% | 98% | 86% |
| 7 | pdfplumber | 75% | 94% | 84% | 68% | 97% | 61% | 93% | 61% | 89% | 57% | 59% | 67% | 58% | 98% | 67% |

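
How the quality percentages were computed is not stated here. Purely as an illustration of the general idea (comparing extracted text against a reference text and expressing the agreement as a percentage), the sketch below uses the standard library's `difflib.SequenceMatcher`. This is a stand-in technique and not necessarily the metric used for the table above; `similarity_percent` is a hypothetical name.

```python
from difflib import SequenceMatcher


def similarity_percent(extracted: str, reference: str) -> int:
    """Return a 0-100 similarity score between extracted and reference text.

    Uses difflib's ratio (2 * matches / total length) as a stand-in metric.
    """
    # autojunk=False disables the popularity heuristic, which can distort
    # scores on long inputs.
    matcher = SequenceMatcher(None, extracted, reference, autojunk=False)
    return round(matcher.ratio() * 100)


print(similarity_percent("Hello PDF world", "Hello PDF world"))  # 100
```

A score below 100 indicates missing, extra, or reordered characters relative to the reference. Whitespace differences count as mismatches in this sketch, so normalizing whitespace first may be appropriate depending on what "quality" should capture.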