This Python script processes multiple PDF files in a specified directory, extracts the first page of each PDF, and converts it to an image (PNG format). The resulting images are saved in a designated output directory.
- Batch Processing: Automatically processes all PDF files in a given folder.
- First Page Extraction: Extracts and converts only the first page of each PDF to an image.
- Customizable Paths: Easily specify input and output directories.
Ensure you have the following Python packages installed:
pdf2image
Pillow
You can install them using pip:
pip install pdf2image pillow
The pdf2image library requires poppler to be installed on your system.
- Ubuntu/Debian:
sudo apt-get install poppler-utils
- macOS (using Homebrew):
brew install poppler
- Clone or Download the Script: Save the script as batch_pdf_to_image.py.
- Run the Script: Execute the script using Python:
python batch_pdf_to_image.py
Enter the path to the directory containing the PDFs and the output path. The script will process each PDF in the specified directory, extract the first page, and save it as an image in the output directory.
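The workflow described above could be sketched roughly as follows. This is a minimal illustration, not the script itself: the function names `find_pdfs` and `convert_first_pages` are hypothetical, and the sketch assumes that pdf2image's `convert_from_path` is called with `first_page=1` and `last_page=1` so that only the first page is rendered.

```python
from pathlib import Path


def find_pdfs(pdf_dir):
    """Return the PDF files directly inside pdf_dir (hypothetical helper)."""
    return sorted(
        p for p in Path(pdf_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def convert_first_pages(pdf_dir, output_dir):
    """Render page 1 of every PDF in pdf_dir to a PNG in output_dir."""
    # Imported here so that find_pdfs can be used without pdf2image installed.
    from pdf2image import convert_from_path

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    saved = []
    for pdf in find_pdfs(pdf_dir):
        # Limiting the page range avoids rendering the whole document.
        pages = convert_from_path(str(pdf), first_page=1, last_page=1)
        if not pages:
            continue  # nothing rendered, skip this file
        target = out / f"{pdf.stem}_page1.png"
        pages[0].save(target, "PNG")
        print(f"Saved: {target}")
        saved.append(target)
    return saved
```

Restricting the page range is a design choice worth keeping for large PDFs, since rendering every page only to discard all but the first costs time and memory.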
- Each image is saved under the PDF's base name with _page1.png appended. For example, if the PDF is named example.pdf, the image will be saved as example_page1.png.
- The script will print the path to each saved image once it has been processed.
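The naming rule above can be expressed in a few lines. The helper name `output_path_for` is hypothetical and used only for illustration; the script may build the path differently.

```python
from pathlib import Path


def output_path_for(pdf_path, output_dir):
    """Build the PNG path for a PDF: <output_dir>/<pdf base name>_page1.png."""
    # Path.stem drops only the final extension, so "report.v2.pdf" -> "report.v2".
    return Path(output_dir) / f"{Path(pdf_path).stem}_page1.png"
```

For instance, `output_path_for("pdfs/example.pdf", "out")` yields a path ending in `example_page1.png` inside `out`.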
- Missing PDF Files: Ensure that the pdf_dir path is correctly set and contains valid PDF files.
- Poppler Not Found: If you encounter issues related to poppler, ensure it is installed and accessible in your system's PATH.
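One quick way to check whether Poppler is reachable is to look for its command-line tools on PATH. The sketch below assumes that `pdftoppm` and `pdfinfo` are the relevant executables (these ship with poppler-utils); the function name `poppler_available` is hypothetical.

```python
import shutil


def poppler_available(tools=("pdftoppm", "pdfinfo")):
    """Return True if every listed Poppler tool is found on PATH."""
    # shutil.which returns None when an executable cannot be located.
    return all(shutil.which(tool) is not None for tool in tools)
```

If this returns False even though Poppler is installed, its bin directory is probably not on PATH. In that case pdf2image's `convert_from_path` also accepts a `poppler_path` argument pointing at that directory.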
This script is provided under the MIT License. Feel free to use, modify, and distribute it as needed.
Author: Joseph Gakah
Date: 15 August 2024