Incremental PDF Page Remover removes redundant pages from PDF files. A page that closely resembles the page right after it (according to a similarity threshold) is treated as an earlier version of that page, so only the last version in each sequence of similar pages is retained.
You can adjust the similarity threshold with a slider that ranges from 0% to 100%. The default is 90%: whenever a page is more than 90% similar to the page before it, the earlier page is removed and the later one is kept.
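The keep-the-last-version rule can be sketched in Python. This is a minimal illustration, not the code in pdf_processor.py. It assumes that page similarity is measured on extracted text with the standard library's `difflib.SequenceMatcher` and that pypdf handles reading and writing. The project may compare pages differently (for example, by rendered images), and all function names below are hypothetical.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity between two page texts, from 0.0 (different) to 1.0 (identical)."""
    return SequenceMatcher(None, a, b).ratio()


def pages_to_keep(page_texts: list[str], threshold_percent: float = 90.0) -> list[int]:
    """Return the indices of the pages to keep (hypothetical helper).

    A page is dropped when the page right after it is more than
    `threshold_percent` similar, because the later page is treated as a newer
    version of it. The last page of each run of similar pages is kept.
    """
    threshold = threshold_percent / 100.0
    keep = []
    for i, text in enumerate(page_texts):
        is_last_page = i == len(page_texts) - 1
        if is_last_page or similarity(text, page_texts[i + 1]) <= threshold:
            keep.append(i)
    return keep


def remove_redundant_pages(input_path: str, output_path: str,
                           threshold_percent: float = 90.0) -> list[int]:
    """Write a copy of the PDF without superseded pages (hypothetical helper)."""
    # Assumption: pypdf is used for PDF I/O. It is imported here so that
    # pages_to_keep() above works even when pypdf is not installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(input_path)
    # extract_text() can return an empty string for image-only pages.
    texts = [page.extract_text() or "" for page in reader.pages]
    keep = pages_to_keep(texts, threshold_percent)

    writer = PdfWriter()
    for i in keep:
        writer.add_page(reader.pages[i])
    with open(output_path, "wb") as fh:
        writer.write(fh)
    return keep
```

For example, `remove_redundant_pages("lecture.pdf", "lecture_trimmed.pdf", 90)` would write a trimmed copy and return the indices of the kept pages (the file names are placeholders, and pypdf must be installed). Each page is compared only with its successor, so one comparison per adjacent pair is enough. The strict `>` mirrors the "more than 90%" wording. Under this text-based assumption, pages without extractable text all compare as identical, so scanned PDFs would need a different similarity signal.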
- main.py: Runs the Streamlit web app.
- pdf_processor.py: Contains the logic for removing redundant pages.
- scripts/: Directory where uploaded PDFs are temporarily stored.
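Temporary storage of uploads in scripts/ might look like the sketch below. The helper names `save_upload` and `discard_upload` are hypothetical. Giving each upload a random file name is also an assumption, meant to keep two uploads from overwriting each other.

```python
from __future__ import annotations

import uuid
from pathlib import Path


def save_upload(data: bytes, directory: str | Path = "scripts") -> Path:
    """Store uploaded PDF bytes under `directory` with a unique name (hypothetical)."""
    target_dir = Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)  # create scripts/ on first use
    path = target_dir / f"upload_{uuid.uuid4().hex}.pdf"
    path.write_bytes(data)
    return path


def discard_upload(path: Path) -> None:
    """Delete a stored upload once processing is done (hypothetical)."""
    path.unlink(missing_ok=True)  # no error if it was already removed
```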
- Upload PDF: Use the Streamlit interface to upload a PDF.
- Set Threshold: Adjust the similarity threshold with the slider (default is 90%).
- Process & Download: Click the button to remove redundant pages and download the optimized PDF.
- Clone the repository and navigate into the project directory:
  git clone https://github.com/dennismstfc/Incremental-PDF-Page-Remover
  cd Incremental-PDF-Page-Remover
- Create a virtual environment and activate it:
  - For macOS/Linux:
    python3 -m venv venv
    source venv/bin/activate
  - For Windows:
    python -m venv venv
    venv\Scripts\activate
- Install the required packages:
  pip install -r requirements.txt
- Run the app:
  streamlit run main.py
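If the app cannot find its packages, the virtual environment may not be active. The following standard-library check is not part of the project; it shows whether the current interpreter runs inside a venv. The function name `in_virtualenv` is illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside an environment created with `python -m venv`."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```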
- Upload PDF: Drag and drop a file.
- Adjust Threshold & Process: Set the similarity threshold and click the button to remove redundant pages.
Made with ❤️ by Dennis Mustafić