
A suite of tools for easily preparing documents, flattening PDFs, etc.


Textidy

Overview

Textidy is a powerful and easy-to-use Python package for preparing PDF files. It streamlines your PDFs by flattening them, which reduces their complexity and makes them easier to work with. Use cases include preparing documents for OCR, removing interactive elements, archiving, preparing documents for printing, and more.

Installation

First, ensure you have Python 3.7+ installed on your system. You can verify this by running:

python3 --version

Next, you need to install Poetry, which is the package manager Textidy uses:

curl -sSL https://install.python-poetry.org | python3 -

For more detailed installation instructions for Poetry, refer to the official Poetry documentation.

You'll also need to install Poppler, which is a dependency for the pdf2image package:

On Ubuntu:

sudo apt-get install -y poppler-utils

On macOS:

brew install poppler
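
Before installing, it can help to confirm that both prerequisites are visible from Python. The following is a minimal sketch and not part of Textidy: `check_prerequisites` is a hypothetical helper, and it looks for `pdftoppm`, one of the Poppler command-line tools that pdf2image invokes.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 7)):
    """Return a dict saying whether each prerequisite appears to be met."""
    return {
        # Tuple comparison: (3, 11, ...) >= (3, 7) is True.
        "python": sys.version_info >= min_python,
        # shutil.which returns None when the executable is not on PATH.
        "poppler": shutil.which("pdftoppm") is not None,
    }
```

Calling `check_prerequisites()` returns a dictionary with the keys `"python"` and `"poppler"`; a `False` value points at the prerequisite that still needs attention.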

Once you have Poetry and Poppler installed, you can install Textidy by running the following from within the cloned repository:

poetry install

This command creates a virtual environment and installs the necessary dependencies in it.

Usage

You can use Textidy from the command line:

flatten --input <input_filename> --output <output_filename> --dpi <dpi>

Replace <input_filename>, <output_filename>, and <dpi> with your desired input filename, output filename, and DPI setting (default is 150).
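
If you need to call the tool from a script rather than a shell, one option is to build the same command and hand it to `subprocess`. The sketch below assumes only the flags shown above; `build_flatten_command` and `run_flatten` are hypothetical helper names, not part of Textidy.

```python
import subprocess
from typing import List, Optional


def build_flatten_command(
    input_pdf: str, output_pdf: str, dpi: Optional[int] = None
) -> List[str]:
    """Build the argument list for the `flatten` CLI described above."""
    cmd = ["flatten", "--input", input_pdf, "--output", output_pdf]
    if dpi is not None:
        # Leaving --dpi out keeps the tool on its documented default (150).
        cmd += ["--dpi", str(dpi)]
    return cmd


def run_flatten(input_pdf: str, output_pdf: str, dpi: Optional[int] = None) -> None:
    """Run `flatten`; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_flatten_command(input_pdf, output_pdf, dpi), check=True)
```

For example, `run_flatten("scan.pdf", "scan_flat.pdf", dpi=300)` would run the same command as typing it in the shell, provided `flatten` is on the PATH of the active environment.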

There's also an option to flatten all the PDFs in the current directory:

flatten --all
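
This README does not say how `--all` names its output files. If you want explicit control over that, one alternative is to enumerate the PDFs yourself and call `flatten` once per file. In the sketch below, the `_flat` suffix and the helper name `plan_batch` are assumptions made for illustration, not Textidy behavior.

```python
from pathlib import Path
from typing import List, Tuple


def plan_batch(directory: str = ".", suffix: str = "_flat") -> List[Tuple[Path, Path]]:
    """Pair each PDF in `directory` with a hypothetical output path."""
    pairs = []
    for pdf in sorted(Path(directory).glob("*.pdf")):
        if pdf.stem.endswith(suffix):
            continue  # skip files that look like outputs of an earlier run
        pairs.append((pdf, pdf.with_name(pdf.stem + suffix + ".pdf")))
    return pairs
```

Each `(input, output)` pair can then be passed to `flatten --input <input> --output <output>`.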

Development

If you want to make changes to the code and test them, you can install the package in editable mode. This means that changes to the source code will be immediately reflected in the installed package without needing to rebuild and reinstall:

make

Note that this command needs to be run in the virtual environment created by Poetry. You can activate it with poetry shell.

Licensing

This software is dual-licensed.

  1. For non-commercial use: This software is available under the terms of the GNU Affero General Public License (AGPL). See the LICENSE file and LICENCE.txt for the full text of this license.

  2. For commercial use: If you want to use this software for commercial purposes, you'll need a separate license. Please contact the author for more information.

Contact

For inquiries about commercial licensing, please contact me at:

Sergey Khalil <license@sergeykhalil.com>