PyOPDParse

The purpose of the project

PyOPDParse is a Python library that provides a set of classes for extracting elements and their attributes from ODT, PDF and DOCX files. Whatever the input format, the result is a single unified structure of elements and their properties.

Core features
Installation
Examples
Project Structure
Documentation
Getting started
License
Acknowledgments
Contacts
Authors

Core features

parser for the structural elements of PDF documents,
parser for the structural elements of ODT documents,
parser for the structural elements of DOCX documents,
unified structural element classes shared by all three formats.
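
To make the idea of a unified structure more concrete, here is a minimal sketch of what format-independent element classes could look like. The class names echo files in this repository (StructuralElement, Paragraph, Table), but every field and the describe helper are assumptions made for illustration; this is not the actual PyOPDParse API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StructuralElement:
    """Hypothetical base class holding properties common to all elements."""
    page_breaker: bool = False  # assumed property, for illustration only


@dataclass
class Paragraph(StructuralElement):
    """Hypothetical paragraph with a few typical text properties."""
    text: str = ""
    font_name: Optional[str] = None
    font_size_pt: Optional[float] = None


@dataclass
class Table(StructuralElement):
    """Hypothetical table stored as rows of cell strings."""
    rows: List[List[str]] = field(default_factory=list)


def describe(elements):
    """Return the element type names, regardless of the source format."""
    return [type(element).__name__ for element in elements]


# Whichever parser produced them, the elements share one shape.
document = [
    Paragraph(text="Introduction", font_name="Arial", font_size_pt=14.0),
    Table(rows=[["a", "b"], ["c", "d"]]),
]
print(describe(document))
```

The point of the sketch is that code consuming the result never needs to know whether the elements came from an ODT, PDF or DOCX file.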

Installation

In development.

Examples

In development.

Project Structure

PyOPDParse/
├── README.md
├── LICENSE.md
├── requirements.txt
├── src/
│   ├── classes/
│   │   ├── interfaces/
│   │   │   └── InformalParserInterface.py
│   │   ├── superclasses/
│   │   │   └── StructuralElement.py
│   │   ├── Frame.py
│   │   ├── Image.py
│   │   ├── List.py
│   │   ├── Paragraph.py
│   │   ├── Table.py
│   │   ├── TableRow.py
│   │   ├── TableCell.py
│   │   └── UnifiedDocumentView.py
│   ├── odt/
│   │   ├── elements/
│   │   │   ├── AutomaticStyleParser.py
│   │   │   ├── DefaultStyleParser.py
│   │   │   ├── RegularStyleParser.py
│   │   │   ├── ImageParser.py
│   │   │   ├── ListParser.py
│   │   │   ├── NodeParser.py
│   │   │   ├── ParagraphParser.py
│   │   │   ├── TableParser.py
│   │   │   └── ODTDocument.py
│   │   └── ODTParser.py
│   ├── pdf/
│   │   ├── pdfclasses/
│   │   │   ├── Line.py
│   │   │   └── PDFParagraph.py
│   │   └── PDFParser.py
│   ├── docx/
│   │   └── DocxParagraphParser.py
│   └── helpers/
├── examples/
├── docs/
└── tests/
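
The layout above puts one parser per format behind a shared interface module. The following sketch shows how such an arrangement could work in principle: an informal interface (methods overridden by convention, not enforced) and a lookup by file extension. The parse method, the *Sketch classes, the PARSERS table and pick_parser are all hypothetical names introduced here; the real InformalParserInterface may define different methods.

```python
from pathlib import Path


class InformalParserInterface:
    """Informal interface: subclasses override methods by convention."""

    def parse(self, path):  # hypothetical method name
        raise NotImplementedError


class ODTParserSketch(InformalParserInterface):
    def parse(self, path):
        # Placeholder standing in for real ODT parsing.
        return f"ODT elements from {path}"


class PDFParserSketch(InformalParserInterface):
    def parse(self, path):
        # Placeholder standing in for real PDF parsing.
        return f"PDF elements from {path}"


# Hypothetical mapping from file extension to parser class.
PARSERS = {".odt": ODTParserSketch, ".pdf": PDFParserSketch}


def pick_parser(path):
    """Return a parser instance chosen by the file's extension."""
    suffix = Path(path).suffix.lower()
    try:
        return PARSERS[suffix]()
    except KeyError:
        raise ValueError(f"unsupported format: {suffix}") from None


print(pick_parser("report.odt").parse("report.odt"))
```

Supporting another format in this sketch would only require one more subclass and one more entry in the mapping.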