PyOPDParse is a Python library that provides a set of classes for extracting elements and attributes from ODT, PDF, and DOCX files. Whatever the input format, the result is a single unified structure of elements and their properties.
- Core features
- Installation
- Examples
- Project Structure
- Documentation
- Getting started
- License
- Acknowledgments
- Contacts
- Authors
- parser of structural elements of PDF documents,
- parser of structural elements of ODT documents,
- parser of structural elements of DOCX documents,
- unified classes of structural elements for documents of the specified formats.
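The idea of unified element classes can be illustrated with a minimal sketch. This is not PyOPDParse's actual API: the class names loosely follow file names in the project tree, while every field, method, and the `detect_format` helper are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class StructuralElement:
    """Hypothetical base class: data shared by every element type."""
    text: str = ""
    properties: dict = field(default_factory=dict)


@dataclass
class Paragraph(StructuralElement):
    """Hypothetical paragraph element (no extra fields in this sketch)."""


@dataclass
class Table(StructuralElement):
    """Hypothetical table element; each row is a list of cell texts."""
    rows: list = field(default_factory=list)


@dataclass
class UnifiedDocument:
    """Hypothetical format-independent view of a parsed document."""
    source_format: str
    elements: list = field(default_factory=list)

    def paragraphs(self):
        # Keep only the paragraph elements, preserving document order.
        return [e for e in self.elements if isinstance(e, Paragraph)]


def detect_format(path):
    """Map a file extension to one of the three supported formats."""
    suffix = Path(path).suffix.lower().lstrip(".")
    if suffix not in {"odt", "pdf", "docx"}:
        raise ValueError(f"unsupported format: {suffix!r}")
    return suffix


# Assumed usage: the same element classes describe a document
# regardless of which format it was read from.
doc = UnifiedDocument(
    source_format=detect_format("report.docx"),
    elements=[
        Paragraph(text="Intro", properties={"alignment": "left"}),
        Table(rows=[["a", "b"]]),
    ],
)
print(len(doc.paragraphs()))
```

In this sketch, code that consumes the result only needs to know the shared element classes, not the source format.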
Installation: in development.
Examples: in development.
PyOPDParse/
├── README.md
├── LICENSE.md
├── requirements.txt
├── src/
│ ├── classes/
│ ├── interfaces/
│ ├── InformalParserInterface.py
│ ├── superclasses/
│ ├── StructuralElement.py
│ ├── Frame.py
│ ├── Image.py
│ ├── List.py
│ ├── Paragraph.py
│ ├── Table.py
│ ├── TableRow.py
│ ├── TableCell.py
│ ├── UnifiedDocumentView.py
│ ├── odt/
│ ├── elements/
│ ├── AutomaticStyleParser.py
│ ├── DefaultStyleParser.py
│ ├── RegularStyleParser.py
│ ├── ImageParser.py
│ ├── ListParser.py
│ ├── NodeParser.py
│ ├── ParagraphParser.py
│ ├── TableParser.py
│ ├── ODTDocument.py
│ ├── ODTParser.py
│ ├── pdf/
│ ├── pdfclasses/
│ ├── Line.py
│ ├── PDFParagraph.py
│ ├── PDFParser.py
│ ├── docx/
│ ├── DocxParagraphParser.py
│ ├── helpers/
├── examples/
├── docs/
└── tests/
Documentation: the current version is available here.
Getting started: in development.
License: in development.
The development team expresses its deep gratitude to ITMO University for its support.
Contacts:
- Telegram channel for questions about the project
- slavamarcin@yandex.ru
- vlad-tershch@yandex.ru