Primary language: Python. License: Apache License 2.0 (Apache-2.0).

PyOPDParse

The purpose of the project

PyOPDParse is a Python library that provides classes for extracting elements and their attributes from ODT, PDF, and DOCX files. Whatever the input format, the result is the same unified structure of elements and their properties.

Core features

  • parser of structural elements of PDF documents,
  • parser of structural elements of ODT documents,
  • parser of structural elements of DOCX documents,
  • unified classes of structural elements for documents of the specified formats.
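
The idea of one element structure shared by all three formats can be illustrated with a minimal sketch. The class names echo files in the repository (StructuralElement, Paragraph, UnifiedDocumentView), but every field, method, and the build_view helper below are assumptions made for illustration, not the library's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class StructuralElement:
    """Hypothetical base class: properties common to every element."""
    kind: str
    attributes: Dict[str, object] = field(default_factory=dict)


@dataclass
class Paragraph(StructuralElement):
    """Hypothetical paragraph element carrying its text."""
    text: str = ""


@dataclass
class UnifiedDocumentView:
    """Hypothetical container: the same shape for ODT, PDF, and DOCX input."""
    source_format: str
    elements: List[StructuralElement] = field(default_factory=list)

    def paragraphs(self) -> List[Paragraph]:
        # Keep only the elements that are paragraphs.
        return [e for e in self.elements if isinstance(e, Paragraph)]


def build_view(source_format: str,
               raw_paragraphs: List[Tuple[str, Dict[str, object]]]) -> UnifiedDocumentView:
    """Stand-in for a format-specific parser: wraps raw data in unified classes."""
    view = UnifiedDocumentView(source_format=source_format)
    for text, attrs in raw_paragraphs:
        view.elements.append(
            Paragraph(kind="paragraph", attributes=dict(attrs), text=text)
        )
    return view


# Two different source formats yield elements of the same type and shape.
odt_view = build_view("odt", [("Introduction", {"alignment": "left"})])
docx_view = build_view("docx", [("Introduction", {"alignment": "left"})])
```

In this sketch, code that consumes odt_view.paragraphs() does not need to know which format the document came from, which is the property the feature list describes.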

Installation

in dev

Examples

in dev

Project Structure

PyOPDParse/
├── README.md
├── LICENSE.md
├── requirements.txt
├── src/
│   ├── classes/
│   │   ├── interfaces/
│   │   │   └── InformalParserInterface.py
│   │   ├── superclasses/
│   │   │   └── StructuralElement.py
│   │   ├── Frame.py
│   │   ├── Image.py
│   │   ├── List.py
│   │   ├── Paragraph.py
│   │   ├── Table.py
│   │   ├── TableRow.py
│   │   ├── TableCell.py
│   │   └── UnifiedDocumentView.py
│   ├── odt/
│   │   ├── elements/
│   │   │   ├── AutomaticStyleParser.py
│   │   │   ├── DefaultStyleParser.py
│   │   │   ├── RegularStyleParser.py
│   │   │   ├── ImageParser.py
│   │   │   ├── ListParser.py
│   │   │   ├── NodeParser.py
│   │   │   ├── ParagraphParser.py
│   │   │   ├── TableParser.py
│   │   │   └── ODTDocument.py
│   │   └── ODTParser.py
│   ├── pdf/
│   │   ├── pdfclasses/
│   │   │   ├── Line.py
│   │   │   └── PDFParagraph.py
│   │   └── PDFParser.py
│   ├── docx/
│   │   └── DocxParagraphParser.py
│   └── helpers/
├── examples/
├── docs/
└── tests/
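
The layout above separates one parser package per format from a shared interface. A minimal sketch of how such an arrangement could pick a parser by file extension follows. ParserInterfaceSketch, the three Stub* classes, the parse method, and parser_for are all hypothetical names invented for this illustration; the real interface in InformalParserInterface.py may look quite different.

```python
from pathlib import Path


class ParserInterfaceSketch:
    """Hypothetical informal interface: subclasses are expected to override parse()."""

    def parse(self, path: str) -> list:
        raise NotImplementedError


class StubODTParser(ParserInterfaceSketch):
    def parse(self, path: str) -> list:
        # A real parser would read the file; this stub only reports its source.
        return [{"type": "paragraph", "source": "odt", "path": path}]


class StubPDFParser(ParserInterfaceSketch):
    def parse(self, path: str) -> list:
        return [{"type": "paragraph", "source": "pdf", "path": path}]


class StubDocxParser(ParserInterfaceSketch):
    def parse(self, path: str) -> list:
        return [{"type": "paragraph", "source": "docx", "path": path}]


# Assumed mapping from file extension to parser class.
PARSERS = {
    ".odt": StubODTParser,
    ".pdf": StubPDFParser,
    ".docx": StubDocxParser,
}


def parser_for(path: str) -> ParserInterfaceSketch:
    """Return a parser instance matching the file extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    try:
        return PARSERS[suffix]()
    except KeyError:
        raise ValueError(f"Unsupported format: {suffix!r}") from None


elements = parser_for("report.docx").parse("report.docx")
```

Keeping the dispatch in one table means a new format would only need a new subclass and one extra entry, without changes to calling code.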

Documentation

Current version available here

Getting started

in dev

License

Apache License 2.0 (see LICENSE.md).

Acknowledgments

The development team expresses its deep gratitude to ITMO University for its support.

Contacts

Authors

Viacheslav Martsinkevich

Vladislav Tereshchenko

Andrei Berezhkov

Galina Larionova