Python modules containing the core functionality for Finalyca.
- Data Store: Data Collection
- SQLTable: Data Table
- DataSchema: Data Table Definition
- DataField: Data Field Definition
Type | = | != | > | < | >= | <= | has | in | between |
---|---|---|---|---|---|---|---|---|---|
BOOL | ✓ | ✓ | |||||||
TEXT | ✓ | ✓ | ✓ | ✓ | |||||
REF | ✓ | ✓ | ✓ | ||||||
INT | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
DECIMAL | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
TS | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
DATE | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
All other field types are not supported.
Type | Count | Sum | Max | Min | Average | Std Deviation |
---|---|---|---|---|---|---|
BOOL | ✓ | | | | | |
TEXT | ✓ | | | | | |
REF | ✓ | | | | | |
INT | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
DECIMAL | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
TS | ✓ | ✓ | ✓ | | | |
DATE | ✓ | ✓ | ✓ | | | |
All other field types are not supported.
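A support matrix like the one above could be enforced in code with a simple lookup. The following is a minimal sketch, not Finalyca's actual API: the names `NUMERIC_AGGREGATIONS`, `SUPPORTED` and `aggregate` are hypothetical, only the BOOL, TEXT, REF, INT and DECIMAL rows are encoded, and the use of population standard deviation is an assumption, since the table does not say which variant is meant.

```python
import statistics

# Hypothetical mapping from aggregation name to a plain-Python implementation.
NUMERIC_AGGREGATIONS = {
    "count": len,
    "sum": sum,
    "max": max,
    "min": min,
    "average": statistics.mean,
    # Assumption: population standard deviation; the table does not specify.
    "std_deviation": statistics.pstdev,
}

# Partial encoding of the table: TS and DATE rows are omitted from this sketch.
SUPPORTED = {
    "BOOL": {"count"},
    "TEXT": {"count"},
    "REF": {"count"},
    "INT": set(NUMERIC_AGGREGATIONS),
    "DECIMAL": set(NUMERIC_AGGREGATIONS),
}


def aggregate(field_type, name, values):
    """Apply aggregation `name` to `values` if the field type allows it."""
    allowed = SUPPORTED.get(field_type, set())
    if name not in allowed:
        raise ValueError(f"{name!r} is not supported for field type {field_type!r}")
    return NUMERIC_AGGREGATIONS[name](values)
```

For example, `aggregate("INT", "sum", [1, 2, 3])` adds the values, while `aggregate("TEXT", "sum", ["a"])` raises `ValueError` because only a count is allowed for TEXT fields.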
Broadly, PDF scraping libraries fall into two categories:
- low-level libraries that extract text along with its (x0, y0, x1, y1) bounding rectangles
- high-level libraries that build on the low-level ones and provide easy-to-use functions, e.g. extract_table.
We use pdfplumber for our use case.
pip install pdfplumber
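As a rough illustration of the two levels described above, the sketch below pairs a low-level step (grouping word boxes into lines by position) with pdfplumber's high-level `extract_table`. The helper names `group_words_into_lines` and `extract_first_page`, the `tolerance` value, and the choice of the first page are assumptions made for this example, not part of this project's code. Note that pdfplumber reports word boxes with the keys `x0`, `x1`, `top` and `bottom` rather than (x0, y0, x1, y1).

```python
def group_words_into_lines(words, tolerance=3.0):
    """Group word boxes into text lines using their vertical position.

    `words` is a list of dicts with at least "text", "x0" and "top" keys,
    the shape returned by pdfplumber's page.extract_words().
    """
    lines = []
    for word in sorted(words, key=lambda w: (w["top"], w["x0"])):
        # Words whose tops are within `tolerance` points share a line.
        if lines and abs(word["top"] - lines[-1][0]["top"]) <= tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Order each line left to right before joining.
    return [
        " ".join(w["text"] for w in sorted(line, key=lambda w: w["x0"]))
        for line in lines
    ]


def extract_first_page(pdf_path):
    """Return (text lines, first table) for page 1 of the PDF at pdf_path."""
    import pdfplumber  # imported here so the helper above has no dependency

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[0]
        lines = group_words_into_lines(page.extract_words())  # low-level route
        table = page.extract_table()  # high-level route
    return lines, table
```

`extract_table()` returns `None` when no table is detected on the page, so callers should check for that before iterating over rows.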
Install ImageMagick and Ghostscript.
On Windows, download and install the ImageMagick and Ghostscript applications.
On Ubuntu:
sudo apt-get install libmagickwand-dev
sudo apt-get install ghostscript
Fix for ImageMagick: add the following line to /etc/ImageMagick-7/policy.xml, just before </policymap>:
<policy domain="coder" rights="read | write" pattern="PDF" />
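If the edit needs to be scripted rather than done by hand, something along these lines could work. This is only a sketch: the function name `add_pdf_policy` is hypothetical, it operates on the file contents as a string, and it assumes the file contains a single closing `</policymap>` tag.

```python
POLICY_LINE = '<policy domain="coder" rights="read | write" pattern="PDF" />'


def add_pdf_policy(policy_xml: str) -> str:
    """Return policy_xml with the PDF coder policy inserted before </policymap>."""
    if POLICY_LINE in policy_xml:
        return policy_xml  # already present, nothing to do
    closing = "</policymap>"
    index = policy_xml.rfind(closing)
    if index == -1:
        raise ValueError("no </policymap> closing tag found")
    # Insert the policy on its own indented line just before the closing tag.
    return policy_xml[:index] + "  " + POLICY_LINE + "\n" + policy_xml[index:]
```

To apply it, read the contents of /etc/ImageMagick-7/policy.xml, pass them through `add_pdf_policy`, and write the result back. Writing under /etc normally requires root privileges, and keeping a backup of the original file is advisable.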
More PDF Libraries: https://cbrunet.net/python-poppler/usage.html#working-with-pages