finalyca_lib

Python modules containing the core functionality for Finalyca.

Data Store: Data Collection
SQLTable: Data Table
DataSchema: Data Table Definition
DataField: Data Field Definition

Screener Query Builder

Type	=	!=	>	<	>=	<=	has	in	between
BOOL	✓	✓
TEXT	✓	✓					✓	✓
REF	✓	✓						✓
INT	✓	✓	✓	✓	✓	✓		✓	✓
DECIMAL	✓	✓	✓	✓	✓	✓		✓	✓
TS	✓	✓	✓	✓	✓	✓		✓	✓
DATE	✓	✓	✓	✓	✓	✓		✓	✓

Operators are not supported for any other field type.
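The operator table above can be expressed as a lookup that a query builder checks before building a filter. This is only a minimal sketch: the names `SUPPORTED_OPERATORS` and `is_operator_supported` are hypothetical and are not taken from finalyca_lib, and the mapping is simply the table transcribed by hand.

```python
# Hypothetical helper, not part of finalyca_lib: the operator table as a lookup.
COMPARISON = {"=", "!=", ">", "<", ">=", "<="}

SUPPORTED_OPERATORS = {
    "BOOL": {"=", "!="},
    "TEXT": {"=", "!=", "has", "in"},
    "REF": {"=", "!=", "in"},
    "INT": COMPARISON | {"in", "between"},
    "DECIMAL": COMPARISON | {"in", "between"},
    "TS": COMPARISON | {"in", "between"},
    "DATE": COMPARISON | {"in", "between"},
}


def is_operator_supported(field_type: str, operator: str) -> bool:
    """Return True if the operator is listed for the field type in the table."""
    # Field types missing from the table fall back to an empty set (unsupported).
    return operator in SUPPORTED_OPERATORS.get(field_type.upper(), set())
```

For example, `is_operator_supported("TEXT", "has")` is true, while `is_operator_supported("BOOL", ">")` is false.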

Type	Count	Sum	Max	Min	Average	Std Deviation
BOOL	✓
TEXT	✓
REF	✓
INT	✓	✓	✓	✓	✓	✓
DECIMAL	✓	✓	✓	✓	✓	✓
TS	✓		✓	✓
DATE	✓		✓	✓

Aggregations are not supported for any other field type.
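The aggregation table can be enforced the same way, rejecting an unsupported combination before any query is generated. Again a sketch under assumptions: `SUPPORTED_AGGREGATES`, `check_aggregate`, and the lowercase aggregate names are hypothetical and do not come from finalyca_lib.

```python
# Hypothetical helper, not part of finalyca_lib: the aggregation table as a lookup.
NUMERIC_AGGREGATES = {"count", "sum", "max", "min", "average", "std_deviation"}

SUPPORTED_AGGREGATES = {
    "BOOL": {"count"},
    "TEXT": {"count"},
    "REF": {"count"},
    "INT": NUMERIC_AGGREGATES,
    "DECIMAL": NUMERIC_AGGREGATES,
    "TS": {"count", "max", "min"},
    "DATE": {"count", "max", "min"},
}


def check_aggregate(field_type: str, aggregate: str) -> None:
    """Raise ValueError if the aggregate is not listed for the field type."""
    allowed = SUPPORTED_AGGREGATES.get(field_type.upper(), set())
    if aggregate.lower() not in allowed:
        raise ValueError(
            f"Aggregate {aggregate!r} is not supported for field type {field_type!r}"
        )
```

Raising early keeps the error message close to the user's input instead of surfacing later as a database error.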

Broadly, PDF scraping libraries fall into two categories.

Low-level libraries that extract text together with its bounding box (x0, y0, x1, y1).
High-level libraries that build on the low-level ones and expose easy-to-use functions, e.g. extract_table.

We use pdfplumber for our use case.

pip install pdfplumber
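Once pdfplumber is installed, a high-level call such as `extract_table` can be used roughly as follows. This is a sketch, not the code finalyca_lib uses: the helper names `rows_to_records` and `extract_first_table` are hypothetical, and it assumes the first row of the table is the header.

```python
from typing import Dict, List, Optional


def rows_to_records(table: Optional[List[List[Optional[str]]]]) -> List[Dict[str, str]]:
    """Convert [header, row, row, ...] into a list of dicts keyed by header cell."""
    if not table:
        # extract_table() returns None when no table is found on the page.
        return []
    header = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        # Empty cells come back as None; normalise them to empty strings.
        records.append({key: (cell or "").strip() for key, cell in zip(header, row)})
    return records


def extract_first_table(pdf_path: str, page_number: int = 0) -> List[Dict[str, str]]:
    """Read one page of a PDF and return its first detected table as records."""
    import pdfplumber  # imported here so rows_to_records works without the dependency

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_number]
        return rows_to_records(page.extract_table())
```

When a table is not detected correctly, `page.extract_words()` returns each word with its `x0`, `top`, `x1` and `bottom` coordinates, which is the low-level view described above.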

Install ImageMagick and Ghostscript.

On Windows, download and install the ImageMagick and Ghostscript applications.

On Ubuntu,

sudo apt-get install libmagickwand-dev
sudo apt-get install ghostscript

Fix for ImageMagick: to allow it to read and write PDFs, add the following line to /etc/ImageMagick-7/policy.xml, just before </policymap>.

<policy domain="coder" rights="read | write" pattern="PDF" />