extraction-engine

There are 9 repositories under the extraction-engine topic.

tabulapdf/tabula-java
Extract tables from PDF files
Language: Java 1.9k 69 330432
lorey/mlscraper
🤖 Scrape data from HTML websites automatically by just providing examples
Language: Python 1.3k 19 3290
BobLd/tabula-sharp
Extract tables from PDF files (port of tabula-java)
Language: C# 163 8 1225
lum-ai/odinson
Odinson is a powerful and highly optimized open-source framework for rule-based information extraction. Odinson couples a simple, yet powerful pattern language that can operate over multiple representations of text, with a runtime system that operates in near real time.
Language: Scala 66 8 13923
BobLd/camelot-sharp
A C# library to extract tabular data from PDFs (port of camelot Python version using PdfPig).
Language: C# 31 6 25
manhph2211/ICDAR2015
ICDAR 2015 competition on robust reading 😄
Language: Python 2 1 01
dhrumil29796/Dalhousie_University_CSCI5408_DMWA
All five assignments and the final group project completed in the class CSCI5408 (Data Management, Warehousing and Analytics), Summer 2021, of MACS at Dalhousie University.
Language: Java 1 1 00
invana/web-parsers
Simple, extendable HTML and XML data extraction engine using YAML configurations and sometimes Pythonic functions.
Language: Python 1 2 2
ahmedlrashed/teststand-database-utility
A Python utility to extract and transform data from a TestStand SQL database schema into flat CSV files.
Language: Python 0 1 00