extraction-engine

There are 9 repositories under the extraction-engine topic.

  • tabulapdf/tabula-java

    Extract tables from PDF files

    Language: Java
  • lorey/mlscraper

    🤖 Scrape data from HTML websites automatically by just providing examples

    Language: Python
  • BobLd/tabula-sharp

    Extract tables from PDF files (port of tabula-java)

    Language: C#
  • lum-ai/odinson

    Odinson is a powerful and highly optimized open-source framework for rule-based information extraction. Odinson couples a simple, yet powerful pattern language that can operate over multiple representations of text, with a runtime system that operates in near real time.

    Language: Scala
  • BobLd/camelot-sharp

    A C# library to extract tabular data from PDFs (port of camelot Python version using PdfPig).

    Language: C#
  • manhph2211/ICDAR2015

    ICDAR 2015 competition on robust reading :smile:

    Language: Python
  • dhrumil29796/Dalhousie_University_CSCI5408_DMWA

    All five assignments and the final group project completed for CSCI5408 (Data Management, Warehousing and Analytics), Summer 2021, in the MACS program at Dalhousie University.

    Language: Java
  • invana/web-parsers

    Simple, extendable HTML and XML data extraction engine using YAML configurations and, sometimes, Pythonic functions.

    Language: Python
  • ahmedlrashed/teststand-database-utility

    A Python utility that extracts and transforms data from a TestStand SQL database schema into flat CSV files.

    Language: Python