document-data-extraction

There are 2 repositories under the document-data-extraction topic.

  • docext

    An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)

    Language: Python · 1.7k stars
  • TWIX

    TWIX is an open-source data extraction tool that reconstructs structured data from documents at scale, accurately and at low cost, by inferring the shared underlying visual template across documents.

    Language: Python · 204 stars