TableParser

Repo for "TableParser: Automatic Table Parsing with Weak Supervision from Spreadsheets" at SDU@AAAI-22

1. Clone repositories

Install a Git client and clone this repository into a directory of your choice (referred to below as <git-home>):

git clone git@github.com:DS3Lab/TableParser.git

2. System components of TableParser

  • System overview of the TableParser pipeline (PDF figure: the TableParser pipeline)

  • Model overview of Mask R-CNN in DocParser (PDF figure: Mask-RCNN)

  • TableAnnotator: refer to this repo.
    • Demo of annotating a table using TableAnnotator
  • ExcelAnnotator: ./ExcelAnnotator.
  • TableParser pipelines: ./TableParser.
  • Data: download from this Google Drive link.
  • Models: download TableParser M1 (ModernTableParser) and M2 (HistoricalTableParser) from this Google Drive link and place them under ./TableParser/TableParser/detectron2/tools/docparser_outputs.
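
After downloading the models, it can help to confirm that they landed in the directory named above. The following is a minimal sketch, not part of the repository: the helper names `model_dir` and `list_model_files` are hypothetical, and it assumes the relative path is resolved against the root of your clone. It makes no assumption about the names of the model files and only reports what it finds.

```python
from pathlib import Path

# Relative path quoted from this README; assumed to be relative to the
# root of the cloned repository.
MODEL_SUBDIR = Path("TableParser/TableParser/detectron2/tools/docparser_outputs")


def model_dir(repo_root):
    """Return the directory where the downloaded models are expected."""
    return Path(repo_root).expanduser() / MODEL_SUBDIR


def list_model_files(repo_root):
    """List files found under the model directory, relative to it.

    Returns an empty list if the directory is missing or contains no files.
    """
    directory = model_dir(repo_root)
    if not directory.is_dir():
        return []
    return sorted(
        path.relative_to(directory).as_posix()
        for path in directory.rglob("*")
        if path.is_file()
    )
```

Calling `list_model_files("<git-home>")` with the path to your clone should return a non-empty list once the M1 and M2 downloads are in place. An empty list suggests the files were placed elsewhere.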

3. References

To cite TableParser, use the following BibTeX entries:

    @inproceedings{rausch2021docparser,
      title={DocParser: Hierarchical Document Structure Parsing from Renderings},
      author={Rausch, Johannes and Martinez, Octavio and Bissig, Fabian and Zhang, Ce and Feuerriegel, Stefan},
      booktitle={35th AAAI Conference on Artificial Intelligence (AAAI-21)(virtual)},
      howpublished = {\url{https://arxiv.org/abs/1911.01702}},
      year={2021}
    }
    @inproceedings{rao2022tableparser,
      title={TableParser: Automatic Table Parsing with Weak Supervision from Spreadsheets},
      author={Rao, Susie Xi and Rausch, Johannes and Egger, Peter and Zhang, Ce},
      booktitle={Scientific Document Understanding Workshop (SDU{@}AAAI-22)(virtual)},
      howpublished = {\url{https://arxiv.org/abs/2201.01654}},
      year={2022}
    }