Pinned Repositories
ArtTabGen-Code
awesome-data-labeling
A curated list of awesome data labeling tools
Awesome-Table-Recognition
A curated list of resources dedicated to table recognition
BIG-bench-1
Beyond the Imitation Game collaborative benchmark for enormous language models
bonito
A lightweight library for generating synthetic instruction-tuning datasets for your data without GPT.
CRASS-data-set
The data for the CRASS benchmark. See https://www.crass.ai for further information.
crass.ai-big-bench-contribution
doc-hcii2022-slides
Slides for our HCII 2022 talk "Putting users in the loop: How User Research Can Guide AI Development for a Consumer-Oriented Self-service Portal". Imported from https://git.informatik.uni-leipzig.de/smarthec/doc-hcii2022-slides
docling
Docling bundles PDF document conversion to JSON and Markdown in an easy, self-contained package.
docquery
An easy way to extract information from documents
frankiert's Repositories
frankiert/ArtTabGen-Code
frankiert/awesome-data-labeling
A curated list of awesome data labeling tools
frankiert/Awesome-Table-Recognition
A curated list of resources dedicated to table recognition
frankiert/BIG-bench-1
Beyond the Imitation Game collaborative benchmark for enormous language models
frankiert/bonito
A lightweight library for generating synthetic instruction-tuning datasets for your data without GPT.
frankiert/CRASS-data-set
The data for the CRASS benchmark. See https://www.crass.ai for further information.
frankiert/crass.ai-big-bench-contribution
frankiert/doc-hcii2022-slides
Slides for our HCII 2022 talk "Putting users in the loop: How User Research Can Guide AI Development for a Consumer-Oriented Self-service Portal". Imported from https://git.informatik.uni-leipzig.de/smarthec/doc-hcii2022-slides
frankiert/docling
Docling bundles PDF document conversion to JSON and Markdown in an easy, self-contained package.
frankiert/docquery
An easy way to extract information from documents
frankiert/DocumentLayoutAnalysis
Document layout analysis resources and repositories for development with PdfPig.
frankiert/GastCluster
A set of bash scripts to spread number crunching jobs across several machines and collect the results back into a single file
frankiert/layout-parser
A Python Library for Document Layout Understanding
frankiert/ocrd_segment
OCR-D-compliant page segmentation
frankiert/pdfix_sdk_example_cpp
Make PDF files accessible, extract data from PDFs, convert PDF to HTML, fill in PDF forms, stamp PDFs, and more.
frankiert/pdfix_sdk_example_python
PDFix SDK samples for Python: PDF manipulation, content extraction, conversion, accessibility, and more.
frankiert/PLIX
PLIX (Pipeline for Information Extraction) is a Python package and command line tool for information extraction from (PDF) documents.
frankiert/SciTSR
Table structure recognition dataset from the paper "Complicated Table Structure Recognition".
frankiert/todo.md
TODO.md file format - todomd.org