Support conversion from and to Textract JSON
scottschreckengaust opened this issue · 4 comments
scottschreckengaust commented
Amazon Textract returns its results in a JSON format, documented here:
https://docs.aws.amazon.com/textract/latest/dg/textract-dg.pdf
Specifically, conversion should cover the three types of analysis described at https://docs.aws.amazon.com/textract/latest/dg/how-it-works-analyzing.html, i.e. the categories:
- text,
- forms, and
- tables
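
For illustration, here is a minimal sketch of how these three categories could be told apart in a Textract response. It assumes the documented response layout (a top-level `Blocks` list whose entries carry `BlockType` and a `Geometry.BoundingBox` given as ratios of the page size). The sample data is hand-written, and the helper names `group_blocks_by_category` and `bbox_to_pixels` are hypothetical; they are not taken from any existing converter.

```python
from collections import defaultdict

# Hand-written sample shaped like a Textract response (not real service output).
SAMPLE_RESPONSE = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1"},
        {
            "BlockType": "LINE",
            "Id": "l1",
            "Text": "Hello world",
            "Geometry": {
                "BoundingBox": {"Left": 0.1, "Top": 0.2, "Width": 0.5, "Height": 0.05}
            },
        },
        {"BlockType": "KEY_VALUE_SET", "Id": "k1", "EntityTypes": ["KEY"]},
        {"BlockType": "TABLE", "Id": "t1"},
        {"BlockType": "CELL", "Id": "c1", "RowIndex": 1, "ColumnIndex": 1},
    ]
}

# Assumed mapping from Textract block types to the categories listed above.
CATEGORY_BY_BLOCK_TYPE = {
    "LINE": "text",
    "WORD": "text",
    "KEY_VALUE_SET": "forms",
    "TABLE": "tables",
    "CELL": "tables",
}


def group_blocks_by_category(response):
    """Sort blocks into text / forms / tables; other block types are skipped."""
    groups = defaultdict(list)
    for block in response.get("Blocks", []):
        category = CATEGORY_BY_BLOCK_TYPE.get(block.get("BlockType"))
        if category is not None:
            groups[category].append(block)
    return dict(groups)


def bbox_to_pixels(block, page_width, page_height):
    """Convert a ratio-based bounding box to (left, top, right, bottom) pixels."""
    box = block["Geometry"]["BoundingBox"]
    left = round(box["Left"] * page_width)
    top = round(box["Top"] * page_height)
    right = round((box["Left"] + box["Width"]) * page_width)
    bottom = round((box["Top"] + box["Height"]) * page_height)
    return left, top, right, bottom


groups = group_blocks_by_category(SAMPLE_RESPONSE)
line_box = bbox_to_pixels(groups["text"][0], page_width=1000, page_height=2000)
```

A pixel-based format such as PAGE-XML needs the image dimensions to scale the ratio-based coordinates, which is why `bbox_to_pixels` takes the page width and height as parameters.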
bertsky commented
Alas, the new converter is still incomplete, so
- forms, and
- tables
do not work yet. See slub/textract2page#2
bertsky commented
Update: tables work now, but the converter submodule needs to be updated here
kba commented
> Update: tables work now, but the converter submodule needs to be updated here
I've updated the vendor submodules, including textract2page, in #166. The tables
branch is not yet merged to master, though, and I think some files needed to run the tests properly are missing.