aws-samples/amazon-textract-response-parser

(src-js) `ApiBlock` types are missing the `Page` property of type `number`

rob3c commented

All `ApiBlock` types in the JavaScript source (except `ApiQueryBlock` and `ApiQueryResultBlock`) are missing the `Page` property that's found on all blocks in the JSON response objects. The Python and C# libraries, however, already include it.
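Until the typings include it, a consumer can still read `Page` safely by widening the block type and narrowing with a type guard. This is a minimal sketch only: `ApiBlockLike`, `WithPage` and `hasPage` are hypothetical names, and `ApiBlockLike` is a simplified stand-in rather than the library's actual `ApiBlock` definition.

```typescript
// Hypothetical, simplified stand-in for a Textract block; the real ApiBlock
// union in src-js has many more members and fields.
interface ApiBlockLike {
  BlockType: string;
  Id: string;
}

// Widen any block type with the optional Page number that Textract JSON
// responses can carry but the JS typings currently omit.
type WithPage<T> = T & { Page?: number };

// Type guard: true only when the block actually carries a numeric Page.
function hasPage<T extends { Page?: unknown }>(
  block: T,
): block is T & { Page: number } {
  return typeof block.Page === "number";
}

const blocks: WithPage<ApiBlockLike>[] = [
  { BlockType: "PAGE", Id: "p1", Page: 1 },
  { BlockType: "LINE", Id: "l1" }, // no Page field at all
];

for (const b of blocks) {
  // Prints "p1 1" then "l1 no Page".
  console.log(b.Id, hasPage(b) ? b.Page : "no Page");
}
```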

Hi @rob3c, here are my initial observations on this one (referring to the test data sets in the current working version):

  • In table-example-response.json (older, from around 2022), I see Page on all 84 blocks in the file (including PAGE, LINE, WORD, CELL, MERGED_CELL, TABLE, KEY, and VALUE)
  • Seems similarly universal in test-query-response.json (maybe from around 2023-06), test-multicol-response.json (2021-10ish), test-multicol-response-2.json (2021-10ish), and test-twocol-header-footer-response.json (2021-11ish)
  • BUT I'm not seeing it at all in financial-document-response.json, form1005-response.json, or paystub-response.json (all generated around 2023-10, I think), or in test-response.json (which, weirdly, seems to have been updated 2023-05)
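To repeat this survey on other test files or on fresh API output, a small helper can tally `Page` coverage per response. A sketch under stated assumptions: `pageCoverage`, `ResponseLike` and `PageCoverage` are hypothetical names, the response is assumed to be already parsed, and only the top-level `Blocks` array is inspected.

```typescript
// Assumed minimal response shape: only the top-level Blocks array is used.
interface ResponseLike {
  Blocks?: Array<Record<string, unknown>>;
}

interface PageCoverage {
  total: number; // number of blocks in the response
  withPage: number; // blocks whose Page field is a number
  blockTypesMissingPage: string[]; // sorted, de-duplicated BlockTypes lacking Page
}

// Tally how many blocks carry a numeric Page field, and which BlockTypes lack it.
function pageCoverage(response: ResponseLike): PageCoverage {
  const blocks = response.Blocks ?? [];
  const missing: string[] = [];
  let withPage = 0;
  for (const block of blocks) {
    if (typeof block["Page"] === "number") {
      withPage += 1;
    } else {
      const blockType = String(block["BlockType"]);
      if (missing.indexOf(blockType) === -1) missing.push(blockType);
    }
  }
  return { total: blocks.length, withPage, blockTypesMissingPage: missing.sort() };
}

// Example with an inline sample; for the real test files, load each one with
// Node's fs.readFileSync and JSON.parse first.
const sample: ResponseLike = {
  Blocks: [
    { BlockType: "PAGE", Id: "a", Page: 1 },
    { BlockType: "LINE", Id: "b", Page: 1 },
    { BlockType: "WORD", Id: "c" },
  ],
};
// Expected: total 3, withPage 2, blockTypesMissingPage ["WORD"].
console.log(pageCoverage(sample));
```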

My tentative hypothesis is that this field stopped being reported at block level at some point, perhaps mid-2023? ...Unless you're still seeing it in more recent API results? Or perhaps @schadem can shed more light on it.

I definitely agree it's worth adding to the type model in some capacity. But if it has already disappeared, maybe we can explicitly mark it as such in the JSDoc, and not attach it to block types for features that came out after it went away.
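One possible shape for that proposal, sketched with hypothetical, simplified interfaces (none of these names are the actual `src-js` definitions, and `NEW_FEATURE` is a placeholder block type): `Page` lives in an optional mixin whose JSDoc records the uncertainty, and only block types that predate its disappearance extend the mixin.

```typescript
// All interface names below are hypothetical and simplified; they are not the
// actual ApiBlock definitions in src-js.
interface ApiBlockBaseSketch {
  Id: string;
  BlockType: string;
}

interface ApiBlockPageMixinSketch {
  /**
   * 1-based number of the page on which the block was detected.
   *
   * Possibly no longer returned at block level: present in older test
   * responses but missing from several newer ones (unconfirmed).
   * Always check for `undefined` before use.
   */
  Page?: number;
}

// A block type that existed while Page was still reported: gets the mixin.
interface ApiLineBlockSketch extends ApiBlockBaseSketch, ApiBlockPageMixinSketch {
  BlockType: "LINE";
  Text: string;
}

// Placeholder for a block type released after Page went away: no mixin, so
// reading `.Page` on it is a compile-time error.
interface ApiNewFeatureBlockSketch extends ApiBlockBaseSketch {
  BlockType: "NEW_FEATURE";
}

type ApiBlockSketch = ApiLineBlockSketch | ApiNewFeatureBlockSketch;

// Page is only reachable after narrowing to a type that declares it.
function pageOf(block: ApiBlockSketch): number | undefined {
  return block.BlockType === "LINE" ? block.Page : undefined;
}
```

Keeping `Page` optional rather than required forces code written against these types to handle its absence, which matches the mixed results across the test files; if the removal is confirmed, a JSDoc `@deprecated` tag on the field would make the status visible in editors as well.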