KeyError Geometry on Textract queries
aarif1996 opened this issue · 3 comments
Traceback (most recent call last):
  File "/home/ubuntu/sample.py", line 52, in textract_output
    document = extractor.analyze_document(
  File "/usr/local/lib/python3.8/site-packages/textractor/textractor.py", line 438, in analyze_document
    document = response_parser.parse(response)
  File "/usr/local/lib/python3.8/site-packages/textractor/parsers/response_parser.py", line 906, in parse
    return parse_document_api_response(response)
  File "/usr/local/lib/python3.8/site-packages/textractor/parsers/response_parser.py", line 770, in parse_document_api_response
    queries = _create_query_objects(
  File "/usr/local/lib/python3.8/site-packages/textractor/parsers/response_parser.py", line 381, in _create_query_objects
    query_results = _create_query_result_objects(
  File "/usr/local/lib/python3.8/site-packages/textractor/parsers/response_parser.py", line 419, in _create_query_result_objects
    block["Geometry"]["BoundingBox"], spatial_object=page
KeyError: 'Geometry'
I have a similar issue. When I get a straight answer, I do get coordinates, e.g. "What is the title of this doc?" on page 1.
However, when I get an 'interpreted' answer, e.g. "What are the standards of this doc?" on page 1, the geometry is set to None:
query is TBlock(geometry=None, id='d1a1bac6-8c00-4b8b-91ef-72ff7d3398d9', block_type='QUERY', relationships=[TRelationship(type='ANSWER', ids=['d3c0611d-a7ba-48ed-9d4a-031e64a3d4f3'])], confidence=None, text=None, column_index=None, column_span=None, entity_types=None, page=1,
row_index=None, row_span=None, selection_status=None, text_type=None, custom=None, query=TQuery(text='what are the standards of the certified weight?', alias='tc_certified_shipping_standards'))
rels is TRelationship(type='ANSWER', ids=['d3c0611d-a7ba-48ed-9d4a-031e64a3d4f3'])
[TBlock(geometry=None, id='d3c0611d-a7ba-48ed-9d4a-031e64a3d4f3', block_type='QUERY_RESULT', relationships=None, confidence=43.0, text='GRS, GRS', column_index=None, column_span=None, entity_types=None, page=1, row_index=None, row_span=None, selection_status=None, text_type=None, custom=None, query=None)]
I have quite a big chunk of code that depends on these coordinates, and for 5 months straight I had no issue. I checked by reverting the other Textract-related libraries to their old versions and by testing on old git branches.
So, is this a new way for Textract to answer questions?
@aarif1996 Your issue is with the textractor package, not the amazon-textract-response-parser.
@anyaovi : Does your text 'GRS, GRS' exist on the page or is it inferred? Queries may not include the coordinates when the text is inferred. You do not get an exception, correct?
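For code that depends on coordinates, one way to cope with inferred answers is to treat the bounding box as optional. The following is only a minimal sketch working on the raw Textract JSON response (plain dicts), not on textractor or amazon-textract-response-parser objects. The function name `query_answers` and the trimmed `sample_response` are hypothetical; the IDs, alias, text and confidence are taken from the output posted above.

```python
from typing import Any, Dict, List


def query_answers(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each QUERY block with its QUERY_RESULT blocks.

    "bounding_box" is None when the result block carries no Geometry,
    so callers can branch on it instead of hitting a KeyError.
    """
    blocks_by_id = {b["Id"]: b for b in response.get("Blocks", [])}
    answers = []
    for block in blocks_by_id.values():
        if block.get("BlockType") != "QUERY":
            continue
        # Relationships may be absent or None when a query has no answer.
        for rel in block.get("Relationships") or []:
            if rel.get("Type") != "ANSWER":
                continue
            for result_id in rel.get("Ids", []):
                result = blocks_by_id.get(result_id)
                if result is None:
                    continue
                # Geometry may be missing on the result block.
                geometry = result.get("Geometry") or {}
                answers.append({
                    "alias": (block.get("Query") or {}).get("Alias"),
                    "text": result.get("Text"),
                    "confidence": result.get("Confidence"),
                    "bounding_box": geometry.get("BoundingBox"),
                })
    return answers


# Hypothetical response, trimmed to the fields used above.
sample_response = {
    "Blocks": [
        {
            "Id": "d1a1bac6-8c00-4b8b-91ef-72ff7d3398d9",
            "BlockType": "QUERY",
            "Page": 1,
            "Query": {
                "Text": "what are the standards of the certified weight?",
                "Alias": "tc_certified_shipping_standards",
            },
            "Relationships": [
                {"Type": "ANSWER",
                 "Ids": ["d3c0611d-a7ba-48ed-9d4a-031e64a3d4f3"]}
            ],
        },
        {
            "Id": "d3c0611d-a7ba-48ed-9d4a-031e64a3d4f3",
            "BlockType": "QUERY_RESULT",
            "Page": 1,
            "Confidence": 43.0,
            "Text": "GRS, GRS",
            # No "Geometry" key on this result block.
        },
    ]
}

for answer in query_answers(sample_response):
    if answer["bounding_box"] is None:
        print(f"{answer['alias']}: {answer['text']!r} (no coordinates)")
    else:
        print(f"{answer['alias']}: {answer['text']!r} at {answer['bounding_box']}")
```

Downstream code can then skip or specially handle answers whose `bounding_box` is None rather than assuming every QUERY_RESULT has a location on the page.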
I will close this one; aws-samples/amazon-textract-textractor#195 is the ticket for the KeyError: 'Geometry'.