aws-samples/amazon-textract-response-parser

`merge_table` does not handle merge child table header columns properly

douglasqian opened this issue · 0 comments

Issue

The expectation from using merge_tables is that it will convert 2 tables like this:

Headers: 
C1 C2 C3
Rows:
R1
R2

Headers: 
R3
Rows:
R4

into:

Headers: 
C1 C2 C3
Rows:
R1
R2
R3
R4

But instead the output is

Headers: 
C1 C2 C3
R3
Rows:
R1
R2
R4

This is because the headers are identified based on the existence of COLUMN_HEADER in a block's EntityTypes field and the child table's top block keep this entity type even after merging. The solution is to simply drop the this value from EntityTypes if it's there while merging.