The y coordinate value of cell bbox seems to be inaccurate
qyhou commented
Thank you for providing the large-scale dataset.
When converting the HTML annotations into a split-based structure, I found that the y coordinates of the cell bboxes seem to be inaccurate.
For example, PMC5842743_009_00 is an 11x6 table:
A03 line: [2, 65, 19, 76], [31, 65, 46, 76], [68, 65, 82, 76], [110, 65, 133, 76], [165, 65, 176, 76], [211, 65, 228, 76]
A04 line: [2, 78, 20, 89], [31, 78, 46, 89], [71, 75, 79, 90], [118, 75, 125, 90], [167, 75, 174, 90], [216, 75, 223, 90]
Here y1 of the upper cells is greater than y0 of several lower cells (76 > 75), so the two rows overlap vertically.
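A minimal sketch of the check described above, using the bboxes quoted for the two rows and assuming they are `[x0, y0, x1, y1]` in pixels. `vertical_overlap` and `offending_cells` are hypothetical helper names, not part of the dataset's tooling.

```python
# Cell bboxes copied from the two rows above, as [x0, y0, x1, y1] in pixels.
row_a03 = [[2, 65, 19, 76], [31, 65, 46, 76], [68, 65, 82, 76],
           [110, 65, 133, 76], [165, 65, 176, 76], [211, 65, 228, 76]]
row_a04 = [[2, 78, 20, 89], [31, 78, 46, 89], [71, 75, 79, 90],
           [118, 75, 125, 90], [167, 75, 174, 90], [216, 75, 223, 90]]


def vertical_overlap(upper_row, lower_row):
    """Pixels by which the upper row's bottom (max y1) extends past the
    lower row's top (min y0). A positive value means the rows overlap."""
    bottom_of_upper = max(b[3] for b in upper_row)
    top_of_lower = min(b[1] for b in lower_row)
    return bottom_of_upper - top_of_lower


def offending_cells(upper_row, lower_row):
    """Indices of lower-row cells whose top (y0) lies above the upper row's bottom."""
    bottom_of_upper = max(b[3] for b in upper_row)
    return [j for j, b in enumerate(lower_row) if b[1] < bottom_of_upper]


print(vertical_overlap(row_a03, row_a04))  # 1
print(offending_cells(row_a03, row_a04))   # [2, 3, 4, 5]
```

Only the last four cells of the A04 line start at y0 = 75; the first two start at 78, so the rows are also inconsistent with each other.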
I randomly checked 100 tables in the training set and found that 37 of them have this problem.
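A spot check like this could be scripted roughly as follows. This is a sketch under assumptions: the annotations are taken to be a JSONL file in a PubTabNet-style layout (each record has `filename`, `split`, `html.structure.tokens` and `html.cells`, with `bbox` missing for empty cells), and the split name `"train"`, the file path, and all function names are hypothetical. Cells with `rowspan` > 1 are skipped because they legitimately cross row boundaries.

```python
import json
import random
import re


def rows_from_structure(record):
    """Group non-empty, non-row-spanning cell bboxes by the row they start in.

    Assumes record["html"]["cells"] has one entry per <td>, in token order.
    """
    tokens = record["html"]["structure"]["tokens"]
    cells = record["html"]["cells"]
    rows, current, cell_idx, i = [], [], 0, 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "<tr>":
            current = []
        elif tok == "</tr>":
            rows.append(current)
        elif tok in ("<td>", "<td"):
            rowspan = 1
            if tok == "<td":
                # Attribute tokens such as ' rowspan="2"' follow until ">".
                i += 1
                while tokens[i] != ">":
                    match = re.search(r'rowspan="(\d+)"', tokens[i])
                    if match:
                        rowspan = int(match.group(1))
                    i += 1
            cell = cells[cell_idx]
            cell_idx += 1
            # Row-spanning cells cross row boundaries by design; skip them.
            if rowspan == 1 and "bbox" in cell:
                current.append(cell["bbox"])
        i += 1
    return rows


def has_row_overlap(rows):
    """True if some row's bottom (max y1) lies below the next row's top (min y0)."""
    filled = [row for row in rows if row]  # ignore rows with no usable bbox
    return any(
        max(b[3] for b in upper) > min(b[1] for b in lower)
        for upper, lower in zip(filled, filled[1:])
    )


def sample_overlap_rate(jsonl_path, n=100, split="train", seed=0):
    """Randomly sample n tables from one split and flag those with row overlap."""
    with open(jsonl_path) as f:
        records = [json.loads(line) for line in f if line.strip()]
    pool = [r for r in records if r.get("split") == split]
    sample = random.Random(seed).sample(pool, min(n, len(pool)))
    flagged = [r["filename"] for r in sample
               if has_row_overlap(rows_from_structure(r))]
    return len(flagged), len(sample), flagged
```

For example, `sample_overlap_rate("annotations.jsonl")` (hypothetical path) returns the number of flagged tables, the sample size, and the flagged filenames, so the 37/100 figure can be rechecked with a fixed seed.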
Thanks