`Page.find_tables` giving confidence of table validity
Is your feature request related to a problem? Please describe.
I have seen cases with pymupdf==1.26.3 where page.find_tables returns junk such as this:
Where Table.to_markdown also gives junk:
| Col1 | Col2 | Col3 | Col4 |
|---|---|---|---|
Describe the solution you'd like
Page.find_tables giving some sort of "confidence" parameter that the table is actually legit.
Describe alternatives you've considered
None.
Additional context
Related to #4616 as well.
Sorry, there is no way to implement this type of thing with the current table module.
What you are reporting are cases where parallel lines are drawn at a distance just a little larger than the default tolerance of 3. Had they been any closer, they would have been merged into a single line.
Please have a look at the parameter list to see if larger values for e.g. join_tolerance and friends may help.
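As a rough sketch of that suggestion: the helper below passes larger tolerances to `page.find_tables` and then drops tables whose cells are almost all empty, like the header-only table shown above. The helper names, the tolerance value 5, and the `min_filled` threshold are illustrative assumptions, not part of PyMuPDF, and the emptiness check is only a crude stand-in for a real confidence score.

```python
def fraction_filled(rows):
    """Share of cells with non-blank text.

    `rows` is a list of lists of str/None, the shape returned by Table.extract().
    """
    cells = [cell for row in rows for cell in row]
    if not cells:
        return 0.0
    filled = sum(1 for cell in cells if cell is not None and str(cell).strip())
    return filled / len(cells)


def find_tables_filtered(page, min_filled=0.2, **kwargs):
    """Hypothetical wrapper: run page.find_tables(**kwargs), keep non-empty tables.

    `min_filled` is an assumed threshold; tune it for your documents.
    """
    finder = page.find_tables(**kwargs)
    return [t for t in finder.tables if fraction_filled(t.extract()) >= min_filled]


# Usage (assumes PyMuPDF is installed; "input.pdf" is a placeholder path):
#   import pymupdf
#   doc = pymupdf.open("input.pdf")
#   tables = find_tables_filtered(doc[0], join_tolerance=5, snap_tolerance=5)
#   for t in tables:
#       print(t.to_markdown())
```

Whether a value such as 5 is enough depends on how far apart the stray lines are in a given PDF, so it may take a few attempts to find a setting that merges them without also merging genuine cell borders.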