pymupdf/PyMuPDF

`Page.find_tables` giving confidence of table validity

Closed · 1 comment

Is your feature request related to a problem? Please describe.

I have seen cases with pymupdf==1.26.3 where page.find_tables returns junk such as this:

[image: table]

Where Table.to_markdown also gives junk:

Col1 Col2 Col3 Col4

Describe the solution you'd like

Page.find_tables giving some sort of "confidence" parameter that the table is actually legit.
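Until something like this exists, a rough caller-side score can be sketched. The helper below is hypothetical (`cell_fill_ratio` is not part of PyMuPDF) and assumes that junk tables consist mostly of empty cells, which matches the `Col1 Col2 Col3 Col4` output above but is only a heuristic:

```python
def cell_fill_ratio(rows):
    """Return the fraction of cells that contain non-blank text.

    `rows` is a list of rows, each a list of cell values (str or None).
    This is a hypothetical heuristic, not a PyMuPDF feature: a low ratio
    only hints that the detected "table" may be empty line art.
    """
    cells = [cell for row in rows for cell in row]
    if not cells:
        return 0.0
    filled = sum(1 for cell in cells if cell is not None and str(cell).strip())
    return filled / len(cells)
```

The rows could come from `Table.extract()`, e.g. `cell_fill_ratio(tab.extract())` for each `tab` in `page.find_tables().tables`. Any cutoff for "junk" would be an arbitrary choice and would need tuning per document set.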

Describe alternatives you've considered

None.

Additional context

Related to #4616 as well.

Sorry, there is no way to implement this type of thing with the current table module.

What you are reporting are cases where parallel lines are drawn at a distance slightly larger than the default tolerance of 3, so they are not contracted into one line as they otherwise would be.
Please have a look at the parameter list to see whether larger values for e.g. join_tolerance and friends help.
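A minimal sketch of that suggestion, for anyone landing here. The wrapper name `find_tables_tolerant` and the value 6 are illustrative assumptions; `snap_tolerance` and `join_tolerance` are existing `find_tables` parameters, but whether raising them helps depends on the page:

```python
def find_tables_tolerant(page, tolerance=6):
    """Run table detection with larger line-merging tolerances.

    `page` is expected to be a pymupdf Page. The value 6 is an arbitrary
    example that is larger than the default of 3; tune it per document.
    """
    return page.find_tables(
        snap_tolerance=tolerance,  # merge nearly coincident parallel lines
        join_tolerance=tolerance,  # join line segments with small gaps
    )
```

With a real document this would be called as `find_tables_tolerant(pymupdf.open("input.pdf")[0])`, where `input.pdf` is a placeholder file name. Values that are too large may merge lines that belong to separate cells, so it is worth comparing the results at a few settings.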