chezou/tabula-py

Unexpected table extraction

Closed this issue · 3 comments

Summary of your issue

tabula-py parses two columns as one when they should be separate columns.

I assume tabula-py is just a wrapper for tabula-java, correct? Yet I get different table columns when I compare the tabula-py output with the tabula-java UI results.

Check list before

  • Did you read the FAQ?

  • (Optional, but really helpful) Your PDF URL: https://mopng.gov.in/files/petroleumStatistics/monthlyProduction/MPR-for-the-month-of-June,2022.pdf

  • Paste the output of import tabula; tabula.environment_info() on Python REPL:
    Python version:
    3.7.11 (default, Jul 27 2021, 09:42:29) [MSC v.1916 64 bit (AMD64)]
    Java version:
    openjdk version "11.0.6" 2020-01-14
    OpenJDK Runtime Environment (build 11.0.6+8-b765.1)
    OpenJDK 64-Bit Server VM (build 11.0.6+8-b765.1, mixed mode)
    tabula-py version: 2.4.0
    platform: Windows-10-10.0.22000-SP0
    uname:
    uname_result(system='Windows', node='DESKTOP-7T3U2OJ', release='10', version='10.0.22000', machine='AMD64', processor='Intel64 Family 6 Model 158 Stepping 13, GenuineIntel')
    linux_distribution: ('', '', '')
    mac_ver: ('', ('', '', ''), '')

What did you do when you faced the problem?

I checked the result in the tabula-java UI.

Code:

import tabula

dfs = tabula.read_pdf('https://mopng.gov.in/files/petroleumStatistics/monthlyProduction/MPR-for-the-month-of-June,2022.pdf', pages=9, lattice=True, pandas_options={'header': None})
df = dfs[0]
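
One way to narrow this down is to run the same page through several extraction modes and compare how many columns each one yields. The sketch below is only a starting point, not a confirmed fix for this PDF: `candidate_options` and `compare_extractions` are hypothetical helper names, and it assumes that the `stream` and `guess` arguments of `tabula.read_pdf` are the relevant knobs to vary.

```python
def candidate_options(page):
    """Return keyword-argument sets for tabula.read_pdf worth comparing."""
    base = {"pages": page, "pandas_options": {"header": None}}
    return [
        {**base, "lattice": True},                  # the call from this report
        {**base, "stream": True},                   # whitespace-based column detection
        {**base, "lattice": True, "guess": False},  # no automatic table-area guessing
        {**base, "stream": True, "guess": False},
    ]


def compare_extractions(pdf_path, page):
    """Run each option set and collect the shape of every table found."""
    import tabula  # imported lazily; needs tabula-py and a Java runtime

    results = []
    for options in candidate_options(page):
        dfs = tabula.read_pdf(pdf_path, **options)
        results.append((options, [df.shape for df in dfs]))
    return results
```

Calling `compare_extractions(url, 9)` with the PDF URL above would show which mode, if any, keeps the two columns apart. If none does, passing explicit `area` and `columns` coordinates to `tabula.read_pdf` is another option, though the right coordinates for this document are not known here.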

Expected behavior:

Target production during the month as the second column and Month under review * as the third.

Actual behavior:

Both Target production during the month and Month under review * end up in the second column.
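
If the extraction itself cannot be fixed, the merged column can sometimes be split afterwards. The following is a minimal sketch under a loud assumption: that each merged cell holds the two values separated by a line break (`\r` or `\n`) or by a run of two or more spaces. The screenshots are not available here, so the actual separator in this PDF is unknown. `split_merged_cell` and `split_column` are hypothetical helpers that work on plain lists of rows, such as those returned by `df.values.tolist()`.

```python
import re


def split_merged_cell(cell, parts=2):
    """Split one merged cell into `parts` values (assumed separators only)."""
    if cell is None:
        return [None] * parts
    # Assumption: values are separated by line breaks or by 2+ whitespace characters.
    pieces = re.split(r"[\r\n]+|\s{2,}", str(cell).strip(), maxsplit=parts - 1)
    pieces = [p.strip() for p in pieces]
    # Pad with None when the cell held fewer values than expected.
    return pieces + [None] * (parts - len(pieces))


def split_column(rows, index, parts=2):
    """Replace column `index` in every row with `parts` separate columns."""
    return [
        row[:index] + split_merged_cell(row[index], parts) + row[index + 1:]
        for row in rows
    ]
```

For example, `split_column(df.values.tolist(), 1)` would turn the second column into two columns, provided the cells really use one of the assumed separators.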

Related Issues:

None

@alexDS12 this issue was automatically closed because it did not follow the issue template

@alexDS12 Were you able to solve the problem? Please help. I am also facing a similar issue.

@ollycredit I moved to another library because I didn't find a reliable solution. Please create a new issue so the author can try to help you.