tabulapdf/tabula-java

CalledProcessError: Command '['java', '-Dfile.encoding=UTF8', '-jar',

kdshreyas opened this issue · 4 comments

Summary of your issue

Refer: chezou/tabula-py#349

While processing a PDF file, a specific page consistently triggers a "CalledProcessError" for a command beginning with ['java', '-Dfile.encoding=UTF8', '-jar']. The error aborts processing and prevents further execution.

CalledProcessError: Command '['java', '-Dfile.encoding=UTF8', '-jar', 'D:\Anaconda\envs\dev_env\lib\site-packages\tabula\tabula-1.0.5-jar-with-dependencies.jar', '--pages', '1', '--lattice', '--format', 'JSON'

test pdf to reproduce the issue:
test_pdf_output.pdf

Code to reproduce the error:

import tabula

inputpdf = 'test_pdf_output.pdf'
page = 1
# Lattice mode on page 1 is what raises the CalledProcessError
tables = tabula.read_pdf(inputpdf, pages=page, lattice=True, guess=False)
df = tables[0]

Expected behavior:
The command should execute successfully on the page of the PDF file, without encountering any errors.

Actual behavior:
The error "CalledProcessError" is encountered when processing the specified page within the PDF file.

chezou commented

This looks the same as #218: lattice mode triggers the exception.

@kdshreyas Could you please update the issue so that, instead of copying your original issue, it references my minimal reproducible command and output? You should not use the tabula-py template.

Hey @chezou,
I have updated the issue, but I am a bit unsure what exactly to update in it. Please guide me.

chezou commented

This is the tabula-java repo. You should not describe tabula-py code.

This is the reproducible command for the issue:

$ java  -Dfile.encoding=UTF8 -jar tabula/tabula-1.0.5-jar-with-dependencies.jar --pages 1 --lattice ~/Downloads/test_pdf_output.pdf
Exception in thread "main" java.lang.IllegalArgumentException: lines must be orthogonal, vertical and horizontal
	at technology.tabula.Ruling.intersectionPoint(Ruling.java:214)
	at technology.tabula.Ruling.findIntersections(Ruling.java:378)
	at technology.tabula.extractors.SpreadsheetExtractionAlgorithm.findCells(SpreadsheetExtractionAlgorithm.java:134)
	at technology.tabula.extractors.SpreadsheetExtractionAlgorithm.extract(SpreadsheetExtractionAlgorithm.java:63)
	at technology.tabula.extractors.SpreadsheetExtractionAlgorithm.extract(SpreadsheetExtractionAlgorithm.java:41)
	at technology.tabula.CommandLineApp$TableExtractor.extractTablesSpreadsheet(CommandLineApp.java:452)
	at technology.tabula.CommandLineApp$TableExtractor.extractTables(CommandLineApp.java:410)
	at technology.tabula.CommandLineApp.extractFile(CommandLineApp.java:180)
	at technology.tabula.CommandLineApp.extractFileTables(CommandLineApp.java:124)
	at technology.tabula.CommandLineApp.extractTables(CommandLineApp.java:106)
	at technology.tabula.CommandLineApp.main(CommandLineApp.java:76)
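
For readers unfamiliar with the message in the trace, the following Python sketch illustrates the kind of check it describes: an intersection point is only computed for one horizontal and one vertical line. This is a simplified illustration, not tabula-java's code. The `Segment` class and `intersection_point` function are hypothetical names, and the sketch assumes the two segments actually cross.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for a ruling line; not tabula-java's Ruling class."""
    x1: float
    y1: float
    x2: float
    y2: float

    def horizontal(self) -> bool:
        return self.y1 == self.y2

    def vertical(self) -> bool:
        return self.x1 == self.x2


def intersection_point(a: Segment, b: Segment) -> Tuple[float, float]:
    """Return the crossing point of one horizontal and one vertical segment.

    Any other pairing raises ValueError with the same wording as the trace.
    Assumes the segments actually cross; no bounds check is done here.
    """
    if a.horizontal() and b.vertical():
        h, v = a, b
    elif a.vertical() and b.horizontal():
        h, v = b, a
    else:
        raise ValueError("lines must be orthogonal, vertical and horizontal")
    # x comes from the vertical segment, y from the horizontal one
    return (v.x1, h.y1)


h = Segment(0.0, 5.0, 10.0, 5.0)
v = Segment(3.0, 0.0, 3.0, 10.0)
slanted = Segment(0.0, 0.0, 10.0, 1.0)

print(intersection_point(h, v))  # (3.0, 5.0)

try:
    intersection_point(h, slanted)
except ValueError as exc:
    print(exc)  # lines must be orthogonal, vertical and horizontal
```

A slightly slanted ruling on the page would be enough to reach the error branch in a check like this, which may be what happens with this PDF.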

When I remove the lattice option, it works.

$ java  -Dfile.encoding=UTF8 -jar tabula/tabula-1.0.5-jar-with-dependencies.jar --pages 1  ~/Downloads/test_pdf_output.pdf
"","Utah Medicaid Preferred Drug List - Effective April 1, 2023"
"",Quinolones
"",Last Brand
Preferred Drugs,Status Type Limits Mandatory 3-Month Additional Note
"",Update Required
Cipro suspension,Preferred Brand 02/01/10 Cipro susp
"ciprofloxacin 250, 500, 750mg Preferred",Generic 02/01/10
levofloxacin,Preferred Generic 02/01/16
moxifloxacin,Preferred Generic 01/01/21
"",Last Required Prior Brand
Non Preferred Drugs,Status Type Limits Additional Note
"",Update Authorization Form Required
Baxdela,Non Preferred Brand 10/01/17 Medication Coverage Exception
Cipro tablet,Non Preferred Brand 02/01/10 Medication Coverage Exception
ciprofloxacin 100mg tablet,Non Preferred Generic 01/01/22 Medication Coverage Exception
ciprofloxacin suspension,Non Preferred Generic 01/01/20 Medication Coverage Exception Cipro susp
ofloxacin tablet,Non Preferred Generic 02/01/10 Medication Coverage Exception
"",Tetracyclines
"",Last Brand
Preferred Drugs,Status Type Limits Mandatory 3-Month Additional Note
"",Update Required
doxycycline monohydrate,
"",Preferred Generic 01/01/20
"50, 100mg capsule",
doxycycline hyclate,
"",Preferred Generic 01/01/20
"50, 100mg",
minocycline,
"",Preferred Generic 01/01/20
"50, 75, 100mg capsule",
"",Last Required Prior Brand
Non Preferred Drugs,Status Type Limits Additional Note
"",Update Authorization Form Required
demeclocycline,Non Preferred Generic 01/01/20 Medication Coverage Exception
Doryx,Non Preferred Brand 01/01/20 Medication Coverage Exception
doxycycline (unless listed preferred),Non Preferred Generic 01/01/20 Medication Coverage Exception
Minocin,Non Preferred Brand 01/01/20 Medication Coverage Exception
minocycline ER capsule,Non Preferred Generic 12/01/22 Medication Coverage Exception
minocycline tablet,Non Preferred Generic 01/01/20 Medication Coverage Exception
Minolira,Non Preferred Brand 01/01/20 Medication Coverage Exception
Nuzyra,Non Preferred Brand 01/01/20 Medication Coverage Exception
Solodyn,Non Preferred Brand 01/01/20 Medication Coverage Exception
tetracycline,Non Preferred Generic 01/01/20 Medication Coverage Exception
Vibramycin,Non Preferred Brand 01/01/20 Medication Coverage Exception
Ximino,Non Preferred Brand 01/01/20 Medication Coverage Exception
"",Page 11 of 111

Hey @chezou,
I am not very familiar with tabula-java, but thanks for the input; it is very helpful.
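
Until the lattice problem is fixed in tabula-java, the observation above (the page extracts once the lattice option is dropped) could be used as a workaround on the tabula-py side. The sketch below is one possible way to do that, not an official recommendation. The helper name `read_tables_with_fallback` is hypothetical, and the reader is passed in as a parameter on the assumption that it behaves like `tabula.read_pdf`.

```python
import subprocess
from typing import Any, Callable, List


def read_tables_with_fallback(
    reader: Callable[..., List[Any]], path: str, page: int
) -> List[Any]:
    """Try lattice mode first; retry in stream mode if the Java process fails.

    `reader` is assumed to behave like tabula.read_pdf, which surfaces a
    failing Java run as subprocess.CalledProcessError.
    """
    try:
        return reader(path, pages=page, lattice=True)
    except subprocess.CalledProcessError:
        # Lattice extraction failed for this page; fall back to stream mode.
        return reader(path, pages=page, stream=True)


# Example usage (requires tabula-py and Java):
#   import tabula
#   tables = read_tables_with_fallback(tabula.read_pdf, "test_pdf_output.pdf", 1)
```

Note that stream mode gives a different result from what lattice mode would produce: in the output shown earlier, several columns are merged into one field, so the fallback trades table quality for not crashing.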