dannnylo/rtesseract

Use existing tiff

Closed this issue · 2 comments

I'm extracting all the pages of a PDF as TIFFs through the mini_magick gem, and I'd like to feed each of these to rtesseract without having it unnecessarily regenerate new, temporary TIFFs. Short of monkey patching your image method, is there any way to do this?

mjy commented

@dshorthouse Did you try with .new('file.png', processor: 'mini_magick')? Completely a stab in the dark (processor seems to be undocumented in the README).

[EDIT]

I see, the commit didn't land: a1cba3c

@mjy Hard to tell what's going on, since the signature has changed quite a bit since late 2017. Indeed, I was using MiniMagick to extract the pages of a PDF to separate TIFF files (altering density, stripping out the alpha layer, etc.) before passing them to rtesseract, which then converted each one again to a TIFF tempfile through a method that no longer exists in this newer version of rtesseract. Assuming a tempfile is no longer created through

RTesseract::Command.new(source, 'stdout', options).run
then perhaps this is no longer an issue.
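
For anyone who wants to rule out the intermediate tempfile entirely, here is a minimal sketch that bypasses rtesseract and calls the tesseract CLI directly on an already-prepared TIFF, writing the recognized text to stdout. It assumes the tesseract binary is on the PATH; the helper names tesseract_command and ocr_existing_tiff are hypothetical and are not part of rtesseract's API.

```ruby
require 'open3'

# Hypothetical helper (not part of rtesseract): builds the argument list for
# the tesseract CLI. Using 'stdout' as the output base makes tesseract print
# the recognized text instead of writing a .txt file.
def tesseract_command(image_path, lang: 'eng')
  ['tesseract', image_path.to_s, 'stdout', '-l', lang]
end

# Hypothetical helper: runs tesseract on an existing image file as-is.
# No conversion step and no temporary copy of the image is made here.
def ocr_existing_tiff(image_path, lang: 'eng')
  raise ArgumentError, "no such file: #{image_path}" unless File.file?(image_path)

  text, err, status = Open3.capture3(*tesseract_command(image_path, lang: lang))
  raise "tesseract failed: #{err}" unless status.success?

  text
end

# Example usage, assuming page-1.tiff was already produced by MiniMagick:
#   puts ocr_existing_tiff('page-1.tiff')
```

Passing the arguments as an array avoids shell quoting problems with file names that contain spaces.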