dannnylo/rtesseract

ImageNotSelected using online file

Closed this issue · 7 comments

Hello,

I'm trying to work with online files.
I tried to download a remote file into a Tempfile so that RTesseract could read the words in it.
I tried using this code:
```ruby
tmp_file = Tempfile.new(self.title)
open(fileUrl, 'r:UTF-8') do |url_file| # fileUrl is a string
  tmp_file.write(url_file.read)
end
tmp_file.rewind
begin
  RTesseract.new(self.title, command: 'tesseract_error', debug: true).to_s
rescue => e
  return e.inspect
end
```

The result is an `RTesseract::ImageNotSelectedError`.
I don't know whether it's because I call `to_s` inside a method whose result is converted to JSON by a serializer, but when I return `image` itself I get formatted JSON showing the rmagick processor and a source.

Am I doing something wrong?

Thanks

Hello, I also get an error:

```
(byebug) image = RTesseract.new(xpath("//img/@src").first.to_s)
#<RTesseract:0x007f88c8c70170 @configuration=#<RTesseract::Configuration:0x007f88c8c70120 @processor="rmagick", @parent=#<RTesseract::Configuration:0x007f88c8c7ffa8 @processor="rmagick">, @command="tesseract", @lang=nil, @psm=nil, @tessdata_dir=nil, @user_words=nil, @user_patterns=nil, @debug=false, @options_cmd=[]>, @options={}, @points={}, @processor=RTesseract::Processor::RMagickProcessor, @value=nil, @pdf_path=nil, @source=#<Pathname:http://www.huckmaquinas.com.br/image/catalog/face.png>>
(byebug) image.to_s
*** RTesseract::ImageNotSelectedError Exception: RTesseract::ImageNotSelectedError

I, [2017-12-20 16:32:00 +0000#5201]  INFO -- : Crawler: req_enqueued=11, res_dequeued=2, res_handled=1, item_pipelined=0, item_processed=0, item_sent=0, idling=false
nil
```
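In the inspect output above, the URL string ends up as `@source=#<Pathname:http://...>`, which suggests it is treated as a file path rather than fetched. A minimal pre-check, assuming the source should always be an existing local file before it reaches `RTesseract.new`, could look like this; `local_image_path!` is a hypothetical helper, not part of rtesseract.

```ruby
# Hypothetical pre-check (not part of rtesseract): accept only an existing
# local file. A URL string is not downloaded; it just becomes a Pathname.
def local_image_path!(source)
  # Pathname and Tempfile respond to to_path; plain strings are used as-is
  path = source.respond_to?(:to_path) ? source.to_path : source.to_s

  if path.match?(%r{\A[a-z][a-z0-9+.-]*://}i)
    raise ArgumentError, "#{path} is a URL; download it to a local file first"
  end
  raise ArgumentError, "#{path} does not exist" unless File.file?(path)

  path
end
```

Failing early with a descriptive message makes it clear that the problem is the source, not the OCR step.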

Thank you

Hello,
you will need to download the file first.

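```ruby
require 'rubygems'
require 'tempfile'
require 'open-uri'
require 'rtesseract'

url = 'https://raw.githubusercontent.com/dannnylo/rtesseract/master/spec/images/test.png'

# Keep the extension so the image processor can tell the format
file = Tempfile.new(['image', File.extname(url)])
file.binmode
file.write URI.open(url, &:read) # URI.open needs Ruby 2.5+; older code used Kernel#open
file.flush

image = RTesseract.new(file.path)
image.to_s_without_spaces
```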
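The same download step can be wrapped in a small reusable helper. This is a sketch under assumptions: `download_to_tempfile` and `DEFAULT_OPENER` are hypothetical names, the default fetcher is open-uri's `URI.open` (Ruby 2.5+), and the fetcher is injectable so the helper can be exercised without network access.

```ruby
require 'tempfile'
require 'uri'

# Default fetcher: open-uri's URI.open, reading the whole response body.
DEFAULT_OPENER = lambda do |url|
  require 'open-uri'
  URI.open(url, &:read)
end

# Hypothetical helper (not part of rtesseract): copy a remote image into a
# local Tempfile and return it.
def download_to_tempfile(url, opener: DEFAULT_OPENER)
  # Keep the extension (e.g. ".png") so the image processor can tell the format;
  # parsing the URL first keeps any query string out of the extension.
  file = Tempfile.new(['image', File.extname(URI.parse(url).path)])
  file.binmode                 # write raw bytes, no encoding conversion
  file.write(opener.call(url))
  file.flush                   # bytes must be on disk before OCR opens the path
  file
end
```

Usage would be `file = download_to_tempfile(url)` followed by `RTesseract.new(file.path).to_s`. Keep a reference to `file` until OCR has finished, because a Tempfile is deleted once it is garbage-collected.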

@dannnylo In this case it still doesn't work (rtesseract 2.2.0):

```
(byebug) url = 'https://raw.githubusercontent.com/dannnylo/rtesseract/master/spec/images/test.png'
"https://raw.githubusercontent.com/dannnylo/rtesseract/master/spec/images/test.png"
(byebug) file = Tempfile.new(['image', File.extname(url)])
I, [2018-02-19 05:59:23 +0000#32457]  INFO -- : Crawler: req_enqueued=1, res_dequeued=1, res_handled=0, item_pipelined=0, item_processed=0, item_sent=0, idling=false
#<Tempfile:/tmp/image20180219-32457-bs105h.png>
(byebug) file.binmode
#<File:/tmp/image20180219-32457-bs105h.png>
(byebug) file.write open(url).read
I, [2018-02-19 05:59:36 +0000#32457]  INFO -- : Crawler: req_enqueued=1, res_dequeued=1, res_handled=0, item_pipelined=0, item_processed=0, item_sent=0, idling=false
4342
(byebug) file.flush
I, [2018-02-19 05:59:44 +0000#32457]  INFO -- : Crawler: req_enqueued=1, res_dequeued=1, res_handled=0, item_pipelined=0, item_processed=0, item_sent=0, idling=false
#<File:/tmp/image20180219-32457-bs105h.png>
(byebug) image = RTesseract.new(file)
*** LoadError Exception: cannot load such file -- RMagick

nil
```

```
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
```

Requiring the rmagick gem fixed that, but now I get another error:

```
(byebug) RTesseract.new(file.flush).to_s
I, [2018-02-19 11:44:20 +0500#19610]  INFO -- : Crawler: req_enqueued=1, res_dequeued=1, res_handled=0, item_pipelined=0, item_processed=0, item_sent=0, idling=false
*** RTesseract::ConversionError Exception: No such file or directory @ rb_sysopen - /tmp/1519022660.745394861.txt

nil
(byebug) file
#<Tempfile:/tmp/image20180219-19610-158m5lr.png>
```

The image file is in the tmp folder.

LoadError Exception: cannot load such file -- RMagick

You need to install RMagick.

```
sudo apt-get install imagemagick libmagickwand-dev
gem install rmagick
```
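After installing, a quick way to confirm that the bindings load from the same Ruby that runs RTesseract is to try requiring them. This is a minimal sketch; `rmagick_loadable?` is a hypothetical helper, and it tries both `rmagick` and `RMagick` on the assumption that the accepted require name depends on the rmagick version.

```ruby
# Hypothetical check: report whether the RMagick bindings can be loaded.
# Tries the lowercase name first, then the older capitalised one.
def rmagick_loadable?
  %w[rmagick RMagick].any? do |name|
    begin
      require name
      true
    rescue LoadError
      false
    end
  end
end

puts(rmagick_loadable? ? 'RMagick loads' : 'RMagick missing: install imagemagick, libmagickwand-dev and the rmagick gem')
```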

@dannnylo You may have read only my original comment; I updated it before you answered.

Could you please post the output of your code:

```ruby
require 'rubygems'
require 'tempfile'
require 'open-uri'
require 'rtesseract'

url = 'https://raw.githubusercontent.com/dannnylo/rtesseract/master/spec/images/test.png'

file = Tempfile.new(['image', File.extname(url)])
file.binmode
file.write URI.open(url, &:read) # URI.open needs Ruby 2.5+; older code used Kernel#open
file.flush

image = RTesseract.new(file.path)
image.to_s_without_spaces
```

Thank you

This is really strange. The file exists and RTesseract sees it (`@file_dest="/tmp/1519367276.044604741"`), but the conversion still fails:

```ruby
require 'rubygems'
require 'tempfile'
require 'open-uri'
require 'rtesseract'
require 'byebug'

url = 'https://raw.githubusercontent.com/dannnylo/rtesseract/master/spec/images/test.png'

file = Tempfile.new(['image', File.extname(url)])
file.binmode
file.write URI.open(url, &:read)
file.flush

image = RTesseract.new(file.path)
byebug
```

```
(byebug) image
#<RTesseract:0x00007f65383710c8 @configuration=#<RTesseract::Configuration:0x00007f6538371078 @processor="rmagick", @parent=#<RTesseract::Configuration:0x00007f6538370f10 @processor="rmagick">, @command="tesseract", @lang=nil, @psm=nil, @oem=nil, @tessdata_dir=nil, @user_words=nil, @user_patterns=nil, @debug=false, @options_cmd=[]>, @options={}, @points={}, @processor=RTesseract::Processor::RMagickProcessor, @value=nil, @pdf_path=nil, @source=#<Pathname:/tmp/image20180223-2498-j8qsfl.png>, @image=#<Tempfile:/tmp/20180223-2498-rarkaw.tif>, @file_dest="/tmp/1519367276.044604741">
(byebug) image.to_s
I, [2018-02-23 06:28:06 +0000#2498]  INFO -- : Crawler: req_enqueued=1, res_dequeued=1, res_handled=0, item_pipelined=0, item_processed=0, item_sent=0, idling=false
*** RTesseract::ConversionError Exception: No such file or directory @ rb_sysopen - /tmp/1519367286.5269431203.txt

nil
(byebug) image.to_s_without_spaces
I, [2018-02-23 06:29:32 +0000#2498]  INFO -- : Crawler: req_enqueued=1, res_dequeued=1, res_handled=0, item_pipelined=0, item_processed=0, item_sent=0, idling=false
*** RTesseract::ConversionError Exception: No such file or directory @ rb_sysopen - /tmp/1519367372.9677863755.txt

nil
```
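The error says that the `.txt` file RTesseract expected to read was never created. One way to narrow this down, assuming (as `@command="tesseract"` in the inspect output suggests) that the text comes from the `tesseract` CLI, is to run that command by hand on the same image and look at its exit status and stderr. `tesseract <image> <outputbase>` writes `<outputbase>.txt`; `run_tesseract_manually` is a hypothetical helper, not part of rtesseract.

```ruby
require 'open3'
require 'tmpdir'

# Hypothetical diagnostic: run the tesseract CLI directly on a local image and
# check whether it produced "<outputbase>.txt".
def run_tesseract_manually(image_path, command: 'tesseract')
  Dir.mktmpdir do |dir|
    out_base = File.join(dir, 'out')
    begin
      _stdout, stderr, status = Open3.capture3(command, image_path, out_base)
    rescue Errno::ENOENT
      # The executable itself could not be found on PATH
      return { ok: false, error: "#{command}: command not found" }
    end

    txt_path = "#{out_base}.txt"
    if status.success? && File.exist?(txt_path)
      { ok: true, text: File.read(txt_path) }
    else
      { ok: false, error: "exit #{status.exitstatus}: #{stderr.strip}" }
    end
  end
end
```

Calling it as `p run_tesseract_manually(file.path)` from the same byebug session would show whether the command is missing, exits with an error, or succeeds outside RTesseract.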