lorien/grab

spider: impossible to setup grab transport

ingvarbiz opened this issue · 2 comments

I want to point CURLOPT_RESOLVE at a specific IP address, so in create_grab_instance() I wrote:

...
g.setup_transport('pycurl')
g.transport.curl.setopt(pycurl.RESOLVE, ['api.somesite.com:443:{}'.format(ip)])
return g

When I call spider.run(), I get the following error:

ERROR:grab.spider.base_service:Spider Service Fatal Error
Traceback (most recent call last):
  File "/home/jetscraper/jet/lib/python3.5/site-packages/grab/spider/base_service.py", line 32, in wrapper
    callback(*args, **kwargs)
  File "/home/jetscraper/jet/lib/python3.5/site-packages/grab/spider/network_service/multicurl.py", line 159, in spawner_callback
    grab = self.spider.setup_grab_for_task(task)
  File "/home/jetscraper/jet/lib/python3.5/site-packages/grab/spider/base.py", line 553, in setup_grab_for_task
    grab.setup_transport(self.grab_transport_name)
  File "/home/jetscraper/jet/lib/python3.5/site-packages/grab/base.py", line 253, in setup_transport
    'Transport is already set up. Use'
grab.error.GrabMisuseError: Transport is already set up. Use setup_transport(..., reset=True) to explicitly setup new transport
Traceback (most recent call last):
  File "jet_get_dxm.py", line 119, in <module>
    scraper.run()
  File "/home/jetscraper/jet/lib/python3.5/site-packages/grab/spider/base.py", line 689, in run
    raise exc_info[1]

The problem is in grab/spider/base.py, line 553, in setup_grab_for_task:
grab.setup_transport(self.grab_transport_name)
So I had to comment out this line. Is it possible to configure CURLOPT_RESOLVE somewhere else?

Use update_grab_instance

I tried that too before submitting the issue, and it doesn't work either: update_grab_instance is called before setup_transport, so setting up the transport there hits the same check.

def setup_grab_for_task(self, task):
    grab = self.create_grab_instance()
    if task.grab_config:
        grab.load_config(task.grab_config)
    else:
        grab.setup(url=task.url)

    # Generate new common headers
    grab.config['common_headers'] = grab.common_headers()
    self.update_grab_instance(grab)
    grab.setup_transport(self.grab_transport_name)
    return grab

Same error here:

Spider Service Fatal Error
Traceback (most recent call last):
  File "/home/jetscraper/tests/lib/python3.5/site-packages/grab/spider/base_service.py", line 32, in wrapper
    callback(*args, **kwargs)
  File "/home/jetscraper/tests/lib/python3.5/site-packages/grab/spider/network_service/multicurl.py", line 159, in spawner_callback
    grab = self.spider.setup_grab_for_task(task)
  File "/home/jetscraper/tests/lib/python3.5/site-packages/grab/spider/base.py", line 553, in setup_grab_for_task
    grab.setup_transport(self.grab_transport_name)
  File "/home/jetscraper/tests/lib/python3.5/site-packages/grab/base.py", line 253, in setup_transport
    'Transport is already set up. Use'
grab.error.GrabMisuseError: Transport is already set up. Use setup_transport(..., reset=True) to explicitly setup new transport
Traceback (most recent call last):
  File "t2.py", line 93, in <module>
    s.run()
  File "/home/jetscraper/tests/lib/python3.5/site-packages/grab/spider/base.py", line 689, in run
    raise exc_info[1]
  File "/home/jetscraper/tests/lib/python3.5/site-packages/grab/spider/base_service.py", line 32, in wrapper
    callback(*args, **kwargs)
  File "/home/jetscraper/tests/lib/python3.5/site-packages/grab/spider/network_service/multicurl.py", line 159, in spawner_callback
    grab = self.spider.setup_grab_for_task(task)
  File "/home/jetscraper/tests/lib/python3.5/site-packages/grab/spider/base.py", line 553, in setup_grab_for_task
    grab.setup_transport(self.grab_transport_name)
  File "/home/jetscraper/tests/lib/python3.5/site-packages/grab/base.py", line 253, in setup_transport
    'Transport is already set up. Use'
grab.error.GrabMisuseError: Transport is already set up. Use setup_transport(..., reset=True) to explicitly setup new transport
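
To make the ordering problem concrete, here is a minimal sketch that models the call sequence from setup_grab_for_task quoted above. Everything in it is a hypothetical stand-in (FakeGrab, FakeTransport, FakeSpider, the options dict, and the placeholder IP 203.0.113.7); it is not Grab's real code and does not import grab or pycurl. It only shows why setting up the transport in create_grab_instance() or update_grab_instance() trips the guard, and one possible workaround shape: overriding setup_grab_for_task in the spider subclass and applying the option after the parent call. Whether the real network service keeps an option set this way on the curl handle it actually uses has not been verified here.

```python
# Stand-in classes that only model the call order; NOT Grab's real code.


class GrabMisuseError(Exception):
    """Stand-in for grab.error.GrabMisuseError."""


class FakeTransport:
    def __init__(self):
        # Stands in for options that would be set via curl.setopt().
        self.options = {}


class FakeGrab:
    def __init__(self):
        self.transport = None

    def setup_transport(self, name, reset=False):
        # Mirrors the guard seen in the traceback: a second call
        # without reset=True is rejected.
        if self.transport is not None and not reset:
            raise GrabMisuseError('Transport is already set up.')
        self.transport = FakeTransport()


class FakeSpider:
    grab_transport_name = 'pycurl'

    def create_grab_instance(self):
        return FakeGrab()

    def update_grab_instance(self, grab):
        pass

    def setup_grab_for_task(self, task):
        # Same order as the quoted method: hooks first, transport last.
        grab = self.create_grab_instance()
        self.update_grab_instance(grab)
        grab.setup_transport(self.grab_transport_name)
        return grab


class EarlySetupSpider(FakeSpider):
    """Reproduces the report: the transport is created inside a hook."""

    def update_grab_instance(self, grab):
        grab.setup_transport('pycurl')
        grab.transport.options['RESOLVE'] = ['api.somesite.com:443:203.0.113.7']


class LateSetupSpider(FakeSpider):
    """Possible workaround: apply the option after the parent call."""

    def setup_grab_for_task(self, task):
        grab = super().setup_grab_for_task(task)
        grab.transport.options['RESOLVE'] = ['api.somesite.com:443:203.0.113.7']
        return grab


def try_setup(spider):
    # Returns the prepared object, or the exception if the guard fires.
    try:
        return spider.setup_grab_for_task(task=None)
    except GrabMisuseError as exc:
        return exc
```

In this model, try_setup(EarlySetupSpider()) returns the GrabMisuseError, while try_setup(LateSetupSpider()) returns an object whose transport carries the RESOLVE entry. With the real library, the equivalent of the last step would be the g.transport.curl.setopt(pycurl.RESOLVE, ...) call from the original report, moved into an overridden setup_grab_for_task.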