Error when passing Scrapy response
Closed this issue · 2 comments
ittailup commented
There seems to be a problem when passing a Scrapy response to autopager: autopager.select(response) raises an AttributeError. The same page works when it is fetched with requests instead of Scrapy.
(ipython)➜ TweetScraper git:(master) ✗ scrapy shell http://elcomercio.pe/buscar/ppk
2016-04-09 23:06:27 [scrapy] INFO: Scrapy 1.0.5 started (bot: TweetScraper)
2016-04-09 23:06:27 [scrapy] INFO: Optional features available: ssl, http11, boto
2016-04-09 23:06:27 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'TweetScraper.spiders', 'LOG_LEVEL': 'INFO', 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter', 'SPIDER_MODULES': ['TweetScraper.spiders'], 'BOT_NAME': 'TweetScraper', 'LOGSTATS_INTERVAL': 0, 'USER_AGENT': 'TweetScraper'}
2016-04-09 23:06:27 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, CoreStats, SpiderState
2016-04-09 23:06:27 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2016-04-09 23:06:27 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2016-04-09 23:06:27 [scrapy] INFO: Enabled item pipelines: SaveToFilePipeline
2016-04-09 23:06:27 [scrapy] INFO: Spider opened
[s] Available Scrapy objects:
[s] crawler <scrapy.crawler.Crawler object at 0x108a57090>
[s] item {}
[s] request <GET http://elcomercio.pe/buscar/ppk>
[s] response <200 http://elcomercio.pe/buscar/ppk>
[s] settings <scrapy.settings.Settings object at 0x109e7e250>
[s] spider <DefaultSpider 'default' at 0x10bdb1890>
[s] Useful shortcuts:
[s] shelp() Shell help (print this help)
[s] fetch(req_or_url) Fetch request (or URL) and update local objects
[s] view(response) View response in a browser
In [1]: import autopager
In [2]: autopager.select(response)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-355d1f012366> in <module>()
----> 1 autopager.select(response)
/Users/gabriel/.virtualenvs/ipython/lib/python2.7/site-packages/autopager/autopager.pyc in select(page, direct, prev, next)
36 By default, all link types are returned.
37 """
---> 38 return get_shared_autopager().select(page, direct, prev, next)
39
40
/Users/gabriel/.virtualenvs/ipython/lib/python2.7/site-packages/autopager/autopager.pyc in select(self, page, direct, prev, next)
96 """
97 links = self.extract(page, prev=prev, next=next, direct=direct)
---> 98 return parsel.SelectorList([x for y, x in links])
99
100 def extract(self, page, direct=True, prev=True, next=True):
/Users/gabriel/.virtualenvs/ipython/lib/python2.7/site-packages/autopager/autopager.pyc in extract(self, page, direct, prev, next)
110 sel = _any2selector(page)
111 links = get_links(sel)
--> 112 xseq = page_to_features(links)
113 yseq = self.crf.predict_single(xseq)
114 for x, y in zip(links, yseq):
/Users/gabriel/.virtualenvs/ipython/lib/python2.7/site-packages/autopager/model.pyc in page_to_features(xseq)
126
127 def page_to_features(xseq):
--> 128 features = [link_to_features(a) for a in xseq]
129
130 around = get_text_around_selector_list(xseq, max_length=15)
/Users/gabriel/.virtualenvs/ipython/lib/python2.7/site-packages/autopager/model.pyc in link_to_features(link)
60 )
61
---> 62 elem = link.root
63 elem_target = _elem_attr(elem, 'target')
64 elem_rel = _elem_attr(elem, 'rel')
AttributeError: 'Selector' object has no attribute 'root'
ittailup commented
I got it working by wrapping the response in a Selector object and passing the extracted HTML string, rather than the response object itself.
In [16]: sel = Selector(response)
In [17]: autopager.urls(sel.extract())
Out[17]:
[u'http://elcomercio.pe/buscar/ppk/?start=15',
u'http://elcomercio.pe/buscar/ppk/?start=30']
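For anyone who hits the same error on Scrapy 1.0.x, the workaround above can be wrapped in a small helper that turns whatever you have into an HTML string before it reaches autopager. This is a sketch, not part of autopager or Scrapy: page_html is a hypothetical name, and it assumes that selectors expose .extract() and that Scrapy 1.0 text responses expose body_as_unicode().

```python
# Python 2 has a separate unicode type; Python 3 only has str.
try:
    string_types = (basestring,)  # noqa: F821 (Python 2)
except NameError:
    string_types = (str,)         # Python 3


def page_html(page):
    """Return the HTML of `page` as a text string.

    Hypothetical helper (not part of autopager or Scrapy). Accepts:
      * a plain HTML string, returned unchanged;
      * a selector with an .extract() method, e.g. Selector(response);
      * a Scrapy 1.0 TextResponse, via its body_as_unicode() method.
    """
    if isinstance(page, string_types):
        return page
    # Selectors (Scrapy's built-in ones and parsel's) serialize via .extract()
    extract = getattr(page, "extract", None)
    if callable(extract):
        return extract()
    # Scrapy 1.0 TextResponse objects expose the decoded body this way
    body_as_unicode = getattr(page, "body_as_unicode", None)
    if callable(body_as_unicode):
        return body_as_unicode()
    raise TypeError("unsupported page type: %r" % type(page))
```

In the Scrapy shell this would be called as autopager.urls(page_html(response)), which has the same effect as the Selector(response).extract() workaround: autopager receives a string instead of a Scrapy 1.0 response or selector.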
kmike commented
Aha, your example works for me as-is (autopager.select(response)) in Scrapy 1.1.0rc3 + Python 3.5, because Scrapy 1.1.0rc3 uses the parsel library. Scrapy 1.0.5 has selectors built in, and there are some differences (the .root attribute is available as ._root).
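The traceback ends in link_to_features reading link.root, which is exactly the attribute that differs between the two selector implementations. Code that has to run on both Scrapy 1.0.x and parsel-based versions could look the element up defensively. The helper below is a hypothetical sketch of that idea (lxml_root is an illustrative name), not a patch that exists in autopager.

```python
def lxml_root(sel):
    """Return the underlying lxml element of a selector.

    parsel selectors (used by Scrapy >= 1.1) expose it as .root, while the
    built-in selectors of Scrapy 1.0.x keep it in the private ._root
    attribute. Hypothetical compatibility helper, not part of autopager.
    """
    if hasattr(sel, "root"):
        return sel.root
    if hasattr(sel, "_root"):
        return sel._root
    raise AttributeError(
        "%s object has neither .root nor ._root" % type(sel).__name__
    )
```

Relying on a private attribute like ._root is fragile, so upgrading to a parsel-based Scrapy is the cleaner fix; the shim only illustrates why the same call behaves differently across versions.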