After running scrapy crawler woaidu, it hangs and does nothing

Question

After running scrapy crawler woaidu, it hangs and does nothing.

MRLuowen opened this issue 10 years ago · 5 comments

/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:12: ScrapyDeprecationWarning: woaidu_crawler.spiders.woaidu_detail_spider.WoaiduSpider inherits from deprecated class scrapy.spider.BaseSpider, please inherit from scrapy.spider.Spider. (warning only on first subclass, there may be others)
class WoaiduSpider(BaseSpider):
/usr/local/lib/python2.7/dist-packages/scrapy/contrib/pipeline/__init__.py:21: ScrapyDeprecationWarning: ITEM_PIPELINES defined as a list or a set is deprecated, switch to a dict
category=ScrapyDeprecationWarning, stacklevel=1)
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:19: ScrapyDeprecationWarning: scrapy.selector.HtmlXPathSelector is deprecated, instantiate scrapy.Selector instead.
response_selector = HtmlXPathSelector(response)
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:20: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
next_link = list_first_item(response_selector.select(u'//div[@class="k2"]/div/a[text()="下一页"]/@href').extract())
/usr/local/lib/python2.7/dist-packages/scrapy/selector/unified.py:106: ScrapyDeprecationWarning: scrapy.selector.HtmlXPathSelector is deprecated, instantiate scrapy.Selector instead.
for x in result]
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:25: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
for detail_link in response_selector.select(u'//div[contains(@class,"sousuolist")]/a/@href').extract():
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:33: ScrapyDeprecationWarning: scrapy.selector.HtmlXPathSelector is deprecated, instantiate scrapy.Selector instead.
response_selector = HtmlXPathSelector(response)
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:34: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
woaidu_item['book_name'] = list_first_item(response_selector.select('//div[@class="zizida"][1]/text()').extract())
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:35: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
woaidu_item['author'] = [list_first_item(response_selector.select('//div[@class="xiaoxiao"][1]/text()').extract())[5:].strip(),]
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:36: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
woaidu_item['book_description'] = list_first_item(response_selector.select('//div[@class="lili"][1]/text()').extract()).strip()
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:37: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
woaidu_item['book_covor_image_url'] = list_first_item(response_selector.select('//div[@class="hong"][1]/img/@src').extract())
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:40: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
for i in response_selector.select('//div[contains(@class,"xiazai_xiao")]')[1:]:
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:46: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
list_first_item(i.select('./div')[0].select('./a/@href').extract()),
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:47: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
list_first_item(i.select('./div')[1].select('./a/@href').extract())
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:52: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
download_item['progress'] = list_first_item(i.select('./div')[2].select('./text()').extract())
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:53: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
download_item['update_time'] = list_first_item(i.select('./div')[3].select('./text()').extract())
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:56: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
list_first_item(i.select('./div')[4].select('./a/text()').extract()),
/home/lw/distribute_crawler-master/woaidu_crawler/woaidu_crawler/spiders/woaidu_detail_spider.py:57: ScrapyDeprecationWarning: Call to deprecated function select. Use .xpath() instead.
list_first_item(i.select('./div')[4].select('./a/@href').extract())\

Answer 1 · 2015-03-15T07:50:10.000Z

How did you eventually solve this? Is there a fix?

Answer 2 · 2015-03-18T21:22:14.000Z

I have the same problem.

Answer 3 · 2015-03-18T21:24:22.000Z

iders.woaidu_detail_spider.WoaiduSpider inherits from deprecated class scrapy.spider.BaseSpider, please inherit from scrapy.spider.Spider. (warning only on first subclass, there may be others)
class WoaiduSpider(BaseSpider):
It hangs here and stops executing. Is there a solution?

Answer 4 · 2015-03-20T00:15:14.000Z

Follow this changelist and sync the code; it should then work normally:
https://github.com/gnemoug/distribute_crawler/pull/5/files
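
The contents of that changelist are not quoted in this thread. As a separate illustration, one of the warnings in the original log asks for ITEM_PIPELINES to be a dict rather than a list. Below is a minimal sketch of that settings change. The pipeline class paths, the helper name `pipelines_to_dict`, and the order values are all hypothetical placeholders, not the project's real names, and this warning on its own is not necessarily what causes the hang.

```python
# settings.py sketch: ITEM_PIPELINES as a dict instead of a list.
# The class paths below are hypothetical placeholders, not the project's real names.
legacy_pipelines = [
    "woaidu_crawler.pipelines.cover_image.CoverImagePipeline",  # hypothetical
    "woaidu_crawler.pipelines.storage.StoragePipeline",         # hypothetical
]


def pipelines_to_dict(pipelines, start=300, step=100):
    """Map each pipeline path to an integer order, preserving list order.

    Scrapy runs pipelines from the lowest order value to the highest,
    so the first entry of the old list keeps running first.
    """
    return {path: start + i * step for i, path in enumerate(pipelines)}


# Scrapy only reads upper-case names from the settings module.
ITEM_PIPELINES = pipelines_to_dict(legacy_pipelines)
```

Writing the dict out by hand works just as well; the helper only makes the list-order-to-priority mapping explicit.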

Answer 5 · 2016-07-21T07:03:19.000Z

@TylerzhangZC I switched to branch pr/5 and ran it, but it still fails with this error:

Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/Library/Python/2.7/site-packages/scrapy/commands/crawl.py", line 57, in run
    self.crawler_process.crawl(spname, **opts.spargs)
  File "/Library/Python/2.7/site-packages/scrapy/crawler.py", line 153, in crawl
    d = crawler.crawl(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/twisted/internet/defer.py", line 1274, in unwindGenerator
    return _inlineCallbacks(None, gen, Deferred())
--- <exception caught here> ---
  File "/Library/Python/2.7/site-packages/twisted/internet/defer.py", line 1128, in _inlineCallbacks
    result = g.send(result)
  File "/Library/Python/2.7/site-packages/scrapy/crawler.py", line 71, in crawl
    self.engine = self._create_engine()
  File "/Library/Python/2.7/site-packages/scrapy/crawler.py", line 83, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/Library/Python/2.7/site-packages/scrapy/core/engine.py", line 69, in __init__
    self.scraper = Scraper(crawler)
  File "/Library/Python/2.7/site-packages/scrapy/core/scraper.py", line 70, in __init__
    self.itemproc = itemproc_cls.from_crawler(crawler)
  File "/Library/Python/2.7/site-packages/scrapy/middleware.py", line 56, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/Library/Python/2.7/site-packages/scrapy/middleware.py", line 32, in from_settings
    mwcls = load_object(clspath)
  File "/Library/Python/2.7/site-packages/scrapy/utils/misc.py", line 44, in load_object
    mod = import_module(module)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/importlib/__init__.py", line 37, in import_module
    __import__(name)
  File "/Users/georgezou/Documents/Coding/github/distribute_crawler/woaidu_crawler/woaidu_crawler/pipelines/cover_image.py", line 7, in <module>
    from scrapy.contrib.pipeline.images import ImagesPipeline
  File "/Library/Python/2.7/site-packages/scrapy/contrib/pipeline/images.py", line 7, in <module>
    from scrapy.pipelines.images import *
  File "/Library/Python/2.7/site-packages/scrapy/pipelines/images.py", line 15, in <module>
    from PIL import Image
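
The traceback above is cut off at `from PIL import Image`, so the final exception line is not shown. If it turns out to be an ImportError, that would point to the PIL package (usually provided by Pillow) not being importable in that interpreter, rather than to the crawler code itself. A minimal sketch for checking this, where `can_import` is a hypothetical helper name:

```python
import importlib


def can_import(module_name):
    """Return True if module_name can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # Scrapy's images pipeline runs `from PIL import Image` at import time.
    print("PIL importable: %s" % can_import("PIL"))
```

Run it with the same Python that runs Scrapy. If it reports that PIL is not importable, installing Pillow for that interpreter (for example `pip install Pillow`) is the usual remedy.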