Henryhaohao/Wenshu_Spider

Every time the crawl reaches a certain number of pages it stops making progress. Has anyone else run into this problem?


2019-01-02 15:08:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
2019-01-02 15:09:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
[... the same line repeats once a minute ...]
2019-01-02 15:25:21 [scrapy.extensions.logstats] INFO: Crawled 1293 pages (at 0 pages/min), scraped 585 items (at 0 items/min)
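The log shows a classic stall: the page and item counters stop moving while the process stays alive, which usually means every download slot is held by a request that never completes. A minimal sketch of settings.py values that make hung requests fail fast, assuming the project otherwise runs on Scrapy defaults (the values below are guesses to tune, not the repo's actual configuration):

# settings.py - standard Scrapy settings; the values are assumptions to tune
DOWNLOAD_TIMEOUT = 30       # abort hung requests instead of waiting (default: 180s)
RETRY_ENABLED = True        # re-schedule requests that time out or fail
RETRY_TIMES = 3
CONCURRENT_REQUESTS = 8     # fewer in-flight requests makes a stall easier to trace
LOG_LEVEL = 'DEBUG'         # DEBUG output shows which requests never come back

With these in place, a hung request produces a visible twisted.internet.error.TimeoutError after 30 seconds rather than holding its slot indefinitely.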

For me it also stops crawling after about 200, with this error:

2019-01-02 16:42:32 [scrapy.core.scraper] ERROR: Spider error processing <GET http://wenshu.court.gov.cn/CreateContentJS/CreateContentJS.aspx?DocID=ddb5d9fb-2022-472e-aa17-b4f91e537da8> (referer: http://wenshu.court.gov.cn/List/ListContent)
Traceback (most recent call last):
File "d:\programdata\anaconda3\lib\site-packages\scrapy\utils\defer.py", line 102, in iter_errback
yield next(it)
File "d:\programdata\anaconda3\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 30, in process_spider_output
for x in result:
File "d:\programdata\anaconda3\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 339, in
return (_set_referer(r) for r in result or ())
File "d:\programdata\anaconda3\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in
return (r for r in result or () if _filter(r))
File "d:\programdata\anaconda3\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in
return (r for r in result or () if _filter(r))
File "D:\ProgramData\Github\Wenshu_Spider-master\Wenshu_Project\Wenshu\spiders\wenshu.py", line 108, in get_detail
content_1 = json.loads(re.search(r'JSON.stringify((.*?));$(document', html).group(1)) # 内容详情字典1
AttributeError: 'NoneType' object has no attribute 'group'
2019-01-02 16:42:39 [scrapy.core.engine] INFO: Closing spider (finished)
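The AttributeError at the end is straightforward: re.search returns None when the response body does not contain the expected JSON.stringify(...) fragment, which typically means the site served an anti-crawler or error page instead of the judgment document, and calling .group(1) on None then kills the callback. A sketch of a more defensive version of that step in get_detail (the logging and re-queueing behaviour here is an assumption, not what wenshu.py currently does):

import json
import re

def get_detail(self, response):
    html = response.text
    match = re.search(r'JSON.stringify\((.*?)\);\$\(document', html)
    if match is None:
        # Probably a block page or error page rather than the document:
        # log it and re-queue the request instead of crashing the callback.
        self.logger.warning('No JSON payload in %s, re-queueing', response.url)
        yield response.request.replace(dont_filter=True)
        return
    content_1 = json.loads(match.group(1))  # content-detail dict 1
    # ... continue with the original parsing of content_1 ...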

I ran into the same problem. After searching on Baidu, I wonder if this is the cause: some download threads never execute their callback, so the program keeps thinking those downloads haven't finished. Reference: https://my.oschina.net/airship/blog/628765
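If that theory applies here, one way to test it without patching library files is to give every request an errback and a timeout: the request then always resolves, through either the callback or the errback, so a hung download shows up in the log instead of silently holding a slot. A minimal sketch assuming standard Scrapy (the class skeleton, the handle_error name, and the 30-second value are illustrative, not the repo's actual code):

import scrapy
from twisted.internet.error import TCPTimedOutError, TimeoutError

class WenshuSpider(scrapy.Spider):
    name = 'wenshu'
    start_urls = ['http://wenshu.court.gov.cn/List/ListContent']

    def start_requests(self):
        for url in self.start_urls:
            yield scrapy.Request(
                url,
                callback=self.parse,
                errback=self.handle_error,      # fires whenever the callback never will
                meta={'download_timeout': 30},  # per-request timeout, in seconds
            )

    def parse(self, response):
        self.logger.info('Got %s (%d bytes)', response.url, len(response.body))

    def handle_error(self, failure):
        # Separate timeouts from other failures so stuck requests stand out.
        if failure.check(TimeoutError, TCPTimedOutError):
            self.logger.warning('Timeout on %s', failure.request.url)
        else:
            self.logger.error('Request failed: %r', failure)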


Did you try adding that, and did it solve the problem?

I gave it a try. The two files mentioned in that Baidu post wouldn't download, so I opened the links in the post myself and changed the timeout as described, but I still hit the same problem. Let's add each other on WeChat; maybe we can work it out together: 15868194743