The scrapyd and crawler containers started by docker-compose exit immediately
Vickey-Wu opened this issue · 4 comments
Describe the bug
The scrapyd and crawler containers started by docker-compose exit immediately. The lianjia container also exits after a while, but that is probably just because it has finished crawling.
To Reproduce
Steps to reproduce the behavior:
docker-compose up -d
docker logs -f lianjia
docker logs -f scrapyd
docker logs -f crawler
Container state afterwards:
root@ubuntu:/mnt/house-renting# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
16fae3c93371 house-renting/crawler "scrapy crawl 58" 2 hours ago Up 2 hours 58
473fa78fc6a6 house-renting/crawler "scrapy crawl lianjia" 2 hours ago Up 3 minutes lianjia
c1336d24f029 house-renting/crawler "scrapy crawl douban" 2 hours ago Up 2 hours douban
d81f4f5c9c5e house-renting/scrapyd "/bin/bash" 2 hours ago Exited (0) 3 minutes ago scrapyd
69660e516589 vickeywu/kibana-oss:6.3.2 "/docker-entrypoint.…" 2 hours ago Up 2 hours 0.0.0.0:5601->5601/tcp kibana
d88e85587d63 house-renting/crawler "/bin/bash" 2 hours ago Exited (0) 3 minutes ago crawler
8b1e03c93a95 redis "docker-entrypoint.s…" 2 hours ago Up 2 hours 0.0.0.0:6379->6379/tcp redis
2be0615aab21 vickeywu/elasticsearch-oss:6.4.1 "/usr/local/bin/dock…" 2 hours ago Up 2 hours 0.0.0.0:9200->9200/tcp, 9300/tcp elasticsearch
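(A quick way to confirm which containers died and with what exit codes — just a sketch using standard docker CLI flags, with the container names taken from the output above:)
docker ps -a --filter "status=exited" --format "table {{.Names}}\t{{.Status}}\t{{.Command}}"   # list only the exited containers
docker inspect --format '{{.Name}} exit code: {{.State.ExitCode}}' scrapyd crawler             # show their exit codes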
lianjia logs:
2019-04-08 06:19:02 [scrapy.core.engine] INFO: Closing spider (finished)
2019-04-08 06:19:02 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 7988,
'downloader/request_count': 22,
'downloader/request_method_count/GET': 22,
'downloader/response_bytes': 404392,
'downloader/response_count': 22,
'downloader/response_status_count/200': 22,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2019, 4, 8, 6, 19, 2, 77559),
'item_dropped_count': 21,
'item_dropped_reasons_count/DropItem': 21,
'log_count/INFO': 33,
'log_count/WARNING': 21,
'memusage/max': 62763008,
'memusage/startup': 56500224,
'request_depth_max': 1,
'response_received_count': 22,
'scheduler/dequeued': 22,
'scheduler/dequeued/memory': 22,
'scheduler/enqueued': 22,
'scheduler/enqueued/memory': 22,
'start_time': datetime.datetime(2019, 4, 8, 6, 14, 47, 67989)}
2019-04-08 06:19:02 [scrapy.core.engine] INFO: Spider closed (finished)
For scrapyd and crawler, docker logs -f scrapyd and docker logs -f crawler show no log output at all.
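(Both exited containers were started with "/bin/bash" as their command, as the docker ps -a output above shows. A bash process started non-interactively, with stdin not kept open, exits right away with code 0 and writes nothing to the log, which matches the Exited (0) status. A minimal sketch to see the difference, reusing the house-renting/scrapyd image name from the output above:)
docker run --rm house-renting/scrapyd /bin/bash        # exits immediately: stdin is closed, so bash has nothing to do
docker run -it --rm house-renting/scrapyd /bin/bash    # stays up: -i keeps stdin open and -t allocates a TTY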
Desktop (please complete the following information)
- OS: Ubuntu 16.04
If running via Docker:
- Docker: 18.06.1-ce
- Docker-compose: 1.23.2, build 1110ad0
@Vickey-Wu I just confirmed that the anti-scraping mechanisms of 58 and Douban have probably been upgraded; when no data can be crawled, those spiders exit right away. So at the moment only Lianjia should be able to fetch data.
One more question: following the wiki, I changed both files under spider_settings to Shenzhen, but most of the crawled data is still for Guangzhou. I went into the corresponding crawler container and checked, and the files under spider_settings there are indeed set to Shenzhen. What else do I need to change?
- Local project config:
root@ubuntu:/mnt/house-renting# grep -ri "深圳"
crawler/house_renting/spider_settings/a58.py:cities = (u'深圳',)
crawler/house_renting/spider_settings/a58.py: u'深圳',
crawler/house_renting/spider_settings/a58.py: u'深圳': 'http://sz.58.com/chuzu/',
crawler/house_renting/spider_settings/lianjia.py:cities = (u'深圳',)
crawler/house_renting/spider_settings/lianjia.py: u'上海', u'深圳', u'苏州', u'石家庄', u'沈阳',
crawler/house_renting/spider_settings/lianjia.py: u'上海': 'https://sh.lianjia.com/zufang/', u'深圳': 'https://sz.lianjia.com/zufang/',
- Inside the 58 crawler container:
root@16fae3c93371:/house-renting/crawler/house_renting/spider_settings# cat a58.py |grep "cities"
# 只需要在这个列表中添加以下 available_cities 中的城市, 如果只需要扒取一个城市也需要使用一个括号包围, 如 (u'广州',)
cities = (u'深圳',)
available_cities = (
available_cities_map = {
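(One quick cross-check — a sketch; it assumes grep is present in the crawler image and uses the absolute path shown in the container prompt above — is to compare the config on the host with the config baked into the image itself, not just the already-running container:)
grep -n "^cities" crawler/house_renting/spider_settings/a58.py                                                         # host copy
docker run --rm house-renting/crawler grep -n "^cities" /house-renting/crawler/house_renting/spider_settings/a58.py    # copy in a fresh container from the image
(If the two differ, the running containers were created from an image built before the change.)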
Did you rebuild the docker image after making the changes?
Yes, I rebuilt it with this command:
docker-compose up --build -d
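(In case the rebuilt image still isn't being picked up, a sketch of a forced clean rebuild; it assumes the crawler image is built from a service named crawler in docker-compose.yml:)
docker-compose build --no-cache crawler       # rebuild the crawler image without using cached layers
docker-compose up -d --force-recreate         # recreate the containers so they use the freshly built image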