Lots of "start to get None"
jiajunhuang opened this issue · 11 comments
jiajunhuang commented
I queried the database and found that no data had been crawled.
qinxuye commented
Delete the data folder under the cola directory and try again.
jiajunhuang commented
It still happens after deleting it. Error output:
get 1898353550 url: http://weibo.com/1898353550/follow
get 3211200050 url: http://weibo.com/3211200050/follow # No errors up to this point, but these two IDs seem to keep recurring
Error when fetch url: http://weibo.com/1898353550/follow
Error when get bundle: 1898353550
'NoneType' object has no attribute 'find'
Traceback (most recent call last):
File "/home/jiajun/Code/cola/cola/worker/loader.py", line 229, in _execute_bundle
**options).parse()
File "/home/jiajun/Code/cola/contrib/weibo/parsers.py", line 559, in parse
return self._error(url, e)
File "/home/jiajun/Code/cola/contrib/weibo/parsers.py", line 91, in _error
raise e
AttributeError: 'NoneType' object has no attribute 'find'
Finish 1898353550
start to get None
Error when fetch url: http://weibo.com/3211200050/follow
Error when get bundle: 3211200050
'NoneType' object has no attribute 'find'
Traceback (most recent call last):
File "/home/jiajun/Code/cola/cola/worker/loader.py", line 229, in _execute_bundle
**options).parse()
File "/home/jiajun/Code/cola/contrib/weibo/parsers.py", line 559, in parse
return self._error(url, e)
File "/home/jiajun/Code/cola/contrib/weibo/parsers.py", line 91, in _error
raise e
AttributeError: 'NoneType' object has no attribute 'find'
Finish 3211200050
start to get None
start to get None
start to get None
start to get None
start to get None
start to get None
start to get None
Finish visiting pages count: 32 # I stopped it here
Finish visiting pages count: 32
qinxuye commented
You only visited the users' follow pages. Did it fetch any weibo posts?
jiajunhuang commented
I did what the wiki says: python __init__.py
Does that only visit the users' follow pages? Is a separate command needed to fetch weibo posts?
qinxuye commented
Then there are no other settings involved.
First run tests/contrib/test_weibo.py and see whether the unit tests pass.
jiajunhuang commented
Yes, they pass:
self.browser.set_handle_gzip(True)
.......
----------------------------------------------------------------------
Ran 7 tests in 10.017s
OK
qinxuye commented
I tested it and there does seem to be a problem. Perhaps the pages of these two users have changed. I'll look into what is causing it today.
jiajunhuang commented
OK, thank you ;-)
qinxuye commented
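Besides, if you are only using the single-machine version, it is best to use the develop branch. That branch has been tested on a single machine for a long time and is far better than master in both efficiency and stability. I'll merge it into master once the current development work is finished.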
qinxuye commented
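This problem was caused by the new version of Weibo. It has been fixed on both the master and develop branches. As I said before, it is best to use the develop branch.
Update your code and try again.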
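For anyone hitting the same traceback: the sketch below illustrates how a page layout change can produce "'NoneType' object has no attribute 'find'", and one way to guard against it. It is not cola's actual parser code, which is not shown in this thread. The markup, the `follow_list` id, and both function names are made up for illustration, and it uses the standard library's ElementTree instead of whatever HTML parser cola relies on.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the "old" page has the container the parser expects,
# the "new" page has renamed it (as a site redesign might).
OLD_PAGE = ('<html><body><div id="follow_list"><ul>'
            '<li>1898353550</li><li>3211200050</li>'
            '</ul></div></body></html>')
NEW_PAGE = ('<html><body><section id="members"><ul>'
            '<li>1898353550</li>'
            '</ul></section></body></html>')


def parse_follows_fragile(markup):
    """Assumes the container always exists; breaks when the layout changes."""
    root = ET.fromstring(markup)
    container = root.find(".//div[@id='follow_list']")  # None on NEW_PAGE
    items = container.find("ul")  # AttributeError when container is None
    return [li.text for li in items.findall("li")]


def parse_follows_safe(markup):
    """Checks every lookup result before chaining another call on it."""
    root = ET.fromstring(markup)
    container = root.find(".//div[@id='follow_list']")
    if container is None:
        return []  # layout changed or page is empty: nothing to extract
    items = container.find("ul")
    if items is None:
        return []
    return [li.text for li in items.findall("li")]


if __name__ == "__main__":
    print(parse_follows_safe(OLD_PAGE))
    print(parse_follows_safe(NEW_PAGE))
    try:
        parse_follows_fragile(NEW_PAGE)
    except AttributeError as exc:
        print("fragile parser failed:", exc)
```

Returning an empty list keeps a crawl running, but it also hides the layout change. Logging a warning or raising a descriptive error at the `None` checks may be the better choice if you want to notice such changes early.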
jiajunhuang commented
OK ;-)