Low performance issue
lorien opened this issue
lorien commented
Sometimes the performance of a running Spider drops dramatically. Here is the end of the log of a spider started with 100 threads. When I looked at it after six hours, it was processing about one URL every one to two seconds. I killed it with ^C.
RPS: 0.59, user: 0.00 [error:grab-connection-error=166, error:grab-network-error=61, error:grab-timeout-error=398, integrity:DataNotValid=3, login=2, network-count-rejected=36, task-count-rejected=1, user=1264, user-not-found=387]
RPS: 0.59, user: 0.00 [error:grab-connection-error=166, error:grab-network-error=61, error:grab-timeout-error=398, integrity:DataNotValid=3, login=2, network-count-rejected=36, task-count-rejected=1, user=1264, user-not-found=387]
RPS: 0.00, user: 0.00 [error:grab-connection-error=166, error:grab-network-error=61, error:grab-timeout-error=399, integrity:DataNotValid=3, login=2, network-count-rejected=36, task-count-rejected=1, user=1264, user-not-found=387]
RPS: 0.98, user: 0.00 [error:grab-connection-error=166, error:grab-network-error=61, error:grab-timeout-error=399, integrity:DataNotValid=3, login=2, network-count-rejected=36, task-count-rejected=1, user=1264, user-not-found=387]
RPS: 0.00, user: 0.87 [error:grab-connection-error=166, error:grab-network-error=61, error:grab-timeout-error=399, integrity:DataNotValid=3, login=2, network-count-rejected=36, task-count-rejected=1, user=1265, user-not-found=387]
RPS: 0.87, user: 0.00 [error:grab-connection-error=166, error:grab-network-error=61, error:grab-timeout-error=399, integrity:DataNotValid=3, login=2, network-count-rejected=36, task-count-rejected=1, user=1265, user-not-found=387]
RPS: 0.48, user: 0.48 [error:grab-connection-error=166, error:grab-network-error=61, error:grab-timeout-error=400, integrity:DataNotValid=3, login=2, network-count-rejected=37, task-count-rejected=1, user=1266, user-not-found=387]
RPS: 0.48, user: 0.00 [error:grab-connection-error=166, error:grab-network-error=61, error:grab-timeout-error=400, integrity:DataNotValid=3, login=2, network-count-rejected=37, task-count-rejected=1, user=1266, user-not-found=387]
RPS: 0.60, user: 0.60 [error:grab-connection-error=166, error:grab-network-error=61, error:grab-timeout-error=401, integrity:DataNotValid=3, login=2, network-count-rejected=37, task-count-rejected=1, user=1267, user-not-found=387]
RPS: 0.60, user: 0.00 [error:grab-connection-error=166, error:grab-network-error=61, error:grab-timeout-error=401, integrity:DataNotValid=3, login=2, network-count-rejected=37, task-count-rejected=1, user=1267, user-not-found=387]
RPS: 0.00, user: 0.71 [error:grab-connection-error=166, error:grab-network-error=61, error:grab-timeout-error=401, integrity:DataNotValid=3, login=2, network-count-rejected=37, task-count-rejected=1, user=1268, user-not-found=387]
RPS: 0.71, user: 0.00 [error:grab-connection-error=166, error:grab-network-error=61, error:grab-timeout-error=401, integrity:DataNotValid=3, login=2, network-count-rejected=37, task-count-rejected=1, user=1268, user-not-found=387]
RPS: 0.99, user: 0.00 [error:grab-connection-error=167, error:grab-network-error=61, error:grab-timeout-error=401, integrity:DataNotValid=3, login=2, network-count-rejected=38, task-count-rejected=1, user=1268, user-not-found=389]
RPS: 0.23, user: 0.00 [error:grab-connection-error=168, error:grab-network-error=61, error:grab-timeout-error=401, integrity:DataNotValid=3, login=2, network-count-rejected=38, task-count-rejected=1, user=1268, user-not-found=389]
RPS: 0.56, user: 0.00 [error:grab-connection-error=168, error:grab-network-error=61, error:grab-timeout-error=401, integrity:DataNotValid=3, login=2, network-count-rejected=38, task-count-rejected=1, user=1268, user-not-found=390]
^CThe <grab.spider.task_generator_service.TaskGeneratorService object at 0x7f342d601048> has not stopped :(
The <grab.spider.parser_service.ParserService object at 0x7f342f8451d0> has not stopped :(
The <grab.spider.network_service.threaded.NetworkServiceThreaded object at 0x7f342d66dba8> has not stopped :(
RPS: 0.00, user: 0.00 [error:grab-connection-error=168, error:grab-network-error=61, error:grab-timeout-error=401, integrity:DataNotValid=3, login=2, network-count-rejected=38, task-count-rejected=1, user=1268, user-not-found=391]
Work done
------------ Stats: ------------
Counters:
user-not-found: 391
user: 1268
spider:upload-size: 0.0
spider:task-user-network: 2489
spider:task-user-initial: 1699
spider:task-user: 2293
spider:task: 2293
spider:request-processed: 2293
spider:request-network: 2489
spider:request: 2293
spider:download-size: 57965269.0
parser:handler-processed: 1662
login: 2
integrity:DataNotValid: 3
error:grab-timeout-error: 401
error:grab-network-error: 61
error:grab-connection-error: 168
Lists:
network-count-rejected: 38
task-count-rejected: 1
Queue size: 0
Network streams: 200
Time elapsed: 1:25:22 (H:M:S)
End time: 11 Jul 2018, 06:38:29 UTC
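As a quick cross-check of the summary, the average request rate over the reported "Time elapsed" can be computed from the counters. This is only arithmetic on the numbers shown above. It assumes spider:request is the right counter to divide by; using spider:request-network (2489) instead gives about 0.49 requests/s. The helper names parse_elapsed and average_rps are hypothetical.

```python
def parse_elapsed(hms: str) -> int:
    """Convert an 'H:M:S' string, as printed in the 'Time elapsed' line, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def average_rps(requests: int, hms: str) -> float:
    """Average requests per second over the given elapsed time."""
    return requests / parse_elapsed(hms)


# Values copied from the stats block (spider:request and Time elapsed).
elapsed = parse_elapsed("1:25:22")
rps = average_rps(2293, "1:25:22")
print(f"{elapsed} s, {rps:.2f} requests/s")  # prints "5122 s, 0.45 requests/s"
```

So, averaged over the reported elapsed time, the spider completed well under one request per second.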
lorien commented
It was caused by no_cursor_timeout=True combined with the snapshot modifier and a long iteration over the result set. It is not related to Grab.
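The thread does not show the fix. One common way to avoid keeping a single cursor open for hours is keyset pagination: fetch the collection in small batches ordered by _id, each batch starting after the last _id seen, so every query is short-lived. Below is a minimal sketch, assuming the tasks are read from MongoDB through pymongo (suggested by no_cursor_timeout and the snapshot modifier, but not stated in the thread). The names iterate_in_batches and mongo_fetch_batch are hypothetical.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Doc = Dict[str, Any]
# fetch_batch(last_id, limit) returns up to `limit` documents with _id > last_id,
# sorted by _id ascending (or from the start when last_id is None).
FetchBatch = Callable[[Optional[Any], int], List[Doc]]


def iterate_in_batches(fetch_batch: FetchBatch, batch_size: int = 1000) -> Iterator[Doc]:
    """Yield documents in _id order using short, independent queries.

    Each batch is a separate query, so no single cursor has to stay open for
    the whole run and no_cursor_timeout is not needed.
    """
    last_id = None
    while True:
        batch = fetch_batch(last_id, batch_size)
        if not batch:
            return
        yield from batch
        # Resume strictly after the last _id seen, so each _id is yielded at most once.
        last_id = batch[-1]["_id"]


def mongo_fetch_batch(collection) -> FetchBatch:
    """Adapter for a pymongo Collection (assumption: this is where the tasks come from)."""
    def fetch(last_id: Optional[Any], limit: int) -> List[Doc]:
        query = {} if last_id is None else {"_id": {"$gt": last_id}}
        # list() exhausts the cursor right away, so it is not held open between batches.
        return list(collection.find(query).sort("_id", 1).limit(limit))
    return fetch
```

Usage would look like `for doc in iterate_in_batches(mongo_fetch_batch(db.users)): ...`, where db.users is a placeholder collection. Because _id is immutable and each query asks for _id greater than the previous maximum, a document is not returned twice even if it moves on disk, which is the guarantee the snapshot modifier was used for.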