roach-php/core

You can't run multiple spiders

Opened this issue · 1 comment

If you have built multiple spiders and try to run them one after another, it creates concurrency issues.
Assume you have a list of spider classes:
foreach ($spiders as $spider) {
Roach::startSpider($spider);
}
I would expect each spider to be run once; however, the first spider runs once, the second twice, the third three times, and so on.
Maybe I'm doing something wrong, or maybe this isn't how I'm supposed to run multiple spiders. I don't know what the issue is, but I've been looking at the code and I suspect it has something to do with how the engine starts a new run, though I'm not sure.
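To illustrate the suspected failure mode, here is a minimal, self-contained analogy (this is not Roach's actual engine code, just a sketch of the pattern): if each start call registers a listener on shared state without clearing the listeners left over from previous runs, then run N fires N listeners, which matches the 1, 2, 3, ... duplication described above.

```php
<?php
// Sketch of the suspected bug pattern: a dispatcher whose listener list is
// shared (static) state that every "run" appends to but never resets.
final class SharedDispatcher
{
    /** @var callable[] Listeners accumulated across runs (never cleared). */
    public static array $listeners = [];

    // Fire every registered listener once and count the resulting requests.
    public static function dispatch(string $url): int
    {
        $requests = 0;
        foreach (self::$listeners as $listener) {
            $requests += $listener($url);
        }
        return $requests;
    }
}

foreach ([1, 2, 3] as $run) {
    // Each "startSpider" equivalent registers a fresh listener on the
    // shared state, on top of the listeners from earlier runs.
    SharedDispatcher::$listeners[] = fn (string $url): int => 1;

    // Run N therefore fires N listeners, i.e. N duplicate requests.
    echo "run {$run}: " . SharedDispatcher::dispatch('https://example.com') . " requests\n";
}
// Prints: run 1: 1 requests / run 2: 2 requests / run 3: 3 requests
```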

Package versions

  • core: [3.0.0]

I have the same issue. I'm running one spider inside a foreach loop, and because of that I see multiple duplicate requests being made. By the 10th iteration, I effectively have 10 spiders running, and each one fires its startUrls a number of times equal to its index: spider 1 of 10 requests the link once, spider 2 of 10 requests it twice, the third spider requests it three times, and so on. Even the RequestDeduplicationMiddleware doesn't seem to help.

I noticed that if I start 2 different Spider classes, even with 2 separate sets of URLs, multiple requests are still made. So it seems that every time Roach::startSpider() is called, a new spider is created, but the previously created spiders also react to the new run and pick up any overrides, such as startUrls.
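Until this is fixed, one hypothetical workaround consistent with the behavior described above is to isolate each spider run in its own PHP process, so that no engine or listener state can survive from one run to the next. The run-spider.php entry point below is an assumption for illustration, not part of Roach; it would simply call Roach::startSpider() on the class name passed as its first argument.

```php
<?php
// Sketch: run each spider in a fresh PHP process so shared in-process state
// (listeners, engine, container) is rebuilt from scratch every time.
// "run-spider.php" is a hypothetical one-spider entry point.
$spiders = [
    'App\\Spiders\\FirstSpider',
    'App\\Spiders\\SecondSpider',
];

foreach ($spiders as $spiderClass) {
    $cmd = sprintf('php run-spider.php %s', escapeshellarg($spiderClass));
    passthru($cmd, $exitCode);

    if ($exitCode !== 0) {
        fwrite(STDERR, "spider {$spiderClass} exited with code {$exitCode}\n");
    }
}
```

This trades some startup overhead per spider for guaranteed isolation; it does not address the underlying bug in how repeated Roach::startSpider() calls share state.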