Cookiejar handling while crawling the same page after a ban

Question

mouchh opened this issue 7 years ago · 1 comment

Hi there,

Thank you very much for such a good extension 👍
There is only one small detail I'm missing: it appears that the cookiejar used by a request is no longer attached to the new request when the first request is blacklisted (banned).

For example, I'm using a spider that parses 3 URLs, one after the other. It needs to keep the cookies set at the first URL so that the data at the last URL is correct.
I get wrong results in my items every time a ban occurs on one of the 3 URLs.

I'll dig a bit more into your extension and try to improve it myself, but I may not be proficient enough with Scrapy yet!

What do you think of this?

Answer 1 · 2017-11-27T21:42:40.000Z

All right: the rotating proxy middleware handles the cookiejar through the request's copy() method, which also copies the meta values (the cookiejar index included).

By enabling the COOKIES_DEBUG setting, I was able to confirm the right cookies were sent.
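For reference, enabling that check is a one-line settings change. A minimal settings.py fragment, assuming a standard Scrapy project layout:

```python
# settings.py (project-level Scrapy settings)

# Cookies are enabled by default; stated explicitly here for clarity.
COOKIES_ENABLED = True

# Log every Cookie header sent and every Set-Cookie header received,
# so you can see which cookies go out on each (re)tried request.
COOKIES_DEBUG = True
```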

Sorry for the invalid issue.