TeamHG-Memex/scrapy-rotating-proxies

Cookiejar handling while crawling the same page after a ban

mouchh opened this issue · 1 comment

Hi there,

Thank you very much for such a good extension 👍
There is just one small thing I'm missing. It appears that when a request is blacklisted, the cookiejar used by that request is no longer attached to the new request that replaces it.

For example, I'm using a spider that parses 3 URLs, one after the other. It needs to keep the cookies set at the first URL so that the data is correct at the last URL.
I get wrong results in my items whenever a ban occurs while one of the 3 URLs is being parsed.

I'll dig a bit more into your extension and try to improve it myself, but I may not be proficient enough with Scrapy yet!

What do you think of this?

All right: the rotating proxy middleware does handle the cookiejar. It creates the new request with the request's copy() function, which also copies the meta values (the cookiejar index included).

By enabling the COOKIES_DEBUG setting, I was able to confirm the right cookies were sent.
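
In case it helps someone else, this is roughly what the check looks like in a project's `settings.py`. It is only a fragment; the rest of the settings are assumed to be whatever the project already uses.

```python
# settings.py (fragment)

# The cookies middleware must be active (this is Scrapy's default).
COOKIES_ENABLED = True

# Log the cookies sent in each request and received in each response.
COOKIES_DEBUG = True
```

With this enabled, the crawl log shows the cookies sent with each request and those set by each response, which makes it possible to compare what is sent before and after a ban.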

Sorry for the false report.