Ge0rg3/requests-ip-rotator

Parallelization and shutting down only specified endpoints

hariravi opened this issue · 3 comments

Hi George,

I'm becoming an avid user of your module, and have primarily been rotating through the 4 US endpoints (I have been starting and stopping these 4 gateways every 100 requests or so, so that my scraping pattern is not detected).

One issue I've run into is when running my scrapers in parallel ... I declare a gateway as follows:
gateway = ApiGateway("https://www.google.com", regions=["us-east-1", "us-east-2", "us-west-1", "us-west-2"])

Unfortunately, when I call gateway.shutdown(), all gateways running in parallel that are associated with https://www.google.com get shut down. Is there a way I can shut down gateways by endpoint ID, e.g. gateway.shutdown(endpoints=specific_endpoint_ids), with the start call returning the endpoints it created?

Running the below block will make my problem clear:

from requests_ip_rotator import ApiGateway
gateway = ApiGateway("https://www.google.com", regions=["us-east-1", "us-east-2", "us-west-1", "us-west-2"])
gateway2 = ApiGateway("https://www.google.com", regions=["us-east-1", "us-east-2", "us-west-1", "us-west-2"])
# Note: all 8 endpoints will be deleted. Could I specify a list of endpoints I'd like to delete?
# I think we could modify your shutdown function to optionally take a list of endpoints as an argument, and also pass that to the delete_gateway function.
gateway.shutdown()

Thanks again, and happy new year!
Hari

Hi, I've just added some functionality that should help you with this and will be rolling it out in the next release (feel free to clone the repo if you need it ASAP). Usage should be as follows:

from requests_ip_rotator import ApiGateway
gateway = ApiGateway("https://www.google.com", regions=["us-east-1", "us-east-2", "us-west-1", "us-west-2"])
gateway2 = ApiGateway("https://www.google.com", regions=["us-east-1", "us-east-2", "us-west-1", "us-west-2"])

endpoints = gateway.start()
endpoints_2 = gateway2.start(force=True)  # force=True creates a new set of endpoints even if some already exist

gateway.shutdown(endpoints)  # Shuts down only the endpoints created by gateway #1
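
For the parallel case, the start/shutdown pairing above could be wrapped in a small context manager so that each worker only ever cleans up after itself. This is a rough sketch, not part of the library: `scoped_endpoints` is a hypothetical helper name, and it assumes only what is described above, namely that `start(force=True)` returns the list of endpoints it created and that `shutdown()` accepts that list.

```python
from contextlib import contextmanager


@contextmanager
def scoped_endpoints(gateway, force=True):
    """Start `gateway` and, on exit, shut down only the endpoints it created.

    Hypothetical helper (not part of requests_ip_rotator). Assumes that
    gateway.start(force=...) returns the list of endpoints it created and
    that gateway.shutdown(endpoints) accepts that list.
    """
    endpoints = gateway.start(force=force)
    try:
        yield endpoints
    finally:
        # Hand the list back so endpoints owned by other workers are left alone,
        # even if the scraping code inside the `with` block raises.
        gateway.shutdown(endpoints)
```

Each parallel worker would then build its own ApiGateway and run its requests inside `with scoped_endpoints(gateway) as endpoints:`, so a crash or early exit in one worker should not tear down the endpoints another worker is still using.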

Hope this helps, will close the issue when the release is published 😄

George, brilliant, thanks again!

Release finally made 👍