How to use the same proxy for multiple URL requests?
windowshopr opened this issue · 5 comments
Not really an issue, but would love some input as I can't seem to figure out how to make it work.
My code (pseudo) looks something like this:
req_proxy = RequestProxy()
url_list = ["www.example1.com", "www.example2.com", "www.example3.com", "www.example4.com"]

for url in url_list:
    while True:
        request = req_proxy.generate_proxied_request(url)
        if request is not None:
            # (THE REST OF MY CODE IS HERE ONCE WE GET A GOOD RESPONDING PROXY)
            break  # break out of the while loop once we get a good response
        # otherwise, loop again and try another proxy
    # the for loop then moves on to the next url
I'm wondering if there's a way to use the same proxy for, say, 2 items in my url_list before requesting another one. In my main application I have a long list of URLs and would like to re-use a good responding proxy for multiple URLs before fetching a new one. How could I go about structuring this? Thanks a lot! (Or if there's documentation that I missed, point me in the right direction. Thanks!)
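For reference, the shape I'm after is something like this sketch. (Here `pick_proxy` and `fetch_with_proxy` are hypothetical stand-ins for the RequestProxy calls, just to show the grouping logic — one proxy acquired per group of N URLs.)

```python
import itertools

def chunked(iterable, size):
    """Yield successive lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk

def fetch_all(urls, pick_proxy, fetch_with_proxy, urls_per_proxy=2):
    """Fetch every URL, acquiring a fresh proxy only once per
    `urls_per_proxy` URLs instead of once per request."""
    results = {}
    for group in chunked(urls, urls_per_proxy):
        proxy = pick_proxy()  # one proxy shared by the whole group
        for url in group:
            results[url] = fetch_with_proxy(url, proxy)
    return results
```

With `urls_per_proxy=2` and four URLs, `pick_proxy` would only be called twice.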
Hello @windowshopr
There is actually a sustain flag in the RequestProxy constructor that reuses the latest proxy as long as it does not time out. Just use RequestProxy(sustain=True) to test it.
Cheers
Right on! I will work that in to my code. Thanks a lot!
@pgaref Thanks for the help, adding in the sustain=True worked, however I'm running into an issue where the script works for about 5 or 6 proxies, and then gets stuck:
2020-01-07 20:24:25,155 root DEBUG Using proxy: 45.76.43.163:8080 | PremProxy
So it's trying to use the above proxy, but it doesn't seem to move on if it can't get a response. So I hit CTRL + C to stop the script and get a traceback that looks like:
Traceback (most recent call last):
File "C:\Users\...\Python36\lib\site-packages\urllib3\connectionpool.py", line 380, in _make_request
httplib_response = conn.getresponse(buffering=True)
TypeError: getresponse() got an unexpected keyword argument 'buffering'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\...\Python36\lib\site-packages\urllib3\contrib\pyopenssl.py", line 280, in recv_into
return self.connection.recv_into(*args, **kwargs)
File "C:\Users\...\Python36\lib\site-packages\OpenSSL\SSL.py", line 1814, in recv_into
self._raise_ssl_error(self._ssl, result)
File "C:\Users\...\Python36\lib\site-packages\OpenSSL\SSL.py", line 1614, in _raise_ssl_error
raise WantReadError()
OpenSSL.SSL.WantReadError
Any ideas on that? It only seemed to do this after I added in the sustain. Thanks!
Hey @windowshopr
The buffer response issue seems to be just a weird traceback introduced by Python 3 and not the real issue (check link) -- it seems more like an SSL issue with the particular proxy.
I would expect the issue to be as easy to fix as handling the Error raised by http request -- happy to help if you narrow it down.
Cheers
Thanks a lot. You're right, I think it might just be something weird happening on my end. It doesn't seem to do it every time, so I'll play with some exception handling, or just implement a quick timer/while loop to force it to move on for now.
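Something like this sketch is what I have in mind — `fetch` here is a hypothetical callable that performs one proxied request (and may raise an SSL/socket error on a bad proxy); the names and retry cap are mine, not from the library:

```python
import ssl

def fetch_with_fallback(url, fetch, max_attempts=3):
    """Call `fetch(url)` up to `max_attempts` times; on SSL/socket errors,
    move on and retry instead of hanging on one bad proxy."""
    last_err = None
    for _ in range(max_attempts):
        try:
            return fetch(url)
        except (ssl.SSLError, OSError) as err:
            last_err = err  # bad proxy; next attempt can pick a new one
    raise RuntimeError(
        "all %d attempts failed for %s" % (max_attempts, url)
    ) from last_err
```

That way a single unresponsive proxy costs at most a few attempts before the script gives up on that URL and continues.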
Thanks!