CroxxProxyPool
CroxxProxyPool is a lightweight, thread-safe proxy pool for Python that automatically crawls free proxies.
Advantages
- Thread-safe.
- Uses a heap so that each pop() returns the proxy you used least recently.
- Verifies that a proxy is available as soon as it is crawled.
- Detects duplicate proxies when pushing.
- [Update on 2017-8-30] Crawls for more proxies through proxies that have already been found, which speeds up the search for available proxies (searching can be 2-5 times faster than before).
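The heap, duplicate-detection and thread-safety features can be illustrated with a small, self-contained sketch. This is not CroxxProxyPool's implementation: the class LRUProxyPool, its push(proxy, last_used) signature and the use of time.monotonic() as the "last used" key are assumptions for illustration (Python 3). A min-heap ordered by last-used time makes heappop return the least recently used proxy, a set catches repeated proxies on push, and one lock guards both so concurrent threads always see a consistent pool.

```python
import heapq
import itertools
import threading
import time


class LRUProxyPool:
    """Hypothetical sketch: a thread-safe pool that hands out the least
    recently used proxy and ignores duplicate pushes."""

    def __init__(self):
        self._heap = []                    # entries: (last_used, seq, proxy)
        self._members = set()              # proxies currently in the heap
        self._counter = itertools.count()  # tie-breaker for equal timestamps
        self._lock = threading.Lock()      # guards _heap and _members together

    def push(self, proxy, last_used=None):
        """Add a proxy; return False if it is already in the pool."""
        if last_used is None:
            last_used = time.monotonic()   # "now" = most recently used
        with self._lock:
            if proxy in self._members:
                return False               # duplicate, keep the existing entry
            self._members.add(proxy)
            heapq.heappush(self._heap, (last_used, next(self._counter), proxy))
            return True

    def pop(self):
        """Remove and return the least recently used proxy, or None if empty."""
        with self._lock:
            if not self._heap:
                return None
            _, _, proxy = heapq.heappop(self._heap)
            self._members.discard(proxy)
            return proxy
```

Pushing a proxy back after use stamps it with the current time, so it moves to the end of the queue and the other proxies are handed out first.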
Usage
1. Import CroxxProxyPool
from CroxxProxyPool import ProxyPool
2. Get an instance
pp = ProxyPool()
# get the ProxyPool instance (ProxyPool is a singleton, so there is only ONE instance)
3. Start crawling proxies
pp.start(delay=10 * 60, ssl=True)
# start crawling proxies
4. Get an "HTTP" or "HTTPS" proxy with pop()
proxy = pp.pop("HTTP") # or proxy = pp.pop("HTTPS")
# get a proxy (defaults to "HTTP")
5. Push the proxy back after using it
pp.push(proxy)
# push the proxy back to ProxyPool after using it
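Steps 4 and 5 form a borrow/return pair: if the code between pop() and push() raises, the proxy is never returned to the pool. A small context manager can guarantee the push. The helper name borrowed_proxy is hypothetical, and the sketch assumes only what steps 4 and 5 show: pop() accepts the protocol string and push() accepts the proxy that pop() returned.

```python
from contextlib import contextmanager


@contextmanager
def borrowed_proxy(pool, protocol="HTTP"):
    """Hypothetical helper: pop a proxy and always push it back afterwards."""
    proxy = pool.pop(protocol)
    try:
        yield proxy
    finally:
        # Runs even if the with-body raises, so the proxy is never lost.
        pool.push(proxy)
```

With the pool from step 2, `with borrowed_proxy(pp, "HTTPS") as proxy:` replaces a manual pop()/push() pair around the request that uses the proxy.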
Thread safety
CroxxProxyPool is a thread-safe ProxyPool.
Here is a multithreading example.
(The log output of the function 'TsetProxy' is not thread-safe, because print in Python 2.7 is not thread-safe, so lines printed by different threads may interleave.)
from CroxxProxyPool import ProxyPool
import random
import threading
import time

pp = ProxyPool()
pp.start(delay=10 * 60, ssl=True, debug=True)


def testThread(tid, pp):
    # Wait a random time, borrow a proxy, hold it briefly, then return it.
    s1 = random.randint(0, 40)
    time.sleep(s1)
    proxy = pp.pop(debug=True)
    s2 = random.randint(0, 5)
    time.sleep(s2)
    pp.push(proxy, debug=True)


# Start 300 threads that all share the same pool.
for i in range(0, 300):
    threading.Thread(target=testThread, args=(i, pp)).start()
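If your own threads need to log while sharing the pool, Python's standard logging module is one way to avoid the interleaving mentioned for print: every logging.Handler acquires its own lock around emit(), so each record is written as a whole. The sketch below does not touch CroxxProxyPool; the logger name "proxy_demo" and the worker and run_workers functions are hypothetical.

```python
import logging
import threading

# Each handler serializes emit() with an internal lock, so records
# from different threads are not mixed within a line.
logger = logging.getLogger("proxy_demo")
logger.setLevel(logging.DEBUG)
_handler = logging.StreamHandler()
_handler.setFormatter(logging.Formatter("%(threadName)s %(message)s"))
logger.addHandler(_handler)


def worker(tid):
    # In a real program this is where pp.pop() / pp.push() would happen.
    logger.debug("thread %d finished", tid)


def run_workers(count):
    """Start `count` threads that log concurrently, then wait for all of them."""
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    run_workers(10)
```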