proxy_random

Python package for fetching random proxies

Proxy Random is a tool that helps small web scrapers avoid getting their IP banned from the target site.

The default proxy lists are free-proxy-list.net and sslproxies.org, but you can also add your own proxy-list websites along with an extractor for each; see Example 2 below.
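A minimal sketch of the two setup modes (RandomProxy() appears in Example 1 below, and the use_defaults flag in Example 2):

from proxy_random import RandomProxy

rp_default = RandomProxy()                   # query the built-in proxy lists
rp_custom = RandomProxy(use_defaults=False)  # no defaults; only providers you add (see Example 2)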

Installation

$ pip install proxy-random

Documentation

https://proxy-random.readthedocs.io/

Usage

Here are some examples of how to use proxy-random.

Example 1:

import requests

from proxy_random import RandomProxy

# if you want to use a proxy to load the proxy lists themselves,
# use rp = RandomProxy(proxy="http://example.com:8080") instead
rp = RandomProxy()
proxies = rp.extract_proxies()  # the same query is also available as rp.proxy_query

# filter the proxies that use port 80 or 443 and check whether they work
# you can filter by multiple parameters at once or provide your own filter function(s)
workings = proxies.filter(port=[80, 443]) \
            .check_health(timeout=5).filter(working=True)

print(workings.random().url) # print a random working proxy

# or iterate through proxies and use them
for proxy in workings:
    # do something with the proxy
    requests.get("https://httpbin.org/ip", proxies={"http": proxy.url, "https": proxy.url})

ProxyQuery objects are reusable, so you can filter them as many times as needed.
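For instance, a sketch reusing the workings query from Example 1 (the https and country_code keyword filters are assumptions, mirroring the Proxy fields populated in Example 2):

# ProxyQuery objects are reusable, so one query can feed several filters
https_only = workings.filter(https=True)        # assumed keyword, maps to Proxy.https
us_only = workings.filter(country_code="US")    # assumed keyword, maps to Proxy.country_code

print(https_only.random().url)
print(us_only.random().url)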

Here is another example showing how to add a custom provider.

Example 2:

from bs4 import BeautifulSoup

from proxy_random import RandomProxy
from proxy_random.provider import Provider
from proxy_random.query import ProxyQuery
from proxy_random.proxy import Proxy

# you can also use RandomProxy(use_defaults=False) to disable default providers
rp = RandomProxy()
# add a custom provider

url = "https://free-proxy-list.net" # the url of the proxy list

# the function used to extract proxies from the url response
def extract_proxies(response: str) -> ProxyQuery:
    soup = BeautifulSoup(response, "html.parser")

    # lower-cased column names of the proxy table
    headings = [i.text.lower() for i in soup.find("thead").find_all("th")]

    # one list of cell texts per table row
    rows = [[j.text for j in i.find_all("td")] for i in soup.find("tbody").find_all("tr")]

    proxies = []
    for row in rows:
        proxy = Proxy()
        for i, name in enumerate(headings):
            if name == "ip address":
                proxy.ip = row[i]

            elif name == "port":
                proxy.port = int(row[i])

            elif name == "code":
                proxy.country_code = row[i]

            elif name == "last checked":
                proxy.last_checked = row[i]

            elif name in ("google", "https"):
                setattr(proxy, name, row[i] == "yes")

            elif name in ("country", "anonymity"):
                setattr(proxy, name, row[i])

        proxies.append(proxy)

    return ProxyQuery(proxies)

# then create a new instance of the Provider class
provider = Provider(url=url, extractor=extract_proxies)
# then add the provider to the RandomProxy instance
rp.add_provider(provider)

# then extract the proxies as in Example 1
rp.extract_proxies()
...

My own usage of this package:

import requests

from proxy_random import RandomProxy

rp = RandomProxy(proxy="my proxy")  # e.g. "http://example.com:8080"
proxies = rp.extract_proxies()

# exclude proxies with country code "ir", keep the first 50, and health-check them
workings = proxies.filter(custom_filters=[lambda x: x.country_code != "ir",]) \
            .limit(50).check_health(timeout=5).filter(working=True)


proxy = workings.random()

# use the proxy in some way
...

Refer to the documentation for more information about these classes.