
ProxyBroker


ProxyBroker is an open source tool that asynchronously finds public proxies from multiple sources and concurrently checks them.


Features

  • Finds more than 7000 working proxies from ~50 sources.
  • Supported protocols: HTTP(S), SOCKS4/5. Also the CONNECT method to ports 80 and 25 (SMTP).
  • Proxies may be filtered by type, anonymity level, response time, country, and DNSBL status.
  • Works as a proxy server that distributes incoming requests to external proxies, with automatic proxy rotation.
  • All proxies are checked to support Cookies and Referer (and POST requests if required).
  • Automatically removes duplicate proxies.
  • Is asynchronous.

Docker

Docker Hub https://hub.docker.com/r/bluet/proxybroker2

$ docker run --rm bluet/proxybroker2 --help
  usage: proxybroker [--max-conn MAX_CONN] [--max-tries MAX_TRIES]
                     [--timeout SECONDS] [--judge JUDGES] [--provider PROVIDERS]
                     [--verify-ssl]
                     [--log [{NOTSET,DEBUG,INFO,WARNING,ERROR,CRITICAL}]]
                     [--min-queue MINIMUM_PROXIES_IN_QUEUE]
                     [--version] [--help]
                     {find,grab,serve,update-geo} ...

  Proxy [Finder | Checker | Server]

  Commands:
    These are common commands used in various situations

    {find,grab,serve,update-geo}
      find                Find and check proxies
      grab                Find proxies without a check
      serve               Run a local proxy server
      update-geo          Download and use a detailed GeoIP database

  Options:
    --max-conn MAX_CONN   The maximum number of concurrent checks of proxies
    --max-tries MAX_TRIES
                          The maximum number of attempts to check a proxy
    --timeout SECONDS, -t SECONDS
                          Timeout of a request in seconds. The default value is
                          8 seconds
    --judge JUDGES        Urls of pages that show HTTP headers and IP address
    --provider PROVIDERS  Urls of pages where to find proxies
    --verify-ssl, -ssl    Flag indicating whether to check the SSL certificates
    --min-queue MINIMUM_PROXIES_IN_QUEUE
                          The minimum number of proxies in the queue for
                          checking connectivity
    --log [{NOTSET,DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                          Logging level
    --version, -v         Show program's version number and exit
    --help, -h            Show this help message and exit

  Run 'proxybroker <command> --help' for more information on a command.
  Suggestions and bug reports are greatly appreciated:
  <https://github.com/bluet/proxybroker2/issues>
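
The subcommands shown in the CLI examples below work the same way through the Docker image; for example, to run the find example without a local install (a sketch that reuses the image and flags documented in this README):

$ docker run --rm bluet/proxybroker2 find --types HTTP HTTPS --lvl High --countries US --strict -l 10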

Requirements

Installation

To install the latest stable release from PyPI:

$ pip install proxybroker

The latest development version can be installed directly from GitHub:

$ pip install -U git+https://github.com/bluet/proxybroker2.git
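
To check that the CLI is available after installation, you can print the version (the --version flag is listed in the help output above):

$ proxybroker --version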

Usage

CLI Examples

Find

Find and show 10 HTTP(S) proxies from the United States with a high level of anonymity:

$ proxybroker find --types HTTP HTTPS --lvl High --countries US --strict -l 10


Grab

Find 10 US proxies and save them to a file (without checking them):

$ proxybroker grab --countries US --limit 10 --outfile ./proxies.txt


Serve

Run a local proxy server that distributes incoming requests to a pool of found HTTP(S) proxies with a high level of anonymity:

$ proxybroker serve --host 127.0.0.1 --port 8888 --types HTTP HTTPS --lvl High --min-queue 5

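Once the server is running, point any HTTP client at it like a regular proxy; for example, with curl (the same pattern is used in the sections below):

$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl http://httpbin.org/get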

Run proxybroker --help for more information on the options available. Run proxybroker <command> --help for more information on a command.

Basic code example

Find and show 10 working HTTP(S) proxies:

import asyncio
from proxybroker import Broker

async def show(proxies):
    while True:
        proxy = await proxies.get()
        if proxy is None:  # Broker.find() puts None into the queue when it is done
            break
        print('Found proxy: %s' % proxy)

async def main():
    proxies = asyncio.Queue()
    broker = Broker(proxies)
    # Search for proxies and print them as they are found.
    await asyncio.gather(
        broker.find(types=['HTTP', 'HTTPS'], limit=10),
        show(proxies))

asyncio.run(main())
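
The command-line filters map onto keyword arguments of Broker.find(). Below is a minimal sketch of the earlier find example in Python; the types/countries/strict/limit parameters follow the project's documented examples, so treat the exact signatures as an assumption if your version differs (the show() coroutine is repeated so the snippet runs on its own):

import asyncio
from proxybroker import Broker

async def show(proxies):
    while True:
        proxy = await proxies.get()
        if proxy is None:
            break
        print('Found proxy: %s' % proxy)

async def main():
    proxies = asyncio.Queue()
    broker = Broker(proxies)
    # Rough Python equivalent of:
    #   proxybroker find --types HTTP HTTPS --lvl High --countries US --strict -l 10
    await asyncio.gather(
        broker.find(
            types=[('HTTP', ('High',)), 'HTTPS'],  # protocol, optionally with anonymity level(s)
            countries=['US'],
            strict=True,
            limit=10),
        show(proxies))

asyncio.run(main())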

More examples.

Proxy information per request

HTTP

Check the X-Proxy-Info response header; it shows the address of the external proxy that handled the request.

$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl -v http://httpbin.org/get
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0)
> GET http://httpbin.org/get HTTP/1.1
> Host: httpbin.org
> User-Agent: curl/7.58.0
> Accept: */*
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 OK
< X-Proxy-Info: 174.138.42.112:8080
< Date: Mon, 04 May 2020 03:39:40 GMT
< Content-Type: application/json
< Content-Length: 304
< Server: gunicorn/19.9.0
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Credentials: true
< X-Cache: MISS from ADM-MANAGER
< X-Cache-Lookup: MISS from ADM-MANAGER:880
< Connection: keep-alive
<
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Cache-Control": "max-age=259200",
    "Host": "httpbin.org",
    "User-Agent": "curl/7.58.0",
    "X-Amzn-Trace-Id": "Root=1-5eaf8e7c-6a1162a1387a1743a49063f4"
  },
  "origin": "...",
  "url": "http://httpbin.org/get"
}
* Connection #0 to host 127.0.0.1 left intact
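
The same header can be read programmatically. A minimal sketch using aiohttp (an assumption; any HTTP client that can talk to an HTTP proxy works the same way):

import asyncio
import aiohttp

async def main():
    async with aiohttp.ClientSession() as session:
        # Send the request through the local ProxyBroker server.
        async with session.get('http://httpbin.org/get',
                               proxy='http://127.0.0.1:8888') as resp:
            # The serve command adds X-Proxy-Info with the external proxy it used.
            print(resp.headers.get('X-Proxy-Info'))

asyncio.run(main())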

HTTPS

HTTPS traffic cannot be modified to inject a custom header once it is encrypted. An X-Proxy-Info header is still sent to the client after the HTTP/1.1 200 Connection established response, but it is unclear how clients can read it.

$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl -v https://httpbin.org/get
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0)
* allocate connect buffer!
* Establish HTTP proxy tunnel to httpbin.org:443
> CONNECT httpbin.org:443 HTTP/1.1
> Host: httpbin.org:443
> User-Agent: curl/7.58.0
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 Connection established
< X-Proxy-Info: 207.148.22.139:8080
<
* Proxy replied 200 to CONNECT request
* CONNECT phase completed!
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
...
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x5560b2e93580)
> GET /get HTTP/2
> Host: httpbin.org
> User-Agent: curl/7.58.0
> Accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 200
< date: Mon, 04 May 2020 03:39:35 GMT
< content-type: application/json
< content-length: 256
< server: gunicorn/19.9.0
< access-control-allow-origin: *
< access-control-allow-credentials: true
<
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Host": "httpbin.org",
    "User-Agent": "curl/7.58.0",
    "X-Amzn-Trace-Id": "Root=1-5eaf8e77-efcb353b0983ad6a90f8bdcd"
  },
  "origin": "...",
  "url": "https://httpbin.org/get"
}
* Connection #0 to host 127.0.0.1 left intact

HTTP API

Get information about the proxy used to retrieve a specific URL

For HTTP, it's easy.

$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl -v http://proxycontrol/api/history/url:http://httpbin.org/get
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0)
> GET http://proxycontrol/api/history/url:http://httpbin.org/get HTTP/1.1
> Host: proxycontrol
> User-Agent: curl/7.58.0
> Accept: */*
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Content-Length: 34
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Credentials: true
<
{"proxy": "..."}

For HTTPS, the encrypted payload (the request) cannot be inspected, so only the hostname (and port) can be used.

$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl -v http://proxycontrol/api/history/url:httpbin.org:443
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0)
> GET http://proxycontrol/api/history/url:httpbin.org:443 HTTP/1.1
> Host: proxycontrol
> User-Agent: curl/7.58.0
> Accept: */*
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 200 OK
< Content-Type: application/json
< Content-Length: 34
< Access-Control-Allow-Origin: *
< Access-Control-Allow-Credentials: true
<
{"proxy": "..."}
* Connection #0 to host 127.0.0.1 left intact

Remove a specific proxy from the queue

$ http_proxy=http://127.0.0.1:8888 https_proxy=http://127.0.0.1:8888 curl -v http://proxycontrol/api/remove/PROXY_IP:PROXY_PORT
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 8888 (#0)
> GET http://proxycontrol/api/remove/... HTTP/1.1
> Host: proxycontrol
> User-Agent: curl/7.58.0
> Accept: */*
> Proxy-Connection: Keep-Alive
>
< HTTP/1.1 204 No Content
<
* Connection #0 to host 127.0.0.1 left intact
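
The two proxycontrol endpoints can also be combined in code, for example to look up which external proxy served a host and then drop it after repeated failures. A sketch using aiohttp, assuming the local server at 127.0.0.1:8888 and that the history endpoint returns the proxy in IP:PORT form (the proxycontrol requests themselves must go through the proxy, as in the curl examples above):

import asyncio
import aiohttp

PROXY = 'http://127.0.0.1:8888'

async def main():
    async with aiohttp.ClientSession() as session:
        # Ask which external proxy was used for this host (hostname:port form for HTTPS).
        async with session.get('http://proxycontrol/api/history/url:httpbin.org:443',
                               proxy=PROXY) as resp:
            bad_proxy = (await resp.json())['proxy']  # assumed to be 'IP:PORT'
        # Remove that proxy from the queue.
        async with session.get('http://proxycontrol/api/remove/%s' % bad_proxy,
                               proxy=PROXY) as resp:
            print(resp.status)  # 204 No Content on success

asyncio.run(main())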

Documentation

https://proxybroker.readthedocs.io/

TODO

  • Check the ping, response time and speed of data transfer
  • Check site access (Google, Twitter, etc.) and even your own custom URLs
  • Information about uptime
  • Checksum of data returned
  • Support for proxy authentication
  • Finding outgoing IP for cascading proxy
  • The ability to specify a proxy address without a port (try to connect on default ports)

Contributing

  • Fork it: https://github.com/bluet/proxybroker2/fork
  • Create your feature branch: git checkout -b my-new-feature
  • Commit your changes: git commit -am 'Add some feature'
  • Push to the branch: git push origin my-new-feature
  • Submit a pull request!

License

Licensed under the Apache License, Version 2.0

This product includes GeoLite2 data created by MaxMind, available from http://www.maxmind.com.
