
Proxy collector

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

proxy_py README

proxy_py is a program which collects proxies, saves them in a database and checks them periodically. It has a server for getting proxies with a nice API (see below).

Where is the documentation?

It's here -> https://proxy-py.readthedocs.io

How to support this project?

You can donate here -> https://www.patreon.com/join/2313433

Thank you :)

How to install?

There is a prepared docker image.

1 Install Docker and Docker Compose. If you're using Ubuntu:

sudo apt install docker.io docker-compose

2 Download the Docker Compose config:

wget "https://raw.githubusercontent.com/DevAlone/proxy_py/master/docker-compose.yml"

3 Build the container:

docker-compose build

4 Run it:

docker-compose up

This starts a server at localhost:55555.
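To confirm that the server is reachable once the containers are up, a plain TCP probe is enough. The following is a minimal sketch using only the standard library; the helper name `is_port_open` is hypothetical and not part of proxy_py, and the default host and port are the ones mentioned above.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 55555, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open()` with no arguments probes the default address. A successful probe only shows that something is listening there, not that the API itself is healthy.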

To see running containers use

docker-compose ps

To stop proxy_py use

docker-compose stop

How to get proxies?

proxy_py has a server, based on aiohttp, which listens on 127.0.0.1:55555 (you can change this in the settings file) and provides proxies. To get proxies, send the following JSON request to http://127.0.0.1:55555/api/v1/ (or another domain if the server is behind a reverse proxy):

{
   "model": "proxy",
   "method": "get",
   "order_by": "response_time, uptime"
}

Note: order_by sorts the result by one or more fields (separated by commas). It is optional. The required fields are model and method.
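If the request body is built in code, the optional order_by field can be assembled from a list of field names. This is a small sketch under that assumption; `build_request` is a hypothetical helper, not part of proxy_py, and it only uses the fields shown in the request above.

```python
import json


def build_request(order_by=None):
    """Build the JSON body for fetching proxies.

    `order_by` is an optional list of field names; they are joined with
    commas, matching the format of the example request.
    """
    body = {"model": "proxy", "method": "get"}  # the two required fields
    if order_by:
        body["order_by"] = ", ".join(order_by)
    return body


# Serialize the body the same way it would be sent over the wire.
print(json.dumps(build_request(["response_time", "uptime"]), indent=3))
```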

The server returns a JSON response like this:

{
   "count": 1,
   "data": [{
      "address": "http://127.0.0.1:8080",
      "auth_data": "",
      "bad_proxy": false,
      "domain": "127.0.0.1",
      "last_check_time": 1509466165,
      "number_of_bad_checks": 0,
      "port": 8080,
      "protocol": "http",
      "response_time": 461691,
      "uptime": 1509460949
   }],
   "has_more": false,
   "status": "ok",
   "status_code": 200
}

Note: All fields except protocol, domain, port, auth_data, checking_period and address can be null.

Or an error if something went wrong:

{
   "error_message": "You should specify \"model\"",
   "status": "error",
   "status_code": 400
}

Note: status_code is also sent as the HTTP status code of the response.
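Both response shapes can be handled in one place by checking the status field of the decoded JSON. The sketch below works on an already decoded dictionary, so it needs no network access; `ProxyApiError` and `extract_addresses` are hypothetical names introduced for illustration, and only the fields shown in the two examples above are used.

```python
class ProxyApiError(Exception):
    """Raised when the API reports status "error" (hypothetical helper class)."""

    def __init__(self, status_code, message):
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code
        self.message = message


def extract_addresses(payload):
    """Return the proxy addresses from a decoded API response, or raise on error."""
    if payload.get("status") != "ok":
        raise ProxyApiError(payload.get("status_code"), payload.get("error_message"))
    return [proxy["address"] for proxy in payload["data"]]
```

Raising an exception instead of returning an empty list keeps "no proxies available" and "the request was wrong" distinguishable for the caller.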

Example using curl:

curl -X POST http://127.0.0.1:55555/api/v1/ -H "Content-Type: application/json" --data '{"model": "proxy", "method": "get"}'

Example using httpie:

http POST http://127.0.0.1:55555/api/v1/ model=proxy method=get

Example using Python's requests library:

import requests
import json


def get_proxies():
   result = []
   json_data = {
      "model": "proxy",
      "method": "get",
   }
   url = "http://127.0.0.1:55555/api/v1/"

   response = requests.post(url, json=json_data)
   if response.status_code == 200:
      response = json.loads(response.text)
      for proxy in response["data"]:
         result.append(proxy["address"])
   else:
      # check error here
      pass

   return result

Example using aiohttp library:

import json

import aiohttp


async def get_proxies():
   result = []
   json_data = {
      "model": "proxy",
      "method": "get",
   }

   url = "http://127.0.0.1:55555/api/v1/"

   async with aiohttp.ClientSession() as session:
      async with session.post(url, json=json_data) as response:
         if response.status == 200:
            response = json.loads(await response.text())
            for proxy in response["data"]:
               result.append(proxy["address"])
         else:
            # check error here
            pass

   return result
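Once an address has been fetched, it can be passed to an HTTP client. The sketch below shows one way to do this with the requests library through its `proxies` argument; the helper names are hypothetical, and it assumes the address uses a scheme that requests supports out of the box (SOCKS addresses additionally need the `requests[socks]` extra).

```python
def to_requests_proxies(address):
    """Map one proxy address to the dict accepted by requests' `proxies=` argument."""
    # The same proxy is used for both plain HTTP and HTTPS targets.
    return {"http": address, "https": address}


def fetch_through_proxy(url, address, timeout=10):
    """Fetch `url` through the proxy at `address` and return the HTTP status code."""
    import requests  # imported here so the helper above works without requests installed

    response = requests.get(url, proxies=to_requests_proxies(address), timeout=timeout)
    return response.status_code
```

Public proxies fail often, so a real caller would wrap `fetch_through_proxy` in a try/except for `requests.RequestException` and move on to the next address.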

How to interact with API?

Read more about API here -> https://proxy-py.readthedocs.io/en/latest/api_v1_overview.html

# TODO: add readme about API v2

What about WEB interface?

There is a lib.ru-inspired web interface which consists of these pages (with a slash at the end):

How to contribute?

Just fork, make your changes (implement a new collector, fix a bug or whatever you want) and create a pull request.

Here are some useful guides:

How to test it?

If you've made changes to the code and want to check that you didn't break anything, just run

py.test

inside the virtual environment in the proxy_py project directory.

How to use custom checkers/collectors?

If you want to collect proxies from your own source, or you need proxies that work with a particular site, you can write your own collectors and/or checkers.

  1. Create your checkers/collectors in the current directory using the following directory structure:

// TODO: add more detailed readme about it

local/
├── requirements.txt
├── checkers
│   └── custom_checker.py
└── collectors
    └── custom_collector.py

You can create only a checker or only a collector if you prefer.

  2. Create proxy_py/settings.py in the current directory with the following content:
from ._settings import *
from local.checkers.custom_checker import CustomChecker

PROXY_CHECKERS = [CustomChecker]

COLLECTORS_DIRS = ['local/collectors']

You can append your checker to PROXY_CHECKERS, or your directory to COLLECTORS_DIRS, instead of overriding them, so that the built-in ones are used as well. It's just a normal Python file. See proxy_py/_settings.py for more detailed instructions on options.

  3. Follow the steps in "How to install?" but download this docker-compose config instead:
wget "https://raw.githubusercontent.com/DevAlone/proxy_py/master/docker-compose-with-local.yml"

and run it with the command

docker-compose -f docker-compose-with-local.yml up
  4. ...?
  5. Profit!
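The append variant mentioned in the settings step above could look like the sketch below. It assumes that `from ._settings import *` brings PROXY_CHECKERS and COLLECTORS_DIRS into scope as sequences; check proxy_py/_settings.py to confirm this before relying on it.

```python
# proxy_py/settings.py (sketch: extends the defaults instead of replacing them)
from ._settings import *
from local.checkers.custom_checker import CustomChecker

# Assumption: both names are defined as sequences in _settings.py.
# Building new lists leaves the imported defaults themselves untouched.
PROXY_CHECKERS = [*PROXY_CHECKERS, CustomChecker]
COLLECTORS_DIRS = [*COLLECTORS_DIRS, 'local/collectors']
```

This file is meant to live inside the proxy_py package, so the relative import only resolves there and not when the file is run on its own.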

How to build from scratch?

  1. Clone this repository
git clone https://github.com/DevAlone/proxy_py.git
  2. Install requirements
cd proxy_py
pip3 install -r requirements.txt
  3. Create the settings file
cp config_examples/settings.py proxy_py/settings.py
  4. Install PostgreSQL and change the database configuration in the settings.py file
  5. (Optional) Configure alembic
  6. Run your application
python3 main.py
  7. Enjoy!

Mirrors