/githhub_crawler

GitHub crawler to search in repositories, issues and wikis through a proxy.

Primary language: Python

GitHub Crawler

This crawler searches GitHub repositories, wikis or issues for the keywords you pass to it and returns a list of URLs of the items found.

It consists of a single endpoint, built with FastAPI, which handles the input and carries out the crawling process.

Usage

To use this crawler, run the main commands through the Makefile:

Run the server: make up

Stop the server: make down

Run the tests: make test

Endpoint documentation

Once the project is running, you can see the online documentation here.

POST localhost:8000/crawler

Body example:

{
  "keywords": [
    "openstack",
    "nova",
    "css"
  ],
  "proxies": [
    "78.110.174.119:8080"
  ],
  "type": "Wikis"
}
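As a minimal sketch, the request above could be sent from Python using only the standard library. The helper names `build_crawler_request` and `send_crawler_request` are hypothetical, the `http://` scheme is an assumption, and the response is assumed to be JSON; this README only states that a list of URLs is returned.

```python
import json
import urllib.request

# Same body as the example above.
PAYLOAD = {
    "keywords": ["openstack", "nova", "css"],
    "proxies": ["78.110.174.119:8080"],
    "type": "Wikis",
}


def build_crawler_request(payload, base_url="http://localhost:8000"):
    """Build a POST request for the /crawler endpoint (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/crawler",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_crawler_request(request, timeout=30):
    """Send the request and decode the reply, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server started through `make up`, calling `send_crawler_request(build_crawler_request(PAYLOAD))` should return the URLs found by the crawler.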

keywords: List of keywords to use in the search.

proxies: List of proxies used to make the request to GitHub. One is picked at random from the list.

type: The type of entity to search. Must be one of the following values: Repositories, Wikis or Issues.
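To illustrate how these three fields might be interpreted, here is a rough sketch that validates `type`, picks one proxy at random, and builds a GitHub search URL. The function names and the URL shape (`https://github.com/search` with `q` and `type` query parameters) are assumptions made for illustration; the crawler's actual implementation may differ.

```python
import random
from urllib.parse import urlencode

# Values listed in the field description above.
ALLOWED_TYPES = ("Repositories", "Wikis", "Issues")


def pick_proxy(proxies):
    """Return one proxy chosen at random from a non-empty list."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    return random.choice(proxies)


def build_search_url(keywords, search_type):
    """Build a GitHub search URL (assumed shape, not taken from the project)."""
    if search_type not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {ALLOWED_TYPES}")
    # urlencode joins the keywords with '+' and escapes special characters.
    query = urlencode({"q": " ".join(keywords), "type": search_type})
    return f"https://github.com/search?{query}"
```

For the example body, `build_search_url(["openstack", "nova", "css"], "Wikis")` gives `https://github.com/search?q=openstack+nova+css&type=Wikis`, and `pick_proxy` returns the only proxy in the list.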