Findomain/Findomain

New APIs to be added.

Edu4rdSHL opened this issue · 9 comments

Dear users, please put in comments APIs that you think should be added to findomain, it will help me a lot to improve the tool.

Note: what makes findomain unique is that it only uses APIs and doesn't scrape Google and similar search engines; that's the secret of why it's so fast. I have no plans to add scraping to findomain, so please only suggest APIs (POST or GET) here, even if they aren't directly related to Certificate Transparency logs but can be used to discover subdomains.

Please note that the following APIs are already implemented; make sure the API you want is not on this list:

Pull requests are more than welcome

  • BinaryEdge
  • AlienVault
  • WaybackMachine
  • CommonCrawl
  • PassiveTotal
  • ThreatCrowd
  • GoogleCT
  • Riddler
  • Censys
  • HackerTarget
  • ArchiveToday
  • ArchiveIt

Hey @dtnml, thanks a lot! Can you give reference links to the API documentation? That will make it easy for me to just read and start working on the implementations instead of looking for docs.

Hi, Report URI just announced Certificate Transparency monitoring as well.
Might be worth taking a look to see how it can be added here.
https://scotthelme.co.uk/announcing-ct-monitoring-for-report-uri/

Hi @Edu4rdSHL,

Add Support for: threatcrowd.org
https://www.threatcrowd.org/searchApi/v2/domain/report/?domain=google.com
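The ThreatCrowd report above returns JSON. A minimal sketch of extracting subdomains from it, assuming the response has a top-level "subdomains" list (the function name and sample body are illustrative, not findomain's actual code):

```python
import json

def extract_subdomains(report_json: str, domain: str) -> list:
    """Return subdomains of `domain` found in a ThreatCrowd-style report body."""
    data = json.loads(report_json)
    return sorted(
        sub for sub in data.get("subdomains", [])
        if sub.endswith("." + domain)
    )

# Trimmed-down sample body in the assumed shape of the v2 domain report
sample = '{"response_code": "1", "subdomains": ["mail.google.com", "docs.google.com"]}'
print(extract_subdomains(sample, "google.com"))
```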

Add Support for: certificatedetails.com
https://certificatedetails.com/api/list/google.com

Add support for: transparencyreport.google.com
https://transparencyreport.google.com/transparencyreport/api/v3/httpsreport/ct/certsearch/page?p=google.com
https://transparencyreport.google.com/transparencyreport/api/v3/httpsreport/ct/certsearch?include_expired=true&include_subdomains=true&domain=google.com
https://www.google.com/transparencyreport/api/v3/httpsreport/ct/certsearch?domain=google.com
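One caveat with the transparencyreport.google.com endpoints: at the time of writing they prepend an anti-XSSI prefix (`)]}'`) to the body, so the raw response is not directly parseable as JSON. A hedged sketch of handling that, where the sample payload layout is an assumption for illustration only:

```python
import json

def parse_certsearch(body: str):
    """Strip Google's anti-XSSI prefix, then parse the remaining JSON."""
    prefix = ")]}'"
    if body.startswith(prefix):
        body = body[len(prefix):]
    return json.loads(body)

# Illustrative response body with the anti-XSSI prefix in front of nested arrays
sample = ")]}'\n" + '[["https.ct.cdsr", [["mail.google.com"]]]]'
print(parse_certsearch(sample))
```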

Add Support for github.com
https://api.github.com/search/repositories?q=google.com
https://gist.github.com/search?utf8=%E2%9C%93&q=google.com

Add Support for: netcraft.com
https://searchdns.netcraft.com/?restriction=site+ends+with&host=google.com

import hashlib
import re
import threading
import urllib.parse

class NetcraftEnum(enumratorBaseThreaded):
    def __init__(self, domain, subdomains=None, q=None, silent=False, verbose=True):
        subdomains = subdomains or []
        self.base_url = 'https://searchdns.netcraft.com/?restriction=site+ends+with&host={domain}'
        self.engine_name = "Netcraft"
        self.lock = threading.Lock()
        super(NetcraftEnum, self).__init__(self.base_url, self.engine_name, domain, subdomains, q=q, silent=silent, verbose=verbose)
        self.q = q

    def req(self, url, cookies=None):
        cookies = cookies or {}
        try:
            resp = self.session.get(url, headers=self.headers, timeout=self.timeout, cookies=cookies)
        except Exception as e:
            self.print_(e)
            resp = None
        return resp

    def get_next(self, resp):
        # Follow the "Next page" link to paginate through results
        link_regx = re.compile('<A href="(.*?)"><b>Next page</b></a>')
        link = link_regx.findall(resp)
        link = re.sub('host=.*?%s' % self.domain, 'host=%s' % self.domain, link[0])
        return 'http://searchdns.netcraft.com' + link

    def create_cookies(self, cookie):
        cookies = dict()
        cookies_list = cookie[0:cookie.find(';')].split("=")
        cookies[cookies_list[0]] = cookies_list[1]
        # hashlib.sha1 requires a utf-8 encoded str
        cookies['netcraft_js_verification_response'] = hashlib.sha1(
            urllib.parse.unquote(cookies_list[1]).encode('utf-8')).hexdigest()
        return cookies

    def get_cookies(self, headers):
        if 'set-cookie' in headers:
            return self.create_cookies(headers['set-cookie'])
        return {}

    def enumerate(self):
        # The first request only obtains the JS-verification cookie
        start_url = self.base_url.format(domain='example.com')
        resp = self.req(start_url)
        cookies = self.get_cookies(resp.headers)
        url = self.base_url.format(domain=self.domain)
        while True:
            resp = self.get_response(self.req(url, cookies))
            self.extract_domains(resp)
            if 'Next page' not in resp:
                return self.subdomains
            url = self.get_next(resp)

    def extract_domains(self, resp):
        links_list = list()
        link_regx = re.compile(r'<a href="http://toolbar.netcraft.com/site_report\?url=(.*)">')
        try:
            links_list = link_regx.findall(resp)
            for link in links_list:
                subdomain = urllib.parse.urlparse(link).netloc
                if not subdomain.endswith(self.domain):
                    continue
                if subdomain and subdomain not in self.subdomains and subdomain != self.domain:
                    if self.verbose:
                        self.print_("%s%s: %s%s" % (R, self.engine_name, W, subdomain))
                    self.subdomains.append(subdomain.strip())
        except Exception:
            pass
        return links_list

Hello, ThreatCrowd is already implemented: https://github.com/Edu4rdSHL/findomain/blob/eda29344fe014f1a0034ededfc74b4daa941aa8e/src/lib.rs#L492-L500

certificatedetails.com doesn't provide valid JSON.
transparencyreport.google.com doesn't provide valid JSON either (the first URL returns a JSON body, but it reports an error).
GitHub and Netcraft also don't provide valid JSON.

I will ONLY add APIs that reply with proper JSON output, like https://jonlu.ca/anubis/subdomains/google.com.
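That acceptance criterion can be sketched as a simple check: a source qualifies only if its response body parses as JSON. This is an illustrative helper, not findomain's actual validation code:

```python
import json

def is_valid_json_api(body: str) -> bool:
    """Return True if the response body is parseable JSON, False otherwise."""
    try:
        json.loads(body)
        return True
    except (json.JSONDecodeError, TypeError):
        return False

print(is_valid_json_api('{"subdomains": ["a.example.com"]}'))  # qualifies
print(is_valid_json_api("<html>Next page</html>"))             # rejected: HTML scrape
```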

Closing; new API requests should be opened as new issues for easier management.