Skip domains when running into errors
wurzelmuschel opened this issue · 5 comments
Hi,
I am using dnstwist as a Python library in my application. For some of the domains that dnstwist generates, it runs into errors when it tries to collect additional information about a discovered domain (e.g. timeout errors or socket errors). When this happens, dnstwist (or more specifically, libraries such as urllib) throws an exception that dnstwist does not handle internally. I can catch and handle it myself, but the current run is stopped and the results collected up to that point are lost. For certain domains a run can take more than 40 minutes, so just restarting it is not the best option (especially when it dies again).
Would it be possible to handle these errors internally, skip the "faulty" domain that caused the error, and continue with the next one?
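What is being asked for is a common pattern: wrap each per-domain lookup so that a failure is recorded and skipped instead of aborting the whole run. The following is a minimal sketch of that pattern, not dnstwist's internal code; `enrich_all` and the `lookup` callable are hypothetical names. It relies on the fact that urllib.error.URLError, socket.gaierror and TimeoutError are all subclasses of OSError, so a single `except OSError` covers the errors seen in this thread.

```python
import urllib.error
from typing import Callable, Dict, Iterable, List, Tuple


def enrich_all(domains: Iterable[str],
               lookup: Callable[[str], Dict]) -> Tuple[List[Dict], List[Tuple[str, str]]]:
    """Apply the hypothetical `lookup` to each domain, skipping failures.

    Returns (results, skipped) so that nothing collected so far is lost.
    """
    results: List[Dict] = []
    skipped: List[Tuple[str, str]] = []
    for domain in domains:
        try:
            results.append(lookup(domain))
        except OSError as exc:
            # URLError, socket.gaierror and TimeoutError all derive from
            # OSError: record the faulty domain and continue with the next.
            skipped.append((domain, str(exc)))
    return results, skipped
```

Returning the skipped domains alongside the results keeps failures visible, so they can be retried later instead of being silently dropped.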
Could you please share an example traceback? Which version are you using?
Below is a traceback from a recent crash. It comes from a system running dnstwist version 20230509.
I call dnstwist.run() as follows; the call also contains the domain name that caused the error below:
fakes = dnstwist.run(all=True, banners=True, format='null', mxcheck=True, domain='icig-bs.de', registered=True, phash=True, lsh='tlsh', whois=True)
Traceback (most recent call last):
File "/usr/local/lib/python3.11/urllib/request.py", line 1348, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/usr/local/lib/python3.11/http/client.py", line 1283, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/lib/python3.11/http/client.py", line 1329, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.11/http/client.py", line 1278, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.11/http/client.py", line 1038, in _send_output
self.send(msg)
File "/usr/local/lib/python3.11/http/client.py", line 976, in send
self.connect()
File "/usr/local/lib/python3.11/http/client.py", line 942, in connect
self.sock = self._create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/socket.py", line 827, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/socket.py", line 962, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno 8] Name does not resolve
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/home/checkdns/checkDNS/checkDNS2.py", line 250, in <module>
find_fake_domains(db)
File "/usr/home/checkdns/checkDNS/checkDNS2.py", line 83, in find_fake_domains
fakes = dnstwist.run(all=True,
^^^^^^^^^^^^^^^^^^^^^^
File "/home/checkdns/.virtualenvs/checkDNS/lib/python3.11/site-packages/dnstwist.py", line 945, in wrapper
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/checkdns/.virtualenvs/checkDNS/lib/python3.11/site-packages/dnstwist.py", line 1200, in run
r = UrlOpener(request_url,
^^^^^^^^^^^^^^^^^^^^^^
File "/home/checkdns/.virtualenvs/checkDNS/lib/python3.11/site-packages/dnstwist.py", line 204, in __init__
with urllib.request.urlopen(request, timeout=timeout, context=ctx) as r:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/urllib/request.py", line 216, in urlopen
return opener.open(url, data, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/urllib/request.py", line 519, in open
response = self._open(req, data)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/urllib/request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/urllib/request.py", line 496, in _call_chain
result = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/urllib/request.py", line 1377, in http_open
return self.do_open(http.client.HTTPConnection, req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/urllib/request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 8] Name does not resolve>
Process finished with exit code 1
This is intentional. You have chosen the options phash=True and lsh='tlsh', which require querying the HTTP server behind the initial domain. If this fails for any reason (as in this case, where the domain name cannot be resolved), an exception is raised. This mirrors the behavior of the command-line tool. Likewise, if you provide an invalid domain name, dnstwist.run() will also raise an exception.
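Given this, one caller-side option is to fall back to a run without the HTTP-dependent options when the first attempt fails. A minimal sketch, assuming `run` is `dnstwist.run` (passed in as a parameter so it can be replaced by a stub) and assuming that leaving out `phash` and `lsh` disables those features; the fallback policy and the name `run_with_fallback` are hypothetical, not dnstwist behavior.

```python
import urllib.error
from typing import Any, Callable


def run_with_fallback(run: Callable[..., Any], **kwargs: Any) -> Any:
    """Call run(**kwargs); if fetching a web page fails, retry once
    without the options that need the initial domain's HTTP server."""
    try:
        return run(**kwargs)
    except urllib.error.URLError:
        # phash and lsh require the HTTP server behind the initial domain,
        # so drop them and keep the remaining (DNS, WHOIS, MX) checks.
        reduced = {k: v for k, v in kwargs.items() if k not in ("phash", "lsh")}
        if reduced == kwargs:
            raise  # nothing to drop, so the failure has another cause
        return run(**reduced)
```

Usage, with the arguments from the call above: `fakes = run_with_fallback(dnstwist.run, all=True, banners=True, format='null', mxcheck=True, domain='icig-bs.de', registered=True, phash=True, lsh='tlsh', whois=True)`. The trade-off is that the fallback results contain no page-similarity scores.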
I don't think it has to do with the domain passed as the argument to the run() function; it seems to happen when one of the domains that dnstwist generates is being checked. The problem is hard to reproduce. If I run the same script several times (even with the domain that has no website), it only sometimes fails with the traceback I sent earlier, probably when it checks a generated domain that does not resolve (for whatever reason). If it had to do with the "original" domain, the error would happen every time, right?
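Because the failure only shows up on some runs, it may be transient (for example a flaky resolver or a slow web server). One caller-side mitigation is to retry a few times with a growing delay. A minimal sketch; `retry`, the attempt count and the delays are arbitrary assumptions, and `sleep` is a parameter only so that it can be stubbed out in tests.

```python
import time
import urllib.error
from typing import Any, Callable


def retry(func: Callable[[], Any], attempts: int = 3, base_delay: float = 2.0,
          sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call func() up to `attempts` times, doubling the delay after each
    URLError. Re-raises the last URLError if every attempt fails."""
    for attempt in range(attempts):
        try:
            return func()
        except urllib.error.URLError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Usage: `fakes = retry(lambda: dnstwist.run(domain='icig-bs.de', phash=True, lsh='tlsh'))`. Since a full run can be long, keep the number of attempts small.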
Coming back to my original question (whether dnstwist can handle errors internally by skipping a problematic domain): I just came across another exception that is not handled internally. Again, it would be great if dnstwist handled this by simply skipping the problematic domain:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/urllib/request.py", line 1348, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/usr/local/lib/python3.11/http/client.py", line 1283, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/lib/python3.11/http/client.py", line 1329, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.11/http/client.py", line 1278, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.11/http/client.py", line 1038, in _send_output
self.send(msg)
File "/usr/local/lib/python3.11/http/client.py", line 976, in send
self.connect()
File "/usr/local/lib/python3.11/http/client.py", line 1448, in connect
super().connect()
File "/usr/local/lib/python3.11/http/client.py", line 942, in connect
self.sock = self._create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/socket.py", line 851, in create_connection
raise exceptions[0]
File "/usr/local/lib/python3.11/socket.py", line 836, in create_connection
sock.connect(sa)
TimeoutError: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/home/checkdns/checkDNS/checkDNS2.py", line 254, in <module>
find_fake_domains(db)
File "/usr/home/checkdns/checkDNS/checkDNS2.py", line 82, in find_fake_domains
fakes = dnstwist.run(all=True,
^^^^^^^^^^^^^^^^^^^^^^
File "/home/checkdns/.virtualenvs/checkDNS/lib/python3.11/site-packages/dnstwist.py", line 945, in wrapper
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/checkdns/.virtualenvs/checkDNS/lib/python3.11/site-packages/dnstwist.py", line 1200, in run
r = UrlOpener(request_url,
^^^^^^^^^^^^^^^^^^^^^^
File "/home/checkdns/.virtualenvs/checkDNS/lib/python3.11/site-packages/dnstwist.py", line 204, in __init__
with urllib.request.urlopen(request, timeout=timeout, context=ctx) as r:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/urllib/request.py", line 216, in urlopen
return opener.open(url, data, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/urllib/request.py", line 519, in open
response = self._open(req, data)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/urllib/request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/urllib/request.py", line 496, in _call_chain
result = func(*args)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/urllib/request.py", line 1391, in https_open
return self.do_open(http.client.HTTPSConnection, req,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/urllib/request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error timed out>
Process finished with exit code 1
It's still the same cause: the HTTP server behind the initial domain can't be queried, this time because of a timeout.
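Both tracebacks end in the same urllib.error.URLError; the underlying error is kept in its `reason` attribute (it is what gets passed in `raise URLError(err)`). A minimal sketch of telling the two cases apart; `describe_failure` and its labels are illustrative only. On Python 3.10+ `socket.timeout` is an alias of `TimeoutError`; checking both keeps the sketch working on older versions.

```python
import socket
import urllib.error


def describe_failure(exc: urllib.error.URLError) -> str:
    """Classify a URLError by its underlying `reason`."""
    reason = exc.reason
    if isinstance(reason, socket.gaierror):
        return "dns"      # e.g. [Errno 8] Name does not resolve
    if isinstance(reason, (TimeoutError, socket.timeout)):
        return "timeout"  # e.g. timed out
    return "other"
```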
File "/home/checkdns/.virtualenvs/checkDNS/lib/python3.11/site-packages/dnstwist.py", line 1200, in run
r = UrlOpener(request_url,
I could consider raising custom exceptions, but you would still need to handle them.
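For illustration only, a custom exception of this kind might wrap the low-level error and carry the domain involved. A minimal sketch; `DomainFetchError` and `fetch_page` are hypothetical names, not part of dnstwist. As noted, the caller would still need to catch the new exception; it would only make the cause easier to identify.

```python
import urllib.error
import urllib.request


class DomainFetchError(Exception):
    """Hypothetical: raised when the web page behind a domain can't be fetched."""

    def __init__(self, domain: str, cause: object) -> None:
        super().__init__(f"cannot fetch {domain}: {cause}")
        self.domain = domain
        self.cause = cause


def fetch_page(url: str, domain: str, timeout: float = 5.0) -> bytes:
    """Hypothetical urllib wrapper that raises DomainFetchError on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as exc:
        # Keep the original URLError chained as __cause__ for debugging.
        raise DomainFetchError(domain, exc.reason) from exc
```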