Bug in parsing results from crt.sh?
geeknik opened this issue · 1 comment
geeknik commented
Expected Behavior
Domains returned as expected.
Actual Behavior
[+] Searching on Threatcrowd...
[+] Searching on Hackertarget...
[+] Searching on UrlScan...
[+] Searching on DnsDumpster...
[+] Searching on crt.sh...
Traceback (most recent call last):
  File "/usr/local/bin/pdlist", line 11, in <module>
    load_entry_point('pdlist==0.1.0', 'console_scripts', 'pdlist')()
  File "/usr/local/lib/python3.7/dist-packages/pdlist-0.1.0-py3.7.egg/pdlist/main.py", line 85, in main
    subdomains += source.parse(domains)
  File "/usr/local/lib/python3.7/dist-packages/pdlist-0.1.0-py3.7.egg/pdlist/source/crtsh.py", line 31, in parse
    json_resp = json.loads(requests.get(url).text)
  File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Steps to Reproduce the Problem
- Run pdlist against a domain that returns a lot of results, such as google.com:
pdlist google.com -o /tmp/google.com.txt
Specifications
- Version: Python 3.7.3 (GCC 8.3.0)
- Platform: Ubuntu 19.04
- pdlist commit e95a7d8
gnebbia commented
OK, I added an exception handler for that. It seems crt.sh is returning malformed JSON in some way, and json.loads complains about it.
For now we just skip crt.sh on any error; in the future I could implement a downloader for that JSON and try to fix the format.
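For anyone hitting the same traceback, here is a minimal sketch of what such a handler could look like. It is not the actual pdlist change. The function name `parse_crtsh_body` is hypothetical, and the `name_value` field (one or more host names, separated by newlines) is an assumption about the crt.sh JSON output. "Expecting value: line 1 column 1 (char 0)" is what json.loads reports when the body is empty or does not start with JSON at all, for example an HTML error page, so the sketch treats any undecodable body as "no results".

```python
import json


def parse_crtsh_body(body):
    """Return sorted, de-duplicated host names from a crt.sh JSON body.

    Returns an empty list when the body is not valid JSON, so the caller
    can skip this source instead of crashing.
    """
    try:
        entries = json.loads(body)
    except json.JSONDecodeError:
        # Empty body, HTML error page, or truncated JSON: skip crt.sh.
        return []

    if not isinstance(entries, list):
        # Valid JSON, but not the list of records this sketch expects.
        return []

    names = set()
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        # Assumption: each record has a "name_value" string that may hold
        # several host names separated by newlines.
        for name in str(entry.get("name_value", "")).splitlines():
            name = name.strip()
            if name:
                names.add(name)
    return sorted(names)
```

In crtsh.py this would presumably be fed the text of the existing `requests.get(url)` call. The request itself can also fail or time out, so catching `requests.RequestException` around it would be a reasonable companion to the JSON check.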