gnebbia/pdlist

Bug in parsing results from crt.sh?

geeknik opened this issue · 1 comment

Expected Behavior

Domains returned as expected.

Actual Behavior

[+]  Searching on Threatcrowd...
[+]  Searching on Hackertarget...
[+]  Searching on UrlScan...
[+]  Searching on DnsDumpster...
[+]  Searching on crt.sh...
Traceback (most recent call last):
  File "/usr/local/bin/pdlist", line 11, in <module>
    load_entry_point('pdlist==0.1.0', 'console_scripts', 'pdlist')()
  File "/usr/local/lib/python3.7/dist-packages/pdlist-0.1.0-py3.7.egg/pdlist/main.py", line 85, in main
    subdomains += source.parse(domains)
  File "/usr/local/lib/python3.7/dist-packages/pdlist-0.1.0-py3.7.egg/pdlist/source/crtsh.py", line 31, in parse
    json_resp = json.loads(requests.get(url).text)
  File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Steps to Reproduce the Problem

  1. Run pdlist against a domain that returns a lot of results, like google.com:
    pdlist google.com -o /tmp/google.com.txt

Specifications

  • Version: Python 3.7.3 (GCC 8.3.0)
  • Platform: Ubuntu 19.04
  • pdlist commit e95a7d8

OK, I added an exception handler for that. It seems crt.sh is returning malformed JSON in some way, and json.loads complains about it.
At the moment we just skip crt.sh on any error. In the future I could implement a downloader for that JSON and try to fix the format.
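
For reference, "Expecting value: line 1 column 1 (char 0)" means json.loads found nothing parseable at the very first character, which is what an empty body or an HTML error page would produce. A minimal sketch of the skip-on-error approach described above might look like this. The function name extract_names is hypothetical and not taken from pdlist, and the "name_value" field is an assumption about the crt.sh JSON layout.

```python
import json


def extract_names(body):
    """Return sorted subdomain names from a crt.sh-style JSON body.

    Hypothetical helper, not pdlist's actual code. Returns [] when the
    body is not valid JSON or does not have the expected shape.
    """
    try:
        entries = json.loads(body)
    except json.JSONDecodeError:
        # Empty body, HTML error page, or otherwise malformed JSON:
        # skip this source instead of crashing the whole run.
        return []

    if not isinstance(entries, list):
        return []

    names = set()
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        # Assumption: each entry has a "name_value" string that may
        # contain several newline-separated names.
        for name in str(entry.get("name_value", "")).splitlines():
            names.add(name.strip().lower())

    names.discard("")
    return sorted(names)
```

Called with an empty string or an HTML page, it returns an empty list instead of raising, so the caller can carry on with the other sources.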