lc/gau

Richer JSON output

ocervell opened this issue · 5 comments

It would be nice to have response data other than just the URL in the JSON output, such as:

{
  "url": "https://test.domain.synology.me/.htaccess-local",
  "status_code": 200,
  "words": 1066,
  "lines": 100,
  "content_length": 4516,
  "content_type": "text/html; charset=utf-8",
  "duration": 57779116,
  "host": "test.domain.synology.me"
}

That would avoid scraping the endpoint again to find those details.

Maybe even consider using httpx as a client instead of fasthttp, as it seems to give more info on the response?

lc commented

gau is completely passive at the moment. It issues no HTTP requests to URLs that are archived from Wayback, OTX, etc. It can be piped into a tool such as httpx for additional info. Would you prefer that gau had an option for this instead?

Ah, I thought that since there is a "--mc strings # list of status codes to match" option, some crawling was still happening. What is the purpose of the --mc flag, then?
Otherwise, an option for running an httpx query could be added, even though we would not really control httpx input options like tech detection and so on.

I think it would be useful to add provider, timestamp, status_code, mimetype and content_length to the JSON output. That would make it possible to filter by these values at later stages.
I checked all the providers, and all of them return most of these fields.
I am ready to implement this change, if you agree.
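
As a rough sketch of what such a record might look like (not gau's actual code): the Result type, the encodeResult helper and the example.com values below are hypothetical, and only the field names come from the list above. Marking the optional fields with omitempty is an assumption, made so that a provider that does not return a given field simply leaves it out of the output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Result is a hypothetical richer output record. The JSON field names follow
// the list proposed above; the type itself is an assumption, not gau's own.
type Result struct {
	URL           string `json:"url"`
	Provider      string `json:"provider,omitempty"`
	Timestamp     string `json:"timestamp,omitempty"`
	StatusCode    int    `json:"status_code,omitempty"`
	MimeType      string `json:"mimetype,omitempty"`
	ContentLength int64  `json:"content_length,omitempty"`
}

// encodeResult marshals one record as a single JSON line.
// Zero-valued optional fields are dropped because of omitempty.
func encodeResult(r Result) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Placeholder values for illustration only.
	records := []Result{
		{
			URL:           "https://example.com/a",
			Provider:      "wayback",
			Timestamp:     "20200101000000",
			StatusCode:    200,
			MimeType:      "text/html",
			ContentLength: 4516,
		},
		// A provider that only knows the URL: the other fields are omitted.
		{URL: "https://example.com/b", Provider: "otx"},
	}
	for _, r := range records {
		line, err := encodeResult(r)
		if err != nil {
			fmt.Println("encode error:", err)
			continue
		}
		fmt.Println(line)
	}
}
```

With one JSON object per line, later stages could filter on status_code or mimetype without requesting the URL again.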

lc commented

Hey @zerodivisi0n, I definitely agree

Great! Then I'll do it soon