lc/gau

Regarding feat: implement unthrottled concurrency using task queue

wumpus opened this issue · 8 comments

Can you stop attacking the Common Crawl CDX API?

lc commented

I’m not? This is an open source tool to find archived URLs for a given domain…

Yes, and because it isn't throttled, use of this package harms the target, which is me.

Any progress? I was hoping for rate limiting, honoring 503 and 429 status codes, and exponential backoff.

And not just "unthrottled concurrency".

lc commented

It’s open source, so PRs are welcome.

It is going to be a busy month with some life changes for me, so I will put this on my TODO list. Unfortunately, it will likely not get done until late June or early July.

lc commented

I accidentally closed this when commenting.

Thanks for adding to your TODO list, I appreciate it!

Here's an example of making a single query in Athena that's much more efficient than gau: https://positive.security/blog/ransack-data-exfiltration#common-crawl

lc commented

Thanks for the reference, and sorry for being slow to implement this. Getting hitched!

Congratulations!