nsonaniya2010/SubDomainizer

Increased parallelism

hiddengearz opened this issue · 2 comments

As you mentioned in #13, you'll be rewriting the code to show the URLs where secrets are found. Is it possible to also increase the parallelism of this script?

When I run this against ~100 URLs, it takes hours. I created a quick wrapper in Go that runs instances of the script across all of my CPU cores, and it finished in 15 minutes, so there are definitely some bottlenecks in the code that could likely be threaded.

Thanks for making this amazing tool!

func SubDomainizer(dir string) {
	println("starting SubDomainizer")

	// os.MkdirAll succeeds if the directory already exists, so no os.Stat check is needed.
	for _, sub := range []string{"domains", "cloud", "secrets"} {
		if err := os.MkdirAll(dir+"/"+date+"/subdomainizer/"+sub, os.ModePerm); err != nil {
			println("could not create output directory:", err.Error())
			return
		}
	}

	var wg sync.WaitGroup
	maxGoroutines := 10
	// Buffered channel used as a semaphore: at most maxGoroutines scans run at once.
	guard := make(chan struct{}, maxGoroutines)

	domains := ReadFile(dir + "[redacted]")
	for _, domain := range domains {
		guard <- struct{}{}
		wg.Add(1)
		go func(dir string, date string, domain string) {
			hash := GenerateRandomString()

			cmd := exec.Command("python3", "[redacted]]tools/SubDomainizer/SubDomainizer.py", "-u", domain,
				"-o", dir+"/"+date+"/subdomainizer/domains/"+hash+"_domains.txt", "-cop", dir+"/"+date+"/subdomainizer/cloud/"+hash+"_cloud.txt", "-sop", dir+"/"+date+"/subdomainizer/secrets/"+hash+"_secrets.txt",
				"-g", "-gt", "[redacted]")

			println(cmd.String())
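			// Run starts the command and waits for it to exit; report failures instead of dropping them.
			if err := cmd.Run(); err != nil {
				println("SubDomainizer failed for", domain+":", err.Error())
			}
			<-guard // free a slot for the next domain
			wg.Done()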
		}(dir, date, domain)

	}
	wg.Wait()

}


Got it. I will add threading for the given list of URLs.
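For illustration only, threading over a list of URLs could look roughly like the sketch below. It is not SubDomainizer's actual code: `scan_url` is a hypothetical stand-in for whatever the tool does per URL, and `scan_all` and `max_workers` are names assumed here. Threads tend to help most when the time is spent waiting on network requests.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scan_url(url):
    """Hypothetical stand-in for the per-URL work (fetch the page, extract findings)."""
    # Placeholder: a real implementation would download `url` and parse it.
    return {"url": url, "findings": []}


def scan_all(urls, max_workers=10):
    """Scan every URL concurrently and return the results keyed by URL."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its URL so results can be attributed.
        futures = {pool.submit(scan_url, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # one failing URL should not abort the batch
                results[url] = {"url": url, "error": str(exc)}
    return results


if __name__ == "__main__":
    for url, result in scan_all(["https://example.com", "https://example.org"]).items():
        print(url, result)
```

Keying results by URL would also make it straightforward to report which URL each secret came from.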

jokki commented

I imagine that this could be because of Python's GIL (Global Interpreter Lock), which prevents multiple threads from executing Python bytecode at the same time, so threads cannot spread CPU-bound work across cores. You could switch to processes instead, though that would mean some significant changes to the code, or wrap the tool as in the example above. Thanks for making this tool!
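As a rough sketch of the process-based option, the CPU-heavy part (regex matching over page bodies that have already been downloaded) could be handed to a process pool so that each worker has its own interpreter and its own GIL. The function names, the regex, and the sample data are assumptions made for this example, not taken from SubDomainizer.

```python
import re
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def find_subdomains(text, root_domain):
    """Return the sorted, de-duplicated subdomains of root_domain found in text."""
    # Illustrative pattern: one or more labels followed by the root domain.
    pattern = re.compile(r"(?:[a-z0-9-]+\.)+" + re.escape(root_domain), re.IGNORECASE)
    return sorted({match.lower() for match in pattern.findall(text)})


def scan_pages(pages, root_domain, workers=None):
    """Run find_subdomains over each page body in separate worker processes."""
    # partial() of a module-level function can be pickled and sent to workers.
    worker = partial(find_subdomains, root_domain=root_domain)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(worker, pages))


if __name__ == "__main__":
    # The __main__ guard matters on platforms that start workers by re-importing this module.
    pages = ["<a href='https://api.example.com'>", "src=//CDN.example.com/app.js"]
    print(scan_pages(pages, "example.com"))
```

Processes cost more to start and require picklable arguments, so this pays off mainly when the matching work per page is substantial.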