/gowap

Wappalyzer implementation in Go

Primary LanguageGoApache License 2.0Apache-2.0

Gowap [Wappalyzer implementation in Go]

Build Status coverage report card

  • JS analysing (using Rod)
  • DNS scraping
  • Confidence rate
  • Recursive crawling
  • Rod browser integration (Colly can still be used - faster but not loading JS)
  • Can be used with as a cmd (technologies.json file embeded)
  • Test coverage 100%
  • robots.txt compliance

Usage

Using the package

go get github.com/unstppbl/gowap

Call Init() function with a Config object created with the NewConfig() function. It will return Wappalyzer object on which you can call Analyze method with URL string as argument.

    //Create a Config object and customize it
	config := gowap.NewConfig()
    //Path to override default technologies.json file
	config.AppsJSONPath = "path/to/my/technologies.json"
    //Timeout in seconds for fetching the url
	config.TimeoutSeconds = 5
    //Timeout in seconds for loading the page
	config.LoadingTimeoutSeconds = 5
    //Don't analyze page when depth superior to this number. Default (0) means no recursivity (only first page will be analyzed)
	config.MaxDepth = 2
    //Max number of pages to visit. Exit when reached
	config.MaxVisitedLinks = 10
    //Delay in ms between requests
	config.MsDelayBetweenRequests = 200
    //Choose scraper between rod (default) and colly
	config.Scraper = "colly"
    //Override the user-agent string
	config.UserAgent = "GoWap"
    //Output as a JSON string
    config.JSON = true

    //Initialisation
	wapp, err := gowap.Init(config)
    //Scraping 
    url := "https://scrapethissite.com/"
	res, err := wapp.Analyze(url)

Using the cmd

You can build the cmd using the commande : go build -o gowap cmd/gowap/main.go

Then using the compiled binary :

You must specify a url to analyse
Usage : gowap [options] <url>
  -delay int
    	Delay in ms between requests (default 100)
  -depth int
    	Don't analyze page when depth superior to this number. Default (0) means no recursivity (only first page will be analyzed)
  -file string
    	Path to override default technologies.json file
  -h	Help
  -loadtimeout int
    	Timeout in seconds for loading the page (default 3)
  -maxlinks int
    	Max number of pages to visit. Exit when reached (default 5)
  -pretty
    	Pretty print json output
  -scraper string
    	Choose scraper between rod (default) and colly (default "rod")
  -timeout int
    	Timeout in seconds for fetching the url (default 3)
  -useragent string
    	Override the user-agent string

To Do

List of some ideas :

  • analyse robots (field certIssuer)
  • analyse certificates (field certIssuer)
  • anayse css (field css)
  • anayse xhr requests (field xhr)
  • scrape an url list from a file in args
  • ability to choose what is scraped (DNS, cookies, HTML, scripts, etc...)
  • more tests in "real life"
  • perf ? regex html seems long
  • should output be the same as original wappalizer ? + ordering