
hakrawler


What is it?

hakrawler is a Go web crawler designed for easy, quick discovery of endpoints and assets within a web application. It can be used to discover:

  • Forms
  • Endpoints
  • Subdomains
  • Related domains
  • JavaScript files

The goal is to build the tool so that it can be easily chained with other tools, such as subdomain enumeration tools and vulnerability scanners. For example:

assetfinder target.com | hakrawler | some-xss-scanner

Features

  • Unlimited, fast web crawling for endpoint discovery
  • Fuzzy matching for domain discovery
  • robots.txt parsing
  • sitemap.xml parsing
  • Plain output for easy parsing into other tools
  • Accept domains from stdin for easier tool chaining
  • SQLMap-friendly output format
  • Link gathering from JavaScript files
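
For example, several of these features can be combined in a single run (an illustrative command using the flags documented under Usage below; output will vary by target):

echo target.com | hakrawler -plain -robots -sitemap -js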

Upcoming features

Contributors

  • hakluke wrote the tool
  • cablej cleaned up the code
  • Corben Leo added functionality to pull links from JavaScript files
  • delic made the code much cleaner
  • hoenn made the code even cleanerer
  • ameenmaali made a bunch of code improvements and bug fixes
  • daehee added the -nocolor flag

Thanks

  • codingo and prodigysml/sml555, my favourite people to hack with. A constant source of ideas and inspiration. They also provided beta testing and a sounding board for this tool in development.
  • tomnomnom, who wrote waybackurls, which powers the wayback part of this tool
  • s0md3v, who wrote photon, from which I took ideas when creating this tool
  • The folks from gocolly, the library which powers the crawler engine
  • oxffaa, who wrote a very efficient sitemap.xml parser which is used in this tool
  • The contributors of LinkFinder, from which some awesome regex was stolen to parse links from JavaScript files.

Installation

  1. Install Golang
  2. Run the command below:
go get github.com/hakluke/hakrawler
  3. Run hakrawler from your Go bin directory. For Linux systems it will likely be:
~/go/bin/hakrawler

Note that if you need to do this, you probably want to add your Go bin directory to your $PATH to make things easier!
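
For example, on a typical Linux setup (assuming the default Go workspace at ~/go), adding something like the following to your shell profile would do it:

export PATH=$PATH:~/go/bin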

Usage

Note: multiple domains can be crawled by piping them into hakrawler from stdin. If only a single domain is being crawled, it can be supplied with the -url flag.
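
For example (illustrative commands only; domains.txt stands in for any file with one domain per line):

cat domains.txt | hakrawler
hakrawler -url example.com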

$ hakrawler -h
Usage of hakrawler:
  -all
    	Include everything in output - this is the default, so this option is superfluous (default true)
  -auth string
    	The value of this will be included as a Authorization header
  -cookie string
    	The value of this will be included as a Cookie header
  -depth int
    	Maximum depth to crawl, the default is 1. Anything above 1 will include URLs from robots, sitemap, waybackurls and the initial crawler as a seed. Higher numbers take longer but yield more results. (default 1)
  -forms
    	Include form actions in output
  -js
    	Include links to utilised JavaScript files
  -linkfinder
    	Run linkfinder on javascript files.
  -outdir string
    	Directory to save discovered raw HTTP requests
  -plain
    	Don't use colours or print the banners to allow for easier parsing
  -robots
    	Include robots.txt entries in output
  -scope string
    	Scope to include:
    	strict = specified domain only
    	subs = specified domain and subdomains
    	yolo = everything (default "subs")
  -sitemap
    	Include sitemap.xml entries in output
  -subs
    	Include subdomains in output
  -url string
    	The url that you wish to crawl, e.g. google.com or https://example.com. Schema defaults to http
  -urls
    	Include URLs in output
  -usewayback
    	Query wayback machine for URLs and add them as seeds for the crawler
  -v	Display version and exit
  -wayback
    	Include wayback machine entries in output
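
For instance, to crawl a single domain two levels deep, restricted to the specified domain only, with plain output (an illustrative command, not taken from the project's docs):

hakrawler -url example.com -depth 2 -scope strict -plain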

Basic Example

Command: hakrawler -url bugcrowd.com -depth 1

[sample output screenshot]