A web crawler: given a URL, it outputs a simple textual sitemap.
The crawler is limited to one subdomain: when you start with https://example.com/about, it crawls all pages within example.com, but does not follow external links, for example to another domain or to subdomain.example.com.
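The scoping rule above can be sketched as a small host check. This is a minimal illustration, not the project's actual code; `sameHost` is a hypothetical helper, and it treats relative links (no host) as in-scope while rejecting other domains and other subdomains.

```go
package main

import (
	"fmt"
	"net/url"
)

// sameHost reports whether link stays on the same host as the start URL.
// Relative links have no host and are considered in-scope; any other
// hostname, including a different subdomain, is out of scope.
func sameHost(start, link string) bool {
	s, err := url.Parse(start)
	if err != nil {
		return false
	}
	l, err := url.Parse(link)
	if err != nil {
		return false
	}
	if l.Host == "" { // relative link, e.g. "/contact"
		return true
	}
	return s.Hostname() == l.Hostname()
}

func main() {
	fmt.Println(sameHost("https://example.com/about", "/team"))                       // true
	fmt.Println(sameHost("https://example.com/about", "https://other.com/page"))      // false
	fmt.Println(sameHost("https://example.com/about", "https://sub.example.com/p"))   // false
}
```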
The list below shows the feature status of the project:
- Concurrent page crawling: multiple crawlers run simultaneously
- Worker pool to limit the number of crawlers
- Arbitrary starting page
- Collecting absolute and relative links on a page
- Reporting the list of unique URLs
- Signal handling
- In-memory URL storage
- Unit testing
- Flexible and extendable application design
- Initial build environment: only Docker and the make utility are needed to build and test the project.
- `make tests` to run unit tests
- `make checks` to run linters
- `make build` to build binaries under the `./bin/` directory for:
  - linux i386, amd64 and arm7
  - windows 32 and 64 bits
  - MacOS
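The concurrent crawling, worker pool, and in-memory unique-URL storage listed above can be sketched together. This is a hedged illustration, not the project's implementation: the `site` map is a hypothetical in-memory stand-in for real HTTP fetching, and `crawl` shows the idea of a fixed-size pool of workers sharing a deduplicating visited set.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Hypothetical site graph standing in for fetched pages and their links.
var site = map[string][]string{
	"/":       {"/about", "/blog"},
	"/about":  {"/", "/blog"},
	"/blog":   {"/about", "/post-1"},
	"/post-1": {"/"},
}

// crawl visits every page reachable from start using a fixed number of
// workers, and returns the sorted list of unique URLs seen.
func crawl(start string, workers int) []string {
	var (
		mu      sync.Mutex
		visited = map[string]bool{} // in-memory URL storage
		wg      sync.WaitGroup
		tasks   = make(chan string, 100)
	)
	// enqueue adds a URL exactly once; the mutex guards the visited set.
	enqueue := func(u string) {
		mu.Lock()
		defer mu.Unlock()
		if !visited[u] {
			visited[u] = true
			wg.Add(1)
			tasks <- u
		}
	}
	for i := 0; i < workers; i++ { // worker pool limits concurrency
		go func() {
			for u := range tasks {
				for _, link := range site[u] {
					enqueue(link)
				}
				wg.Done()
			}
		}()
	}
	enqueue(start)
	wg.Wait()
	close(tasks)
	out := make([]string, 0, len(visited))
	for u := range visited {
		out = append(out, u)
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(crawl("/", 3)) // → [/ /about /blog /post-1]
}
```

A real crawler would replace the `site` lookup with an HTTP fetch plus link extraction, and would add signal handling to drain the pool on shutdown.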