Mostly just to prove to myself that it can be done, I've written a basic crawler in C++11. You will need libxml2 installed to use it, although the HTML parsing itself is done by Google's gumbo parser (see the deps folder).
This is not ready for prime time.