Primary language: C++. License: GNU General Public License v3.0 (GPL-3.0).

HTMLHarvesterPlusPlus

C++ header-only library for extracting information from HTML documents

Todo

  • Extract page title
  • Extract all links
  • Clean up extracted links (resolve relative URLs)
  • Filter extracted links (may not be added, since this is easy for a user of the library to do)
  • Extract all page text content
  • Error handling :)
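
The first two items above could be approached roughly as in the sketch below. This is only an illustration and not the library's actual API: the names `to_lower`, `extract_title`, and `extract_links` are hypothetical, the code uses plain string searching rather than a real HTML parser, and it only recognises double-quoted `href` values (on any element, not just `<a>`).

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helpers for illustration; not part of HTMLHarvesterPlusPlus.

// Lower-case copy of the input, used for case-insensitive tag matching.
// The copy has the same length, so positions map back to the original.
inline std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// Return the text between the first <title ...> and </title>,
// or an empty string if no complete title element is found.
inline std::string extract_title(const std::string& html) {
    const std::string lower = to_lower(html);
    const std::size_t open = lower.find("<title");
    if (open == std::string::npos) return "";
    const std::size_t start = lower.find('>', open);
    if (start == std::string::npos) return "";
    const std::size_t end = lower.find("</title", start);
    if (end == std::string::npos) return "";
    return html.substr(start + 1, end - start - 1);
}

// Collect the values of all double-quoted href attributes, in document order.
// Values are returned as written; relative links are not resolved here.
inline std::vector<std::string> extract_links(const std::string& html) {
    std::vector<std::string> links;
    const std::string lower = to_lower(html);
    const std::string key = "href=\"";
    std::size_t pos = 0;
    while ((pos = lower.find(key, pos)) != std::string::npos) {
        const std::size_t start = pos + key.size();
        const std::size_t end = html.find('"', start);
        if (end == std::string::npos) break;  // unterminated attribute
        links.push_back(html.substr(start, end - start));
        pos = end + 1;
    }
    return links;
}
```

Because both functions take the document as a `std::string` and return standard containers, a caller could filter or post-process the links with ordinary standard-library algorithms, which fits the note above that filtering is easy for a user of the library to do.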