wd5/WebCE
WEB Content Extractor (WEBCE) is an open-source project that provides two effective algorithms for eliminating uninformative blocks and efficiently extracting content blocks from web pages. WEBCE also produces an XML file that contains the main content, the headline, and information about the article for a given web page.
C#