Readability4JAdvanced

Started as a Java (Kotlin) port of Mozilla's Readability (https://github.com/mozilla/readability); see https://github.com/dankito/Readability4J. It now has additional features that Readability lacks, such as keeping more images and loading <img> data-src attributes.
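To illustrate what "loading <img> data-src attributes" can mean in practice, here is a minimal, self-contained sketch in plain Java. It is not taken from this library: the class name DataSrcSketch and the method promoteDataSrc are hypothetical, and the regex-based approach is an assumption chosen to keep the example dependency-free (a real implementation would more likely work on a parsed DOM). The idea is that lazy-loading pages often put the real image URL in data-src and leave src empty or absent, so the URL is copied into src when no src is present.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Readability4J: copies data-src into src
// for <img> tags that have no src attribute of their own.
class DataSrcSketch {
    // A whole <img ...> tag.
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // A real src attribute; the leading whitespace keeps it from matching "data-src".
    private static final Pattern SRC_ATTR =
            Pattern.compile("\\ssrc\\s*=", Pattern.CASE_INSENSITIVE);
    // A double-quoted data-src attribute; group 1 is its value.
    private static final Pattern DATA_SRC_ATTR =
            Pattern.compile("\\sdata-src\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    static String promoteDataSrc(String html) {
        Matcher img = IMG_TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (img.find()) {
            String tag = img.group();
            Matcher dataSrc = DATA_SRC_ATTR.matcher(tag);
            if (!SRC_ATTR.matcher(tag).find() && dataSrc.find()) {
                // Insert src="..." directly after the four characters "<img".
                tag = tag.substring(0, 4)
                        + " src=\"" + dataSrc.group(1) + "\""
                        + tag.substring(4);
            }
            img.appendReplacement(out, Matcher.quoteReplacement(tag));
        }
        img.appendTail(out);
        return out.toString();
    }
}
```

Calling DataSrcSketch.promoteDataSrc on a fragment such as `<img data-src="a.png" alt="x">` adds a matching src attribute, while tags that already carry a src are left unchanged. The sketch only handles double-quoted attribute values.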

The functions of the original Readability.js map to this port as follows:

_removeScripts() and _prepDocument() can be found in Preprocessor.prepareDocument().

_grabArticle() can be found in ArticleGrabber.grabArticle().

_postProcessContent() can be found in Postprocessor.postProcessContent().

_getArticleMetadata() can be found in MetadataParser.getArticleMetadata().