A Go package that cleans up an HTML page for better readability.

Primary language: Go. License: MIT.

Go-Readability

Go-Readability is a Go package that cleans an HTML page of clutter such as buttons, ads, and background images, and adjusts the page's text size, contrast, and layout for better readability.

This package is a fork of readability by ying32, which was inspired by readability for node.js and readability for Python. I have also added some functions from readability by Mozilla.

Why fork?

There are several reasons why I created a new fork instead of sending a PR to the original repository:

  • GitHub seems to be hard to access from China, which is why ying32 is not very active on his repository.
  • Most of the comments and documentation in the original repository are in Chinese, which unfortunately I am not yet able to understand.

Example

package main

import (
	"fmt"
	"time"

	"github.com/RadhiFadlillah/go-readability"
)

func main() {
	url := "https://www.nytimes.com/2018/01/21/technology/inside-amazon-go-a-store-of-the-future.html"

	// Fetch and parse the page, passing 5 seconds as the timeout.
	article, err := readability.Parse(url, 5*time.Second)
	if err != nil {
		panic(err)
	}

	// Print the extracted metadata, followed by the cleaned content.
	fmt.Println(article.Meta.Title)
	fmt.Println(article.Meta.Excerpt)
	fmt.Println(article.Meta.Author)
	fmt.Println(article.Content)
}
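
To give a rough feel for what "cleaning a page from clutter" means, here is a minimal sketch that uses only the standard library. It is not the algorithm used by this package. The function stripClutter and the clutterTags list are hypothetical names made up for this illustration, and the sketch assumes reasonably well-formed markup in which every element is closed, since it relies on the lenient mode of encoding/xml rather than a real HTML parser.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// clutterTags is a hypothetical list of elements that this sketch
// treats as clutter. The real package uses its own heuristics.
var clutterTags = map[string]bool{
	"script": true, "style": true, "button": true,
	"nav": true, "aside": true, "footer": true,
}

// stripClutter returns the visible text of page, skipping every
// clutter element together with everything nested inside it.
func stripClutter(page string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(page))
	dec.Strict = false                // tolerate common HTML quirks
	dec.AutoClose = xml.HTMLAutoClose // treat <br>, <img>, ... as self-closing
	dec.Entity = xml.HTMLEntity       // understand entities such as &nbsp;

	var words []string
	skipDepth := 0 // greater than zero while inside a clutter element
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if skipDepth > 0 || clutterTags[strings.ToLower(t.Name.Local)] {
				skipDepth++
			}
		case xml.EndElement:
			if skipDepth > 0 {
				skipDepth--
			}
		case xml.CharData:
			if skipDepth == 0 {
				// Collapse runs of whitespace into single spaces.
				words = append(words, strings.Fields(string(t))...)
			}
		}
	}
	return strings.Join(words, " "), nil
}

func main() {
	page := `<html><body><nav><a href="/">Home</a></nav>` +
		`<p>Amazon Go &amp; the store of the future.</p>` +
		`<button>Subscribe</button></body></html>`

	text, err := stripClutter(page)
	if err != nil {
		panic(err)
	}
	fmt.Println(text)
}
```

Real-world pages are rarely this tidy (unclosed tags, inline scripts containing "<", and so on), which is why a dedicated package such as this one is the better choice for actual use.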