
A simple, fast, extensible page-crawling framework for Go



Scrago is a simple, fast, extensible page-crawling framework for Go.


 go get github.com/foolin/scrago




Step 1:

import (
	"fmt"
	"strings"

	"github.com/PuerkitoBio/goquery"
)

type ExampModel struct {
	Title       string   `scrago:"title"`
	Name        string   `scrago:"#main>.intro>h2::text()"`
	Description string   `scrago:"#main>.intro>p::html()"`
	Intro       string   `scrago:"#main>.intro::outerHtml()"`
	Keywords    []string `scrago:"#main .keywords::GetMyKeywords()"`
}

// GetMyKeywords is the custom function named in the Keywords tag.
// It splits the comma-separated keyword text and trims each entry.
func (e *ExampModel) GetMyKeywords(s *goquery.Selection) ([]string, error) {
	v := s.Text()
	if v == "" {
		return nil, fmt.Errorf("keywords not found")
	}
	arr := strings.Split(v, ",")
	for i := 0; i < len(arr); i++ {
		arr[i] = strings.TrimSpace(arr[i])
	}
	return arr, nil
}
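
The scrago tags in Step 1 are ordinary Go struct tags, so they can be inspected at runtime with the standard reflect package. The following is a minimal sketch of that idea, not scrago's own code; the helper name scragoTags is hypothetical, and the model is a trimmed copy of the one above.

```go
package main

import (
	"fmt"
	"reflect"
)

// ExampModel is a trimmed copy of the Step 1 model (Keywords omitted).
type ExampModel struct {
	Title       string `scrago:"title"`
	Name        string `scrago:"#main>.intro>h2::text()"`
	Description string `scrago:"#main>.intro>p::html()"`
	Intro       string `scrago:"#main>.intro::outerHtml()"`
}

// scragoTags (hypothetical helper) maps each field name to its scrago tag.
// v must be a struct or a pointer to a struct.
func scragoTags(v interface{}) map[string]string {
	t := reflect.TypeOf(v)
	if t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	tags := make(map[string]string)
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		// Lookup reports whether the key is present at all.
		if tag, ok := f.Tag.Lookup("scrago"); ok {
			tags[f.Name] = tag
		}
	}
	return tags
}

func main() {
	// Map iteration order is not fixed, so the lines may print in any order.
	for name, tag := range scragoTags(ExampModel{}) {
		fmt.Printf("%s -> %s\n", name, tag)
	}
}
```

Fields without a scrago tag are simply skipped, which is one reasonable convention for a tag-driven mapper.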

Step 2:

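// Step 2 also imports "encoding/json", "log", "os" and "github.com/foolin/scrago".
func main() {
	examp := ExampModel{}
	s := scrago.New()
	err := s.HttpGetParser("https://raw.githubusercontent.com/foolin/scrago/master/example/data/example.html", &examp)
	if err != nil {
		log.Fatal(err)
	}
	printjson(examp)
}

// printjson writes v to stdout as indented JSON.
func printjson(v interface{}) {
	enc := json.NewEncoder(os.Stdout)
	enc.SetIndent("", "    ")
	enc.Encode(v)
}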

Step 3:

Execute result:

{
    "Title": "Scrago exmaples",
    "Name": "Scrago framework",
    "Description": "An open source and collaborative framework for extracting the data you need from websites.\n            In a <b>fast</b>, <b>simple</b>, yet extensible way.",
    "Intro": "<div class=\"intro\">\n        <h2>Scrago framework</h2>\n        <p>An open source and collaborative framework for extracting the data you need from websites.\n            In a <b>fast</b>, <b>simple</b>, yet extensible way.</p>\n        <div class=\"keywords\">Scrago, Scrap, Spider, Crawl, GoLang, Simple, Easy</div>\n    </div>",
    "Keywords": [
        "Scrago",
        "Scrap",
        "Spider",
        "Crawl",
        "GoLang",
        "Simple",
        "Easy"
    ]
}

Origin page:

<!doctype html>
<html class="no-js" lang="">

    <meta charset="utf-8">
    <title>Scrago exmaples</title>

<div id="header">
    <div class="container">
        <div class="clearfix">
            <div class="logo">
                <a href="https://github.com/foolin/scrago" title="Scrago exmaple">
                    <h1 title="Scrago exmaple - crawl framework for go">Scrago exmaple</h1>

<div class="navlink">
    <div class="container">
        <ul class="clearfix">
            <li ><a href="/">Index</a></li>
            <li ><a href="/list/web" title="web site">Web page</a></li>
            <li ><a href="/list/pc" title="pc page">Pc Page</a></li>
            <li ><a href="/list/mobile" title="mobile page">Mobile Page</a></li>

<div id="main">
    <div class="intro">
        <h2>Scrago framework</h2>
        <p>An open source and collaborative framework for extracting the data you need from websites.
            In a <b>fast</b>, <b>simple</b>, yet extensible way.</p>
        <div class="keywords">Scrago, Scrap, Spider, Crawl, GoLang, Simple, Easy</div>
    <div class="typelist">
            <li data-type="bool">true</li>
            <li data-type="int">123</li>
            <li data-type="float">45.6</li>
            <li data-type="string">hello</li>
            <li data-type="array">



Struct tag

A tag has the form "selector::function"; the "::" symbol separates the selector from the function.

  • selector: a CSS selector; see github.com/PuerkitoBio/goquery for the supported syntax.

  • function: the function used to extract the data; the default is text().

    1. Built-in functions:

    • text() gets the text value.
    • html() gets the inner HTML value.
    • outerHtml() gets the outer HTML value.
    • attr(xxx) gets an attribute value, e.g. attr(href).

    2. Custom functions:

func (e *ExampModel) MyFunc(s *goquery.Selection) (MyReturnType, error) {
    return ReturnValue, nil
}
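
A function name written in a tag has to be resolved to a method on the model at runtime. One plausible way to do that in Go is reflection; the sketch below illustrates the general technique only and is not taken from scrago's source. The names Model, Upper, and callByName are hypothetical, and a plain string stands in for *goquery.Selection so the example needs only the standard library.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
	"strings"
)

// Model is a hypothetical model type used only for this sketch.
type Model struct{}

// Upper has the same (value, error) result shape as a custom function,
// but takes a string instead of *goquery.Selection (an assumption).
func (m *Model) Upper(s string) (string, error) {
	if s == "" {
		return "", errors.New("empty input")
	}
	return strings.ToUpper(s), nil
}

// callByName looks up an exported method on target by name and calls it
// with input. It returns an error instead of panicking when the method is
// missing or does not have the shape func(string) (T, error).
func callByName(target interface{}, name, input string) (interface{}, error) {
	v := reflect.ValueOf(target)
	if !v.IsValid() {
		return nil, errors.New("nil target")
	}
	m := v.MethodByName(name)
	if !m.IsValid() {
		return nil, fmt.Errorf("method %q not found", name)
	}
	t := m.Type()
	if t.NumIn() != 1 || t.In(0) != reflect.TypeOf("") || t.NumOut() != 2 {
		return nil, fmt.Errorf("method %q has an unexpected signature", name)
	}
	out := m.Call([]reflect.Value{reflect.ValueOf(input)})
	if err, ok := out[1].Interface().(error); ok && err != nil {
		return nil, err
	}
	return out[0].Interface(), nil
}

func main() {
	v, err := callByName(&Model{}, "Upper", "scrago")
	fmt.Println(v, err)
}
```

Because the lookup goes through reflection, only exported methods can be found this way, which matches the capitalized names (GetMyKeywords, MyFunc) used for custom functions in this document.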


type ExampModel struct {
    TextField  string `scrago:"#xxx"`            // no function: defaults to text()
    TextField2 string `scrago:".xxx::text()"`    // explicit text()
    Link       string `scrago:"a::attr(href)"`   // attribute value
    MyField    string `scrago:"#xxx::MyFunc()"`  // custom function
}

func (e *ExampModel) MyFunc(s *goquery.Selection) (string, error) {
    return s.Text(), nil
}
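
The "selector::function" syntax described above can be split into its parts with a few string operations. This is a rough sketch under stated assumptions, not scrago's actual parser: the tagParts type and the parseTag function are hypothetical names, and the fallback to text() follows the default stated in this section.

```go
package main

import (
	"fmt"
	"strings"
)

// tagParts (hypothetical) holds the pieces of a scrago tag.
type tagParts struct {
	Selector string // CSS selector, e.g. "#main>.intro>h2"
	Func     string // function name, e.g. "text", "attr", "MyFunc"
	Arg      string // text inside the parentheses, e.g. "href"
}

// parseTag splits "selector::func(arg)" into its parts.
// A tag without "::" (or with nothing after it) falls back to text().
func parseTag(tag string) tagParts {
	p := tagParts{Selector: strings.TrimSpace(tag), Func: "text"}
	// Use the last "::" so any earlier "::" stays part of the selector.
	i := strings.LastIndex(tag, "::")
	if i < 0 {
		return p
	}
	p.Selector = strings.TrimSpace(tag[:i])
	call := strings.TrimSpace(tag[i+2:])
	if call == "" {
		return p
	}
	open := strings.Index(call, "(")
	if open < 0 {
		p.Func = call
		return p
	}
	p.Func = call[:open]
	p.Arg = strings.TrimSuffix(call[open+1:], ")")
	return p
}

func main() {
	for _, tag := range []string{"#xxx", ".xxx::text()", "a::attr(href)", "#xxx::MyFunc()"} {
		p := parseTag(tag)
		fmt.Printf("%q: selector=%q func=%q arg=%q\n", tag, p.Selector, p.Func, p.Arg)
	}
}
```

Splitting on the last "::" rather than the first is a design choice of this sketch; it keeps the function part unambiguous even if a selector were to contain "::" itself.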



  • github.com/PuerkitoBio/goquery