
An HTML DOM query tool that works like jQuery in the browser. A command-line tool for operating on the HTML DOM.

hq - HTML query tool, using jQuery notation for DOM operations


This is a command-line HTML DOM filtering tool that uses jQuery-style notation, just as you would use jQuery in the browser.


  • 20160626: Automatically detect the document encoding from the <meta http-equiv=...> or <meta charset=...> tag, then re-encode the output to UTF-8, unless the -noenc option is specified.
  • 20160626: The -u STRING is first opened as a file; if that fails, it is treated as a URL. If the URL does not start with http, http:// is prepended before the request.
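
The encoding detection in the first entry could look roughly like the Go sketch below. It assumes a regular-expression scan of the raw HTML is good enough, and the names `charsetRe` and `detectCharset` are hypothetical; this is not hq's actual code. It only finds the declared label (e.g. `gb2312`). Converting the bytes to UTF-8 would additionally need a decoder, for example from the golang.org/x/net/html/charset or golang.org/x/text packages, which is left out here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// charsetRe matches the charset value in either kind of meta tag:
//
//	<meta charset="gbk">
//	<meta http-equiv="Content-Type" content="text/html; charset=gb2312">
//
// (?i) makes the match case-insensitive; the value may be quoted or bare.
// [^>]+ keeps the match inside a single tag.
var charsetRe = regexp.MustCompile(`(?i)<meta[^>]+charset\s*=\s*["']?\s*([a-z0-9_.:-]+)`)

// detectCharset returns the lower-cased charset label declared by the first
// <meta> tag that carries one, or "" if there is none.
func detectCharset(html string) string {
	m := charsetRe.FindStringSubmatch(html)
	if m == nil {
		return ""
	}
	return strings.ToLower(m[1])
}

func main() {
	page := `<html><head><meta http-equiv="Content-Type" content="text/html; charset=GB2312"></head></html>`
	// A real tool would now decode the page from this charset to UTF-8
	// unless -noenc was given; that step is omitted in this sketch.
	fmt.Println(detectCharset(page))
}
```

A regex keeps the sketch short; a stricter approach would walk the <meta> elements with an HTML tokenizer.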


Usage of ./hq:
  -attr string
        print the attribute <string> of the node. <string> is a comma-separated list, and the output fields are joined with tabs. e.g.: -attr href,target
  -d    enable debug mode, which prints additional output
  -html
        print the innerHTML of the node
  -noenc
        DO NOT care about the output encoding. Without this option, hq tries to detect the encoding and convert the output to UTF-8
  -ohtml
        print the outerHTML of the node
  -text
        print the TEXT part of the node. Same as <-attr 'text'>
  -u string
        URI or file path to scrape. Defaults to STDIN, so input can be piped in. If the value is not an openable file and does not start with 'http', http:// is prepended (default "-")

Usage: ./hq [options] <-html|-ohtml|-text|-attr <name1,name2,...> > <selector>
    selector: a jQuery-style selector, e.g. "head script"
    -html|-ohtml|-text|-attr: you must specify at least one of these options
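
A sketch of how the "at least one of -html|-ohtml|-text|-attr" rule could be enforced with Go's standard flag package. Only the four output flags are declared; the `config` type and `parseArgs` function are hypothetical, and the selector is assumed to be the single positional argument.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// config holds the parsed output options (hypothetical field names).
type config struct {
	html, ohtml, text bool
	attr              string
	selector          string
}

// parseArgs declares the output flags from the usage text on a fresh FlagSet,
// then requires at least one output option and exactly one selector.
func parseArgs(args []string) (config, error) {
	var c config
	fs := flag.NewFlagSet("hq", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // report errors via the return value only
	fs.BoolVar(&c.html, "html", false, "print the innerHTML of the node")
	fs.BoolVar(&c.ohtml, "ohtml", false, "print the outerHTML of the node")
	fs.BoolVar(&c.text, "text", false, "print the TEXT part of the node")
	fs.StringVar(&c.attr, "attr", "", "comma-separated attributes to print")
	if err := fs.Parse(args); err != nil {
		return c, err
	}
	if !c.html && !c.ohtml && !c.text && c.attr == "" {
		return c, fmt.Errorf("must specify at least one of -html, -ohtml, -text, -attr")
	}
	if fs.NArg() != 1 {
		return c, fmt.Errorf("expected exactly one selector, got %d", fs.NArg())
	}
	c.selector = fs.Arg(0)
	return c, nil
}

func main() {
	cfg, err := parseArgs([]string{"-attr", "href,text", `div#newsInfoQuanguo a[target="_blank"]`})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("attr=%q selector=%q\n", cfg.attr, cfg.selector)
}
```

Go's flag package stops parsing at the first non-flag argument, so the options have to come before the selector, which matches the order in the usage line above.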

    To print several fields that combine the text part and attributes, such as href and the link text, use <-attr 'href,text'>.
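
The -attr output described above (comma-separated names, one tab-separated line per node, `text` standing for the node's text) can be sketched as follows. The `node` type is a hypothetical stand-in for a matched element, and trimming spaces around each name (so `'href, text'` behaves like `'href,text'`) is an assumption, not documented hq behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical stand-in for a matched DOM element: its attributes
// plus its text content. A real tool would read these from the parsed tree.
type node struct {
	attrs map[string]string
	text  string
}

// parseAttrList splits an -attr value such as "href, text" into names,
// trimming surrounding whitespace (an assumption) and skipping empty entries.
func parseAttrList(spec string) []string {
	var names []string
	for _, part := range strings.Split(spec, ",") {
		if name := strings.TrimSpace(part); name != "" {
			names = append(names, name)
		}
	}
	return names
}

// formatRow builds one output line: the requested fields joined by tabs,
// with the pseudo-attribute "text" meaning the node's text content.
// A missing attribute yields an empty field so the columns stay aligned.
func formatRow(n node, names []string) string {
	fields := make([]string, len(names))
	for i, name := range names {
		if name == "text" {
			fields[i] = n.text
		} else {
			fields[i] = n.attrs[name]
		}
	}
	return strings.Join(fields, "\t")
}

func main() {
	// Hypothetical node; the URL and text are placeholders.
	n := node{attrs: map[string]string{"href": "http://example.com/a.htm", "target": "_blank"}, text: "headline"}
	fmt.Println(formatRow(n, parseAttrList("href, text")))
}
```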


./hq -u 'http://www.qq.com' -attr 'href,text' 'div#newsInfoQuanguo a[target="_blank"]'  

Notes:

  • The selector chooses the 'xx' in <div id="newsInfoQuanguo">...<a target="_blank">xx</a>...</div>, that is, every <a target="_blank"> element inside the div with id newsInfoQuanguo. Learn more about jQuery selectors here: http://www.w3school.com.cn/jquery/jquery_selectors.asp
  • Take care with the encoding and make sure it fits your own environment. hq assumes that you work in a UTF-8 environment.

It produces a list like the one below. The original HTML (as of 26 Jun 2016) is stored in index.html, so you can try it yourself.

