QGumboParser

C++ wrapper for the gumbo-parser library

Primary language: C++. License: MIT.

Introduction

Parsing an HTML page in a Qt application can be a problem: Qt doesn't provide a general-purpose HTML parser. You can use gumbo-parser, developed by Google, but it is written in pure C and doesn't provide a Qt-like interface, which makes it less comfortable to work with. This small library solves the issue by wrapping gumbo-parser in a Qt-style C++ interface.

Quick Start

The easiest way to use QGumboParser is to add it to your project as a git submodule.
To add the library, follow these steps:

  • Create a Subdirs project.
  • Add an application subproject ("Qt Console Application", for example).
  • Open the project folder and create a libs directory.
  • Run git submodule add git@github.com:lagner/QGumboParser.git libs/QGumboParser in a terminal.
  • Run git submodule update --init --recursive
  • Add SUBDIRS += libs/QGumboParser/QGumboParser to the root project's .pro file. QGumboParser should now appear in your project tree.
  • Right-click the application project that needs the HTML parser, choose Add Library... -> Internal library, select QGumboParser in the combo box, and click Finish.

The library is ready to use.
Note that the library requires C++11 support (just add "CONFIG += c++11" to your .pro file).

Example

#include <QCoreApplication>
#include <QDebug>
#include <qgumbodocument.h>
#include <qgumbonode.h>


// Raw string literal: the text starts right after "~(" and ends right before ")~".
const char* HTML_PAGE = R"~(
<!DOCTYPE html>
<html>
  <head>
    <title>Title text</title>
    <meta content="">
    <style></style>
  </head>
  <body>
    <h3>First header</h3>
    <p>text text text</p>
    <div class="content">
        <h3>Nested header <a href="">with link</a></h3>
    </div>
  </body>
</html>
)~";

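int main()
{
    // Parse the page and get the root node of the document tree.
    auto doc = QGumboDocument::parse(HTML_PAGE);
    auto root = doc.rootNode();

    // Look up elements by tag; the page contains exactly one <title>.
    auto nodes = root.getElementsByTagName(HtmlTag::TITLE);
    Q_ASSERT(nodes.size() == 1);

    auto title = nodes.front();
    qDebug() << "title is: " << title.innerText();

    // Collect all <h3> elements on the page and print their text.
    nodes = root.getElementsByTagName(HtmlTag::H3);
    for (const auto& node: nodes) {
        qDebug() << "h3: " << node.innerText();
    }

    // Look up elements by the value of their class attribute.
    auto container = root.getElementsByClassName("content");
    Q_ASSERT(container.size() == 1);

    // Print the tag name of each child of the container.
    auto children = container.front().children();
    for (const auto& node: children) {
        qDebug() << "Tag: " << node.tagName();
    }

    return 0;
}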

License

MIT License. See the LICENSE file.