SwiftSoup: Swift HTML parser, with the best of DOM, CSS, and jQuery

SwiftSoup

SwiftSoup is a Swift library for working with real-world HTML. It provides a convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods. SwiftSoup implements the WHATWG HTML5 specification and parses HTML to the same DOM as modern browsers do.

  • scrape and parse HTML from a URL, file, or string
  • find and extract data, using DOM traversal or CSS selectors
  • manipulate the HTML elements, attributes, and text
  • clean user-submitted content against a safe white-list, to prevent XSS attacks
  • output tidy HTML

SwiftSoup is designed to deal with all varieties of HTML found in the wild. From pristine and validating markup to invalid tag soup, SwiftSoup will create a sensible parse tree.
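
As a rough sketch of the selector and cleaning features listed above, the snippet below pulls link targets out of a fragment and sanitizes untrusted markup. It assumes the jsoup-style calls `select`, `attr`, `text`, `clean`, and `Whitelist.basic()` as exposed by SwiftSoup; the helper names `extractLinks` and `sanitize` and the sample inputs are made up for illustration.

```swift
import SwiftSoup

// Hypothetical helper: collect (text, href) pairs for every link in the HTML.
func extractLinks(from html: String) throws -> [(text: String, href: String)] {
    let doc: Document = try SwiftSoup.parse(html)
    // "a[href]" is a CSS selector matching anchors that carry an href attribute.
    let links: Elements = try doc.select("a[href]")
    return try links.array().map { link in
        (text: try link.text(), href: try link.attr("href"))
    }
}

// Hypothetical helper: remove any markup that is not on the basic whitelist.
func sanitize(_ untrusted: String) throws -> String? {
    return try SwiftSoup.clean(untrusted, Whitelist.basic())
}
```

Both helpers rethrow SwiftSoup's errors instead of swallowing them, so the caller decides how to handle a failed parse.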

Installation

SwiftSoup is available through CocoaPods. To install it, simply add the following line to your Podfile:

pod "SwiftSoup"

Example

To parse an HTML document:

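import SwiftSoup

// Parse the string into a Document, then read back its combined text.
let html = "<html><head><title>First parse</title></head>"
    + "<body><p>Parsed HTML into a doc.</p></body></html>"
let doc: Document = try SwiftSoup.parse(html)
let text = try doc.text()
print(text)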

The parser tries to produce a clean parse from whatever HTML it is given, whether or not the input is well-formed. It handles:

  • unclosed tags (e.g. <p>Lorem <p>Ipsum parses to <p>Lorem</p> <p>Ipsum</p>)
  • implicit tags (e.g. a naked <td>Table data</td> is wrapped into a <table><tr><td>...)
  • reliably creating the document structure (html containing a head and body, and only appropriate elements within the head)
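
The cases listed above can be observed by parsing a malformed fragment and inspecting the resulting tree. This is a minimal sketch: the helper name `inspectRecovery` and the input string are illustrative, and it assumes `head()` and `body()` return optional elements as in jsoup. No serialized output is shown, since its exact whitespace may differ between versions.

```swift
import SwiftSoup

// Hypothetical helper: parse possibly malformed HTML and report what the parser built.
func inspectRecovery(of html: String) throws -> (paragraphs: Int, hasHead: Bool, hasBody: Bool) {
    let doc: Document = try SwiftSoup.parse(html)
    // Count the <p> elements found after the parser has repaired the markup.
    let paragraphs = try doc.select("p").array().count
    // The parser is expected to supply head and body even when the input omits them.
    return (paragraphs, doc.head() != nil, doc.body() != nil)
}

// Two unclosed <p> tags, with no html, head, or body wrapper.
let recovery = try inspectRecovery(of: "<p>Lorem <p>Ipsum")
print(recovery)
```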

The object model of a document

  • Documents consist of Elements and TextNodes
  • The inheritance chain is: Document extends Element extends Node. TextNode extends Node.
  • An Element contains a list of child Nodes and has one parent Element. It also provides a filtered list of child Elements only.
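
To make the distinction between child Nodes and child Elements concrete, here is a small sketch. It assumes the jsoup-style accessors `getChildNodes()`, `children()`, `parent()`, and `tagName()`; the helper name `describeFirstParagraph` and the sample markup are hypothetical.

```swift
import SwiftSoup

// Hypothetical helper: summarize the first <p> in the given HTML.
func describeFirstParagraph(in html: String) throws
    -> (nodeCount: Int, textNodeCount: Int, elementCount: Int, parentTag: String?)? {
    let doc: Document = try SwiftSoup.parse(html)
    guard let p = try doc.select("p").first() else { return nil }

    // Every child node, including text runs.
    let nodes: [Node] = p.getChildNodes()
    // Only the children that are TextNodes.
    let textNodes = nodes.compactMap { $0 as? TextNode }
    // The filtered view: child Elements only.
    let elements: Elements = p.children()
    // Each Element has at most one parent Element.
    let parentTag = p.parent()?.tagName()

    return (nodes.count, textNodes.count, elements.array().count, parentTag)
}

// Here <p> holds a TextNode ("Hello ") and one Element (<b>), and sits inside a <div>.
if let summary = try describeFirstParagraph(in: "<div><p>Hello <b>world</b></p></div>") {
    print(summary)
}
```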

Author

Nabil Chatbi, scinfu@gmail.com

Note

SwiftSoup was ported to Swift from the Java library jsoup.

License

SwiftSoup is available under the MIT license. See the LICENSE file for more info.