/swift-html-entities

HTML5 spec-compliant character encoder/decoder for Swift


HTMLEntities


Summary

A pure Swift utility for encoding and decoding HTML character references.

Includes support for HTML5 named character references; the HTML5 specification lists all 2231 of them.

HTMLEntities can escape ALL non-ASCII characters as well as the characters <, >, &, ", and ', since these five characters are part of the HTML tag and HTML attribute syntaxes.

In addition, HTMLEntities can unescape encoded HTML text that contains decimal, hexadecimal, or HTML5 named character references.

API Documentation

API documentation for HTMLEntities is located here.

Features

  • Supports HTML5 named character references (&NegativeMediumSpace; etc.)
  • HTML5 spec-compliant; strict parse mode recognizes parse errors
  • Supports decimal and hexadecimal escapes for all characters
  • Simple to use: the functions are added as extensions of the standard String type
  • Minimal dependencies; implementation is completely self-contained

Version Info

The latest release of HTMLEntities requires Swift 4.0 or higher.

Installation

Via Swift Package Manager

Add HTMLEntities to your Package.swift:

import PackageDescription

let package = Package(
  name: "<package-name>",
  ...
  dependencies: [
    .package(url: "https://github.com/Kitura/swift-html-entities.git", from: "3.0.0")
  ]
  // Also, make sure to add HTMLEntities to your package target's dependencies
)
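
For reference, a complete manifest with the target dependency filled in might look like the sketch below. The package and target name `MyApp` and the tools version are placeholders rather than anything this project prescribes, and newer tools versions may require the explicit `.product(name:package:)` form for the target dependency.

```swift
// swift-tools-version:5.0
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical package name
    dependencies: [
        .package(url: "https://github.com/Kitura/swift-html-entities.git", from: "3.0.0")
    ],
    targets: [
        // Listing "HTMLEntities" here is what lets MyApp's sources `import HTMLEntities`.
        .target(name: "MyApp", dependencies: ["HTMLEntities"])
    ]
)
```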

Via CocoaPods

Add HTMLEntities to your Podfile:

target '<project-name>' do
  pod 'HTMLEntities', :git => 'https://github.com/Kitura/swift-html-entities.git'
end

Via Carthage

Add HTMLEntities to your Cartfile:

github "Kitura/swift-html-entities"

Usage

import HTMLEntities

// encode example
let html = "<script>alert(\"abc\")</script>"

print(html.htmlEscape())
// Prints "&#x3C;script&#x3E;alert(&#x22;abc&#x22;)&#x3C;/script&#x3E;"

// decode example
let htmlencoded = "&lt;script&gt;alert(&quot;abc&quot;)&lt;/script&gt;"

print(htmlencoded.htmlUnescape())
// Prints "<script>alert(\"abc\")</script>"

Advanced Options

HTMLEntities supports various options when escaping and unescaping HTML characters.

Escape Options

allowUnsafeSymbols

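Defaults to false. Specifies if the unsafe ASCII characters (<, >, &, ", and ') should be left unescaped. When true, only the remaining characters, such as non-ASCII characters, are escaped.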

import HTMLEntities

let html = "<p>\"café\"</p>"

print(html.htmlEscape())
// Prints "&#x3C;p&#x3E;&#x22;caf&#xE9;&#x22;&#x3C;/p&#x3E;"

print(html.htmlEscape(allowUnsafeSymbols: true))
// Prints "<p>\"caf&#xE9;\"</p>"

decimal

Defaults to false. Specifies if decimal character escapes should be used instead of hexadecimal character escapes whenever a numeric character escape is used (i.e., it does not affect named character reference escapes). The use of hexadecimal character escapes is recommended.

import HTMLEntities

let text = "한, 한, ế, ế, 🇺🇸"

print(text.htmlEscape())
// Prints "&#x1112;&#x1161;&#x11AB;, &#xD55C;, &#x1EBF;, e&#x302;&#x301;, &#x1F1FA;&#x1F1F8;"

print(text.htmlEscape(decimal: true))
// Prints "&#4370;&#4449;&#4523;, &#54620;, &#7871;, e&#770;&#769;, &#127482;&#127480;"
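
The output above contains one reference per Unicode scalar rather than one per visible character, which is why a decomposed spelling expands to several references while the precomposed spelling yields a single one. The sketch below reproduces that mapping with the standard library only. `sketchNumericReferences` is a hypothetical helper written for illustration; it is not part of HTMLEntities and it ignores the unsafe ASCII symbols.

```swift
// Hypothetical helper: turns every non-ASCII Unicode scalar into a numeric
// character reference, hexadecimal by default or decimal on request.
func sketchNumericReferences(_ text: String, decimal: Bool = false) -> String {
    var output = ""
    for scalar in text.unicodeScalars {
        if scalar.isASCII {
            // ASCII is copied through unchanged in this sketch.
            output.unicodeScalars.append(scalar)
        } else if decimal {
            output += "&#\(scalar.value);"
        } else {
            output += "&#x" + String(scalar.value, radix: 16, uppercase: true) + ";"
        }
    }
    return output
}

// Two spellings of the same visible character.
let decomposedHan = "\u{1112}\u{1161}\u{11AB}" // three conjoining jamo
let precomposedHan = "\u{D55C}"                // one precomposed syllable

print(sketchNumericReferences(decomposedHan))
// Prints "&#x1112;&#x1161;&#x11AB;"

print(sketchNumericReferences(precomposedHan))
// Prints "&#xD55C;"

print(sketchNumericReferences(precomposedHan, decimal: true))
// Prints "&#54620;"
```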

encodeEverything

Defaults to false. Specifies if all characters should be escaped, even if some characters are safe. If true, overrides the setting for allowUnsafeSymbols.

import HTMLEntities

let text = "A quick brown fox jumps over the lazy dog"

print(text.htmlEscape())
// Prints "A quick brown fox jumps over the lazy dog"

print(text.htmlEscape(encodeEverything: true))
// Prints "&#x41;&#x20;&#x71;&#x75;&#x69;&#x63;&#x6B;&#x20;&#x62;&#x72;&#x6F;&#x77;&#x6E;&#x20;&#x66;&#x6F;&#x78;&#x20;&#x6A;&#x75;&#x6D;&#x70;&#x73;&#x20;&#x6F;&#x76;&#x65;&#x72;&#x20;&#x74;&#x68;&#x65;&#x20;&#x6C;&#x61;&#x7A;&#x79;&#x20;&#x64;&#x6F;&#x67;"

// `encodeEverything` overrides `allowUnsafeSymbols`
print(text.htmlEscape(allowUnsafeSymbols: true, encodeEverything: true))
// Prints "&#x41;&#x20;&#x71;&#x75;&#x69;&#x63;&#x6B;&#x20;&#x62;&#x72;&#x6F;&#x77;&#x6E;&#x20;&#x66;&#x6F;&#x78;&#x20;&#x6A;&#x75;&#x6D;&#x70;&#x73;&#x20;&#x6F;&#x76;&#x65;&#x72;&#x20;&#x74;&#x68;&#x65;&#x20;&#x6C;&#x61;&#x7A;&#x79;&#x20;&#x64;&#x6F;&#x67;"
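
The precedence between `encodeEverything` and `allowUnsafeSymbols` can be summarized as a small decision function. This is a sketch of the documented behavior, assuming that non-ASCII characters are always escaped as in the examples above; `sketchShouldEscape` and `sketchUnsafeASCII` are hypothetical names and do not come from the library's source.

```swift
// The five ASCII characters that belong to HTML tag and attribute syntax.
let sketchUnsafeASCII: Set<Unicode.Scalar> = ["<", ">", "&", "\"", "'"]

// Hypothetical helper: decides whether a single scalar gets escaped.
func sketchShouldEscape(_ scalar: Unicode.Scalar,
                        allowUnsafeSymbols: Bool = false,
                        encodeEverything: Bool = false) -> Bool {
    if encodeEverything { return true }   // overrides `allowUnsafeSymbols`
    if !scalar.isASCII { return true }    // non-ASCII is escaped either way
    if sketchUnsafeASCII.contains(scalar) {
        return !allowUnsafeSymbols        // unsafe ASCII is escaped unless allowed
    }
    return false                          // safe ASCII is left alone
}

print(sketchShouldEscape("<"))
// Prints "true"

print(sketchShouldEscape("<", allowUnsafeSymbols: true))
// Prints "false"

print(sketchShouldEscape("<", allowUnsafeSymbols: true, encodeEverything: true))
// Prints "true"
```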

useNamedReferences

Defaults to false. Specifies if named character references should be used whenever possible. Leave it set to false to always use numeric character references, e.g., for compatibility with older browsers that do not recognize named character references.

import HTMLEntities

let html = "<script>alert(\"abc\")</script>"

print(html.htmlEscape())
// Prints "&#x3C;script&#x3E;alert(&#x22;abc&#x22;)&#x3C;/script&#x3E;"

print(html.htmlEscape(useNamedReferences: true))
// Prints "&lt;script&gt;alert(&quot;abc&quot;)&lt;/script&gt;"
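
"Whenever possible" implies a fallback: a character with no named reference still needs a numeric one. The sketch below shows that lookup-then-fallback idea with a deliberately tiny table. `sketchNamedReferences` and `sketchReference` are hypothetical names, and the four-entry table stands in for the full HTML5 list that the library supports.

```swift
// A tiny illustrative subset of the HTML5 named character references.
let sketchNamedReferences: [Unicode.Scalar: String] = [
    "<": "lt", ">": "gt", "&": "amp", "\"": "quot"
]

// Hypothetical helper: prefers a named reference when one is known and
// requested, and otherwise falls back to a hexadecimal numeric reference.
func sketchReference(for scalar: Unicode.Scalar, useNamedReferences: Bool = false) -> String {
    if useNamedReferences, let name = sketchNamedReferences[scalar] {
        return "&" + name + ";"
    }
    return "&#x" + String(scalar.value, radix: 16, uppercase: true) + ";"
}

print(sketchReference(for: "<", useNamedReferences: true))
// Prints "&lt;"

print(sketchReference(for: "<"))
// Prints "&#x3C;"

// No entry in the table, so the numeric form is used even when names are requested.
print(sketchReference(for: "\u{D55C}", useNamedReferences: true))
// Prints "&#xD55C;"
```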

Set Escape Options Globally

HTML escape options can be set globally so that you don't have to set them every time you want to escape a string. The options are managed in the String.HTMLEscapeOptions struct.

import HTMLEntities

// set `useNamedReferences` to `true` globally
String.HTMLEscapeOptions.useNamedReferences = true

let html = "<script>alert(\"abc\")</script>"

// Now, the default behavior of `htmlEscape()` is to use named character references
print(html.htmlEscape())
// Prints "&lt;script&gt;alert(&quot;abc&quot;)&lt;/script&gt;"

// And you can still go back to using numeric character references only
print(html.htmlEscape(useNamedReferences: false))
// Prints "&#x3C;script&#x3E;alert(&#x22;abc&#x22;)&#x3C;/script&#x3E;"
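
One way a global default with a per-call override can work in Swift is a default argument that reads a static property, since default argument expressions are evaluated at each call. The sketch below demonstrates that language mechanism only; `SketchEscapeOptions` and `sketchEscapeLessThan` are hypothetical names, and whether HTMLEntities is implemented this way is an assumption.

```swift
// Hypothetical global options holder, standing in for a real options struct.
struct SketchEscapeOptions {
    static var useNamedReferences = false
}

// The default value is read from the static property every time the function
// is called without an explicit argument.
func sketchEscapeLessThan(useNamedReferences: Bool = SketchEscapeOptions.useNamedReferences) -> String {
    return useNamedReferences ? "&lt;" : "&#x3C;"
}

print(sketchEscapeLessThan())
// Prints "&#x3C;"

SketchEscapeOptions.useNamedReferences = true

print(sketchEscapeLessThan())
// Prints "&lt;"

// An explicit argument still wins over the global setting.
print(sketchEscapeLessThan(useNamedReferences: false))
// Prints "&#x3C;"
```

Mutable global state like this is shared by every caller, so it is best set once during startup rather than toggled from concurrent code.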

Unescape Options

strict

Defaults to false. Specifies if HTML5 parse errors should be thrown or simply passed over.

Note: htmlUnescape() is a throwing function whenever the strict argument is passed in the call (whether it is set to true or false); htmlUnescape() is NOT a throwing function if no argument is provided.

import HTMLEntities

let text = "&#4370&#4449&#4523"

print(text.htmlUnescape())
// Prints "한"

print(try text.htmlUnescape(strict: true))
// Throws a `ParseError.MissingSemicolon` instance

// this call needs `try` because `strict` is passed as an argument,
// but no error is thrown because `strict` is `false`
print(try text.htmlUnescape(strict: false))
// Prints "한"
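
The throwing behavior described in the note above is what you would get from two overloads: a non-throwing one without parameters and a throwing one that takes `strict`. The sketch below illustrates that pattern with a deliberately minimal decoder that handles decimal references only. `sketchUnescape` and `SketchParseError` are hypothetical names, and this is not the library's implementation.

```swift
// Hypothetical error type for the sketch.
enum SketchParseError: Error {
    case missingSemicolon
}

extension String {
    // Throwing overload: selected whenever `strict` appears in the call.
    func sketchUnescape(strict: Bool) throws -> String {
        let scalars = Array(unicodeScalars)
        let digits: ClosedRange<Unicode.Scalar> = "0"..."9"
        var output = String.UnicodeScalarView()
        var i = 0
        while i < scalars.count {
            // A decimal reference starts with "&#"; anything else is copied through.
            guard scalars[i] == "&", i + 1 < scalars.count, scalars[i + 1] == "#" else {
                output.append(scalars[i])
                i += 1
                continue
            }
            // Accumulate the digits; stop early once the value leaves the Unicode range.
            var j = i + 2
            var value: UInt32 = 0
            while j < scalars.count, digits.contains(scalars[j]), value <= 0x10FFFF {
                value = value * 10 + (scalars[j].value - 48)
                j += 1
            }
            // No digits, or not a valid scalar value: treat "&" as plain text.
            guard j > i + 2, let decoded = Unicode.Scalar(value) else {
                output.append(scalars[i])
                i += 1
                continue
            }
            if j < scalars.count, scalars[j] == ";" {
                j += 1 // consume the terminating semicolon
            } else if strict {
                throw SketchParseError.missingSemicolon
            }
            output.append(decoded)
            i = j
        }
        return String(output)
    }

    // Non-throwing overload: selected when no argument is given.
    func sketchUnescape() -> String {
        // With `strict: false` this sketch never throws, so `try!` cannot trap here.
        return try! sketchUnescape(strict: false)
    }
}

let lenient = "&#4370&#4449&#4523".sketchUnescape() // no `try` needed
print(lenient == "\u{1112}\u{1161}\u{11AB}")
// Prints "true"

do {
    _ = try "&#4370".sketchUnescape(strict: true)
} catch {
    print(error)
    // Prints "missingSemicolon"
}
```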

Acknowledgments

HTMLEntities was designed to support some of the same options as he, a popular JavaScript HTML encoder/decoder.

License

Apache 2.0