/saxen

A tiny, super fast, namespace aware, sax-style XML parser.

Primary LanguageJavaScriptMIT LicenseMIT

/saxen/ parser

CI Codecov

A tiny, super fast, namespace aware sax-style XML parser written in plain JavaScript.

Features

  • (optional) entity decoding and attribute parsing
  • (optional) namespace aware
  • element / attribute normalization in namespaced mode
  • tiny (2.6Kb minified + gzipped)
  • pretty damn fast

Usage

import {
  Parser
} from 'saxen';

const parser = new Parser();

// enable namespace parsing: element prefixes will
// automatically adjusted to the ones configured here
// elements in other namespaces will still be processed
parser.ns({
  'http://foo': 'foo',
  'http://bar': 'bar'
});

parser.on('openTag', function(elementName, attrGetter, decodeEntities, selfClosing, getContext) {

  elementName;
  // with prefix, i.e. foo:blub

  const attrs = attrGetter();
  // { 'bar:aa': 'A', ... }
});

parser.parse('<blub xmlns="http://foo" xmlns:bar="http://bar" bar:aa="A" />');

Supported Hooks

We support the following parse hooks:

  • openTag(elementName, attrGetter, decodeEntities, selfClosing, contextGetter)
  • closeTag(elementName, decodeEntities, selfClosing, contextGetter)
  • error(err, contextGetter)
  • warn(warning, contextGetter)
  • text(value, decodeEntities, contextGetter)
  • cdata(value, contextGetter)
  • comment(value, decodeEntities, contextGetter)
  • attention(str, decodeEntities, contextGetter)
  • question(str, contextGetter)

In contrast to error, warn receives recoverable errors, such as malformed attributes.

In proxy mode, openTag and closeTag a view of the current element replaces the raw element name. In addition element attributes are not passed as a getter to openTag. Instead, they get exposed via the element.attrs:

  • openTag(element, decodeEntities, selfClosing, contextGetter)
  • closeTag(element, selfClosing, contextGetter)

Namespace Handling

In namespace mode, the parser will adjust tag and attribute namespace prefixes before passing the elements name to openTag or closeTag. To do that, you need to configure default prefixes for wellknown namespaces:

parser.ns({
  'http://foo': 'foo',
  'http://bar': 'bar'
});

To skip the adjustment and still process namespace information:

parser.ns();

Proxy Mode

In this mode, the first argument passed to openTag and closeTag is an object that exposes more internal XML parse state. This needs to be explicity enabled by instantiating the parser with { proxy: true }.

// instantiate parser with proxy=true
const parser = new Parser({ proxy: true });

parser.ns({
  'http://foo-ns': 'foo'
});

parser.on('openTag', function(el, decodeEntities, selfClosing, getContext) {
  el.originalName; // root
  el.name; // foo:root
  el.attrs; // { 'xmlns:foo': ..., id: '1' }
  el.ns; // { xmlns: 'foo', foo: 'foo', foo$uri: 'http://foo-ns' }
});

parser.parse('<root xmlns:foo="http://foo-ns" id="1" />')

Proxy mode comes with a performance penelty of roughly five percent.

Caution! For performance reasons the exposed element is a simple view into the current parser state. Because of that, it will change with the parser advancing and cannot be cached. If you would like to retain a persistent copy of the values, create a shallow clone:

parser.on('openTag', function(el) {
  const copy = Object.assign({}, el);
  // copy, ready to keep around
});

Non-Features

/saxen/ lacks some features known in other XML parsers such as sax-js:

  • no support for parsing loose documents, such as arbitrary HTML snippets
  • no support for text trimming
  • no automatic entity decoding
  • no automatic attribute parsing

...and that is ok ❤.

Credits

We build on the awesome work done by easysax.

/saxen/ is named after Sachsen, a federal state of Germany. So geht sächsisch!

LICENSE

MIT