Mercury-XML provides a Mercury module for event-based parsing of XML streams.
The idea of parsing XML streams as events with Mercury is inspired by expat and SAX.
Mercury-XML handles these events: start tags, end tags and textual data.
Use the predicate xml_read.fold_content/6 to fold over the events of an XML stream and accumulate them into an arbitrary data structure.
To accumulate XML events in a list (in reverse document order, since list.cons prepends each event):
xml_read.fold_content(list.cons, Stream, [], Result, !IO)
Unlike expat and SAX, you don't register event handler functions. Every start tag (STag) and end tag (ETag) is passed to the accumulator predicate. Textual data triggers an event only if it is not whitespace-only. This behaviour suppresses whitespace that merely formats or indents the XML.
Textual data that is not whitespace-only is passed to the accumulator predicate, ignoring leading whitespace.
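The whitespace rule can be pictured as a small filter. The sketch below is not the library's implementation: text_event is a hypothetical helper, and it assumes that "whitespace" means whatever char.is_whitespace accepts and that "ignoring leading whitespace" means stripping it from the text that is passed on.

```mercury
% File: ws_sketch.m (hypothetical stand-alone module, not part of mercury-xml)
:- module ws_sketch.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module char, maybe, string.

    % text_event(Text) = MaybeData:
    % whitespace-only text yields no event; otherwise the text is
    % passed on with its leading whitespace removed (assumed reading).
:- func text_event(string) = maybe(string).

text_event(Text) =
    ( if string.all_match(char.is_whitespace, Text) then
        no
    else
        yes(string.lstrip(Text))
    ).

main(!IO) :-
    % Indentation between two tags: no event.
    io.print(text_event("\n    "), !IO),
    io.nl(!IO),
    % Real text: an event, minus the leading blanks.
    io.print(text_event("  Text"), !IO),
    io.nl(!IO).
```

Whether trailing whitespace is kept is not stated here, so the sketch leaves it untouched.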
You might accumulate XML events in a custom data structure directly or build a generic XML-DOM first and query it later.
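As an illustration of accumulating directly into a custom data structure, here is a minimal sketch. The event types are local stand-ins modelled on the sample output in this README (the real definitions live in xml_read.m), summary and summarise are hypothetical names, and list.foldl over a hard-coded list stands in for fold_content over a stream. It assumes the accumulator has the shape pred(in, in, out) is det, as list.cons does.

```mercury
% File: acc_sketch.m (hypothetical stand-alone module, not part of mercury-xml)
:- module acc_sketch.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module assoc_list, int, list, pair.

    % Local stand-ins for the library's event types (assumption).
:- type elem_name ---> elem_name(string).
:- type att_name ---> att_name(string).
:- type att_value ---> att_value(string).
:- type event
    --->    elem_stag(elem_name, assoc_list(att_name, att_value))
    ;       elem_etag(elem_name)
    ;       data(string).

    % Custom accumulator state: number of start tags seen and the
    % text events collected so far (newest first).
:- type summary ---> summary(stags :: int, texts :: list(string)).

:- pred summarise(event::in, summary::in, summary::out) is det.

summarise(Event, !Acc) :-
    (
        Event = elem_stag(_, _),
        !Acc ^ stags := !.Acc ^ stags + 1
    ;
        Event = elem_etag(_)
    ;
        Event = data(Text),
        !Acc ^ texts := [Text | !.Acc ^ texts]
    ).

main(!IO) :-
    Events = [
        elem_stag(elem_name("doc"), []),
        elem_stag(elem_name("Element"),
            [att_name("Attribute") - att_value("Value")]),
        data("Text"),
        elem_etag(elem_name("Element")),
        elem_etag(elem_name("doc"))
    ],
    list.foldl(summarise, Events, summary(0, []), Summary),
    io.print(Summary, !IO),
    io.nl(!IO).
```

Because summarise does a constant amount of work per event, it keeps the fold linear in the number of events.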
A valid XML document can be parsed with this code:
:- pred read_example_from_stream(io.input_stream::in,
    maybe_error(list(content_event), xml_error)::out,
    io::di, io::uo) is det.

read_example_from_stream(Stream, X, !IO) :-
    get_xml_declaration(Stream, XmlDeclResult, !IO),
    (
        XmlDeclResult = ok,
        xml_read.fold_content(list.cons, Stream, [], ContentResult, !IO),
        (
            ContentResult = ok(XmlEvents),
            % list.cons accumulates in reverse, so restore document order.
            X = ok(list.reverse(XmlEvents))
        ;
            ContentResult = error(_PartialRes, ContentError),
            X = error(ContentError)
        )
    ;
        XmlDeclResult = error(XmlDeclError),
        X = error(XmlDeclError)
    ).
When applied to an input stream from this sample XML document:
<?xml version="1.1" encoding="UTF-8"?>
<doc>
  <Element Attribute="Value">Text</Element>
</doc>
it will produce this result:
ok([
    elem_stag(elem_name("doc"), []),
    elem_stag(elem_name("Element"), [
        att_name("Attribute") - att_value("Value")
    ]),
    data("Text"),
    elem_etag(elem_name("Element")),
    elem_etag(elem_name("doc"))
])
Parsing the XML stream in the example consists of these steps:
- Read the XML declaration with get_xml_declaration/4.
- Fold over the XML events of the stream with fold_content/6 and use list.cons to accumulate XML events in a list.
This example is part of the test cases. To see it in action, look at:
- The predicate in tests/test_xml.m
- The XML input file in tests/example.inp
- The expected result in tests/test_cases.m
Code is licensed under Mozilla Public License 2.0.
Trivial bits of code and test cases (not the testing framework) are dedicated to the public domain by using an explicit Public Domain dedication in the header of the respective files.
Add mercury-xml as a git submodule to your project:
$ git submodule add https://github.com/dzyr/mercury-xml.git
$ git commit -am 'add submodule mercury-xml'
This clones the submodule and registers it in your project.
After cloning your project, the mercury-xml folder will be empty. Type
$ git submodule update --init
to fetch all the data from the mercury-xml project and check out the appropriate commit listed in your superproject. You might make this part of an init target in your Makefile (see below).
Choose between two methods to access the xml_read module:
- The Mercury.modules method
- The library method
With the Mercury.modules method, generate a Mercury.modules file to tell Mercury where to find the xml_read.m source file. Add these targets to your src/Makefile and type make to compile your project.
.PHONY: default
default: Mercury.modules
	mmc --make $(YOUR_PROJECTS_MAIN_MODULE)

Mercury.modules: $(wildcard *.m) $(wildcard ../mercury-xml/src/*.m)
	mmc -f $(wildcard *.m) $(wildcard ../mercury-xml/src/*.m)
Add this to your Makefile so that you can update the submodule by simply typing $ make init:
.PHONY: init
init:
	git submodule update --init
For the library method, add this to your Makefile and type $ make init:
.PHONY: init
init:
	git submodule update --init
	$(MAKE) install-mercury-xml

.PHONY: install-mercury-xml
install-mercury-xml:
	cd mercury-xml && $(MAKE) default
	cd mercury-xml && $(MAKE) install
The library will be installed in the mercury-xml/.sandbox folder by default. This allows you to install different versions of the library across different projects.
Tell the Mercury compiler where the library is installed by adding this to your src/Mercury.options file:
MCFLAGS = --mld ../mercury-xml/.sandbox/lib/mercury
Run the regression test suite with:
$ make test
Parsing time should be linear in the size of the XML file, provided you use a constant-time accumulator predicate such as list.cons.
Run the benchmark with:
$ cd tests
$ make benchmark
This writes a large XML test file (100 MB) to your hard disk and measures the time to parse it.
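To illustrate why the accumulator matters for the linear-time claim above, the following sketch contrasts a constant-time accumulator with one that appends at the end of the list. cons_acc and snoc_acc are hypothetical names, and plain integers folded with list.foldl stand in for XML events folded with fold_content. Both produce the same list in the end, but the appending version walks the whole accumulator on every step and so makes the fold quadratic.

```mercury
% File: acc_cost_sketch.m (hypothetical stand-alone module, not part of mercury-xml)
:- module acc_cost_sketch.
:- interface.
:- import_module io.
:- pred main(io::di, io::uo) is det.

:- implementation.
:- import_module int, list.

    % Constant time per event: prepend, reverse once at the end.
:- pred cons_acc(T::in, list(T)::in, list(T)::out) is det.
cons_acc(X, Acc, [X | Acc]).

    % Time proportional to length(Acc) per event: the whole fold
    % becomes quadratic in the number of events.
:- pred snoc_acc(T::in, list(T)::in, list(T)::out) is det.
snoc_acc(X, Acc, Acc ++ [X]).

main(!IO) :-
    Events = 1 .. 1000,
    list.foldl(cons_acc, Events, [], RevFast),
    list.foldl(snoc_acc, Events, [], Slow),
    ( if list.reverse(RevFast) = Slow then
        io.write_string("same result\n", !IO)
    else
        io.write_string("results differ\n", !IO)
    ).
```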
- Ignore comments in XML documents
- Add other options to parse textual data
- Handle encodings other than UTF-8
- Various other to-dos and ideas: search for XXX in the source code
Dirk Ziegemeyer dirk@ziegemeyer.de