Table of Contents
- Mycila XML Tool
XMLTool is a very simple Java library to be able to do all sorts of common operations with an XML document. As a Java developer, I often end up writing the always the same code for processing XML, transforming, ... So i decided to put all in a very easy to use class using the Fluent Interface pattern to facilitate XML manipulations.
XMLTag tag = XMLDoc.newDocument(false)
.addDefaultNamespace("http://www.w3.org/2002/06/xhtml2/")
.addNamespace("wicket", "http://wicket.sourceforge.net/wicket-1.0")
.addRoot("html")
.addTag("wicket:border")
.gotoRoot().addTag("head")
.addNamespace("other", "http://other-ns.com")
.gotoRoot().addTag("other:foo");
System.out.println(tag.toString());
Features
With XML Tool you will be able to quickly:
- Create new XML documents from external sources or new document from scrash
- Manage namespaces
- Manipulating nodes (add, remove, rename)
- Manipulating data (add, remove text or CDATA)
- Navigate into the document with shortcuts and XPath (note: XPath supports namespaces)
- Tranform an XMlDoc instance to a String or a Document
- Validate your document against schemas
- Executin callbacks on a hierarchy
- Remove all namspaces (namespace ignoring)
- ... and a lot of other features !
Project status
- Issues: https://github.com/mathieucarbou/xmltool/issues
- OSGI Compliant:
Releases
Available in Maven Central Repository: http://repo1.maven.org/maven2/com/mycila/mycila-xmltool/
Snapshots
Available in OSS Repository: https://oss.sonatype.org/content/repositories/snapshots/com/mycila/mycila-xmltool/
Maven dependency
<dependency>
<groupId>com.mycila</groupId>
<artifactId>mycila-xmltool</artifactId>
<version>X.Y.ga</version>
</dependency>
Maven sites
XML Tool uses the Java DOM API and Document
creation has a cost. Thus, to improve peformance, XML Tool uses 2 Object pools of DocumentBuilder
instances:
- one pool for namespace-aware document builders
- another one ignoring namespaces
You can configure the pools by using XMLDocumentBuilderFactory.setPoolConfig(config)
By default, each of the 2 pools have the following configuration:
- min idle = 0
- max idle = CPU core number
- max total = CPU core number * 4
- max wait time = -1
If your application is heavily threaded and a lot of threads are using XMLTag concurrently, to avoid thread contention you might want to increase the max total to match your peak thread count and max idle to match your average thread count.
If your application does not use a lot of thread and often create documents, you could probably lower those numbers.
The goal is to have sufficient DocumentBuilder
instances available in the pool to be able to "feed" your application as demand without waiting for these objects to become available.
Using an object pool is sure much more complicated, but it will prevent any threading issues and also maximize performance because of object reuse.
The newDocument
method crate a new XML document. You then have to choose a default namespace if you want and then choose the root name of the document.
System.out.println(XMLDoc.newDocument(true).addRoot("html").toString());
gives:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html/>
The from
methods can load an XML document from any of the following types:
- org.w3c.dom.Node
- InputSource
- Reader
- InputStream
- File
- URL
- String
- javax.xml.transform.Source
Example:
URL yahooGeoCode = new URL("http://local.yahooapis.com/MapsService/V1/geocode?appid=YD-9G7bey8_JXxQP6rxl.fBFGgCdNjoDMACQA--&state=QC&country=CA&zip=H1W3B8");
System.out.println(XMLDoc.from(yahooGeoCode, true).toString());
System.out.println(XMLDoc.from(yahooGeoCode, true).getText("Result/City"));
outputs:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<ResultSet xmlns="urn:yahoo:maps" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:yahoo:maps http://api.local.yahoo.com/MapsService/V1/GeocodeResponse.xsd">
<Result precision="zip">
<Latitude>45.543289</Latitude>
<Longitude>-73.543098</Longitude>
<Address/>
<City>Montreal</City>
<State>QC</State>
<Zip>H1W 3B8</Zip>
<Country>CA</Country>
</Result>
</ResultSet>
<!-- ws04.search.re2.yahoo.com uncompressed Tue Dec 9 13:39:12 PST 2008 -->
Montreal
All creational methods XMLDoc.newDocument
and XMLDoc.from
requires a boolean attribute ignoreNamespaces
. If this attribute is set to true, all namespaces in the document are ignored. This is really useful if you use XPath a lot since you can avoid prefixing all your XPath elements.
Example:
System.out.println(XMLDoc.newDocument(true)
.addDefaultNamespace("http://www.w3.org/2002/06/xhtml2/")
.addRoot("html"));
System.out.println(XMLDoc.newDocument(false)
.addDefaultNamespace("http://www.w3.org/2002/06/xhtml2/")
.addRoot("html"));
outputs:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html/>
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html xmlns="http://www.w3.org/2002/06/xhtml2/"/>
Navigating in a document with namespaces using XPath is quite a pain:
doc.gotoTag("ns0:body").addTag("child")
.gotoParent().addCDATA("with special characters")
.gotoTag("ns0:body").addCDATA("<\"!@#$%'^&*()>")
whereas if you load the same document with ignoreNamespaces
, you can simply navigate like this when you use XPath:
doc.gotoTag("body").addTag("child")
.gotoParent().addCDATA("with special characters")
.gotoTag("body").addCDATA("<\"!@#$%'^&*()>")
When you create or load a document, and if you decide to not ignore namespaces, you can add a default namespace for your document and add other ones after. Namespace management is quite a challenge, specifically when using XPath. When you have an XMLTag instance, you have access to the following methods to manage namespaces in the document:
addDefaultNamespace
When you create an empty document, you can define a default namespace to use for the document. In example:
XMLTag doc = XMLDoc.newDocument()
.addDefaultNamespace("http://www.w3.org/2002/06/xhtml2/")
.addRoot("html");
will produce:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/2002/06/xhtml2/"/>
addNamespace
When you obtained an XMLTag instance, you can add any namespace you want. In example:
XMLTag doc = XMLDoc.newDocument()
.addDefaultNamespace("http://www.w3.org/2002/06/xhtml2/")
.addNamespace("wicket", "http://wicket.sourceforge.net/wicket-1.0")
.addRoot("html")
.addTag("wicket:border")
.gotoRoot().addTag("head")
.addNamespace("other", "http://other-ns.com")
.gotoRoot().addTag("other:foo");
will produce:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/2002/06/xhtml2/">
<wicket:border xmlns:wicket="http://wicket.sourceforge.net/wicket-1.0"/>
<head/>
<other:foo xmlns:other="http://other-ns.com"/>
</html>
Namespace prefix generation
When you load an existing XML document, or when you define a default namespace in a new document, prefixes and namespaces are automatically found in the whole document. Often, XML documents have default namespace. This is often the case for example in XHTML documents, like below. For this case, XMLDoc will generate for you a prefix that you can use for XPath navigation, and register the namespace as being the default one.
In example, the following document will have a default namespace and also a prefix generated to access it: ns0
.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<title/>
</head>
<body/>
</html>
XMLTag doc = XMLDoc.from(...);
assertEquals(doc.getPrefix("http://www.w3.org/1999/xhtml"), "ns0");
assertEquals(doc.getContext().getNamespaceURI("ns0"), "http://www.w3.org/1999/xhtml");
The prefix 'ns0' has been generated in the namespace context of the document so that XPath expression can use it.
You can access the javax.xml.namespace.NamespaceContext like this:
NamespaceContext ctx = doc.getContext();
You cannot override an already defined prefix in a context, and you cannot override default XML prefixes. The following 3 attempts will throw an exception:
// these prefixes are reserved
XMLDoc.newDocument().addRoot("html").addNamespace("xml", "http://ns0");
XMLDoc.newDocument().addRoot("html").addNamespace("xmlns", "http://ns0");
// shows namespace generation prefix: when we add default prefix, 'ns0' is also created (or another if it already exists). So we cannot bind another namespace to this prefix.
XMLDoc.newDocument()
.addDefaultNamespace("http://def")
.addRoot("html")
.addNamespace("ns0", "http://ns0");
Operations affecting elements: hasTag
, addTag
, getCurrentTag
, getCurrentTagName
, deleteChilds
, delete
, renameTo
hasTag
Check for the existence of a tag.
addTag
Create a new tag
System.out.println(XMLDoc.newDocument(true)
.addRoot("html")
.addTag("head")
.toString());
outputs:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html>
<head/>
</html>
getCurrentTag
Returns the current org.w3c.dom.Element
.
getCurrentTagName
Returns the current tag name.
System.out.println(XMLDoc.newDocument(true).addRoot("html").getCurrentTagName());
outputs:
html
delete
Deletes the current tag. The parent tag of the deleted tag becomes one the current tag. If we call delete on the root tag, an exception is thrown. Root node can only be renamed.
System.out.println(XMLDoc.newDocument(true)
.addRoot("html")
.addTag("head")
.delete()
.toString());
outputs:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html/>
deleteChilds
Deletes all tags under the current tag.
System.out.println(XMLDoc.newDocument(true)
.addRoot("html")
.addTag("head").addTag("title")
.toString());
System.out.println(XMLDoc.newDocument(true)
.addRoot("html")
.addTag("head").addTag("title")
.gotoRoot().deleteChilds()
.toString());
outputs:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html>
<head>
<title/>
</head>
</html>
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html/>
renameTo
Rename a tag to another name.
System.out.println(XMLDoc.newDocument(true)
.addRoot("html")
.renameTo("xhtml")
.toString());
outputs:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html/>
Operations affecting elements: hasAttribute
, getAttributeNames
, getAttribute
, deleteAttributes
, deleteAttribute
Supposing we load the following XML file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>my title</title>
</head>
<body>
<div id="header" class="banner"></div>
<div id="content" class="cool"></div>
<div id="footer" class="end"></div>
</body>
</html>
hasAttribute
Check for the existence of an attribute.
getAttributeNames
Returns a list of attribute names of the current tag.
String[] names = XMLDoc.from(resource("test.xhtml"), true)
.gotoTag("body/div[1]")
.getAttributeNames();
System.out.println(Arrays.toString(names));
outputs:
[class, id]
getAttribute
Returns an attribute value of the current tag or the selected tag by the XPath expression. If the attribute does not exist, throws an exception.
System.out.println(XMLDoc.from(resource("test.xhtml"), true)
.gotoTag("body/div[1]")
.getAttribute("class"));
System.out.println(XMLDoc.from(resource("test.xhtml"), true)
.getAttribute("class", "body/div[2]"));
outputs:
banner
cool
deleteAttributes
Deletes all attributes of a the current tag.
System.out.println(XMLDoc.from(resource("test.xhtml"), true)
.gotoTag("body/div[1]")
.deleteAttributes()
.toString());
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>my title</title>
</head>
<body>
<div/>
<div class="cool" id="content"/>
<div class="end" id="footer"/>
</body>
</html>
deleteAttribute
Deletes a specific attribute. If it does not exist, an exception is thrown.
System.out.println(XMLDoc.from(getClass().getResource("/test.xhtml"), true)
.hasAttribute("id", "body/div[1]"));
System.out.println(XMLDoc.from(getClass().getResource("/test.xhtml"), true)
.gotoTag("body/div[1]").deleteAttribute("id")
.hasAttribute("id"));
true
false
Operations affecting elements: addText
, addCDATA
, getAttribute
, deleteAttributes
, deleteAttribute
addText, addCDATA
Adds text or CDATA sections to the document. As you have seen above, you can mix text, data and tags under one tag. When we add text or data, the current tag automatically becomes the parent tag. This behavior facilitate document creation since most of the time you will have to add one text or one data per tag like this:
System.out.println(XMLDoc.newDocument(true)
.addRoot("html")
.addTag("head").addText("<\"!@#$%'^&*()>")
.addTag("body").addCDATA("<\"!@#$%'^&*()>")
.toString());
which gives:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<html>
<head><"!@#$%'^&*()></head>
<body><![CDATA[<"!@#$%'^&*()>]]></body>
</html>
getText, getCDATA
Returns the text or data contained in the current tag or the targetted tag with the XPath expression. If the tag has no text, returns "".
Given:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/2002/06/xhtml2/" xmlns:ns0="http://wicket.sourceforge.net/wicket-1.0">
<head>
<title ns0:id="titleID">my special title: <"!@#$%'^&*()></title>
</head>
<body>
<![CDATA[my special data: ]]>
<ns0:border>
<div/>
child1
</ns0:border>
<ns0:border>child2</ns0:border>
<![CDATA[<"!@#$%'^&*()>]]>
<ns0:border>child3</ns0:border>
</body>
</html>
The following assertions are true:
assertEquals(doc.getCurrentTag().getNodeType(), Document.ELEMENT_NODE);
assertEquals(doc.getCurrentTagName(), "html");
assertEquals(doc.getCurrentTagName(), "html");
assertEquals(doc.getPefix("http://www.w3.org/2002/06/xhtml2/"), "ns1"); // ns0 is already used in the document
assertEquals(doc.gotoTag("ns1:head/ns1:title").getText(), "my special title: <\"!@#$%'^&*()>");
assertEquals(doc.getText("."), "my special title: <\"!@#$%'^&*()>");
assertEquals(doc.getCDATA("../../ns1:body"), "my special data: <\"!@#$%'^&*()>");
assertEquals(doc.getAttribute("ns0:id"), "titleID");
NB: we loaded the document by not ignorign namespaces. That's why you see required ns prefixes in XPath expressions.
You can execute RAW XPath directly through Java Xpath API by using rawXpath
methods:
- Boolan XMLTag.rawXpathBoolean(...)
- Number XMLTag.rawXpathNumber(...)
- String XMLTag.rawXpathString(...)
- Node XMLTag.rawXpathNode(...)
- NodeList XMLTag.rawXpathNodeSet(...)
Navigation in the document is achieved by gotos
methods
gotoParent
Returns to the parent tag, or remain to the root tag if we are already in the root tag.
gotoRoot
As it says, goes to the root tag.
gotoChild
Goes to the only existing child of a tag. It is just a useful method to traverse XML document from child to child when there are only one child per element. If you call this method when you are in a tag that does not contain exactly one child element, the method will throw an exception.
gotoChild(int i)
Goes to the Nth child of the current element. Index is from 1 up to child number
, exactly like XPath array selection (child[i]) If the child at given position does not exist, an exception is thrown.
gotoChild(String name)
Goes to to the unique existing child element having given name. If there is no child with this name, or if there are more than one, an exception will be thrown.
gotoTag(String relativeXpath, Object... arguments)
Goes to to a tag element given an XPath expression. arguments
is useful to parametrize the XPath expression with namespace prefixes for example. It uses String.format(). Remember when using XPath on a document with namespaces, you must always use prefixes even when the document has a default namespace.
Example:
Given:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/2002/06/xhtml2/" xmlns:w="http://wicket.sourceforge.net/wicket-1.0">
<head>
<title w:id="title"/>
</head>
<body>
<w:border>
<div/>
child1
</w:border>
<w:border>child2</w:border>
<w:border>child3</w:border>
</body>
</html>
We can browse the above document like this:
XMLTag doc = XMLDoc.from(getClass().getResource("/goto.xml"), false);
String ns = doc.getPefix("http://www.w3.org/2002/06/xhtml2/");
doc.gotoChild("head") // jump to the only 'head' tag under 'html'
.gotoChild() // jump to the only child of 'head'
.gotoRoot() // go to 'html'
.gotoChild(2) // go to child 'body'
.gotoChild(3) // go to third child 'w:border' having text 'child3'
.gotoRoot() // return to root
.gotoTag("%1$s:body/w:border[1]/%1$s:div", ns); // xpath navigation with namespace
Notice the Xpath expression when we use namespace: as we load an existing document, we can get generated prefix for a namespace with the getPrefix
method. Then we can use this generated prefix in our XPath. %1$s
means that we take the first argument provided (see String.format() documentation). If you debug, you will see that the XPath expression is ns0:body/w:border[1]/ns0:div
.
Callbacks: forEach
, forEachChilds
XMLTool enables you to execute callback actions for each node selected or each child nodes.
Example:
If we take back the XHTML example:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>my title</title>
</head>
<body>
<div id="header" class="banner"></div>
<div id="content" class="cool"></div>
<div id="footer" class="end"></div>
</body>
</html>
And we execute:
XMLDoc.from(getClass().getResource("/test.xhtml"), true).forEachChild(new CallBack() {
public void execute(XMLTag doc) {
System.out.println(doc.getCurrentTagName());
}
XMLDoc.from(getClass().getResource("/test.xhtml"), true).forEach(new CallBack() {
public void execute(XMLTag doc) {
System.out.println(doc.getAttribute("id"));
}
}, "//div");
We obtain:
head
body
header
content
footer
Document conversion is done through to*
methods.
toDocument
Converts to an org.w3c.dom.Document instance.
toString toString(String encoding)
Converts to a formatted string, optionally giving an encoding.
toBytes
Convert to a byte array
toResult, toStream
Converts to streams.
Example:
XMLDoc.newDocument(true).addRoot("html")
.toResult(new DOMResult())
.toStream(new StringWriter())
.toStream(new ByteArrayOutputStream());
XML validation enables to validate current document against a shema. Of course, to use this functionnality you need to create a document that does not ignore namespaces.
validate
This method is used to validate the document against schemas. It returns a ValidationResult instance containing all warning and error issued during validation.
Example:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<html xmlns="http://www.w3.org/2002/06/xhtml2/" xmlns:w="http://wicket.sourceforge.net/wicket-1.0">
<head>
<title w:id="title"/>
</head>
<body>
<w:border>
<div/>
child1
</w:border>
<w:border>child2</w:border>
<w:border>child3</w:border>
</body>
</html>
- Namespace http://www.w3.org/2002/06/xhtml2/ is defined in schema http://www.w3.org/MarkUp/SCHEMA/xhtml2.xsd
- Namespace http://wicket.sourceforge.net/wicket-1.0 is defined in schema http://wicket.sourceforge.net/wicket-1.0.xsd
If we validate the XML document goto.xml seen above:
ValidationResult results = XMLDoc.from(getClass().getResource("/goto.xml")).validate(
new URL("http://www.w3.org/MarkUp/SCHEMA/xhtml2.xsd"),
new URL("http://wicket.sourceforge.net/wicket-1.0.xsd")
);
assertFalse(results.hasError());
If we validate the following document created by us below:
results = XMLDoc.newDocument()
.addDefaultNamespace("http://www.w3.org/2002/06/xhtml2/")
.addRoot("htmlZZ")
.validate(new URL("http://www.w3.org/MarkUp/SCHEMA/xhtml2.xsd"));
assertTrue(results.hasError());
System.out.println(Arrays.deepToString(results.getErrorMessages()));
The output is:
[cvc-elt.1: Cannot find the declaration of element 'htmlxxx'.]
Each operation causing an exception throws a XMLDocumentException with a described message.