neslib/Neslib.Xml

Method for getting the text of an element

Closed this issue · 3 comments

This library definitely lacks a property or method to get text from a node.

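// Find the first direct child of type Text under <Name>.
XmlNode := XmlDoc.DocumentElement.ElementByName('Name').FirstChild;
while XmlNode <> nil do
begin
  if XmlNode.NodeType = TXmlNodeType.Text then
    Break;
  XmlNode := XmlNode.NextSibling;
end;
// XmlNode is nil here when the element has no text child.
if XmlNode <> nil then
  SomeThing := XmlNode.Value;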

Or is there a simpler way that I missed?

I didn't add a helper method for this because there are different interpretations of what the text of an element actually is, and I didn't want any confusion.

In your example code, you only return the text of the first child of type Text. But that is not what other people may expect. What if an element has multiple text child nodes (maybe with other elements in between)? Should we concatenate the texts of all these child nodes together, and should we put spaces between them if needed? What if some of the child nodes have their own children? Should those children (and grandchildren etc.) also be included in the text?
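To make the ambiguity concrete, consider an element such as <Name>John <b>Q.</b> Public</Name>. Taking the first text child yields only the "John" fragment, joining the direct text children yields the "John" and "Public" fragments, and descending into nested elements also picks up "Q.". The sketch below covers the last two interpretations. CollectText is a hypothetical helper and not part of Neslib.Xml. It uses only the members that appear in the question (FirstChild, NextSibling, NodeType, Value). The unit name in the uses clause and the TXmlNodeType.Element value are assumptions, as is the choice to always join fragments with a single space.

```pascal
uses
  Neslib.Xml; // assumed unit name; adjust if the types live elsewhere

{ Hypothetical helper, not part of Neslib.Xml.
  Joins the text fragments found under ANode with single spaces.
  When ADeep is True, text inside nested elements is included as well. }
function CollectText(const ANode: TXmlNode; const ADeep: Boolean): String;
var
  Child: TXmlNode;
  Fragment: String;
begin
  Result := '';
  Child := ANode.FirstChild;
  while Child <> nil do
  begin
    Fragment := '';
    if Child.NodeType = TXmlNodeType.Text then
      // Direct text child: take its value as-is.
      Fragment := String(Child.Value)
    else if ADeep and (Child.NodeType = TXmlNodeType.Element) then
      // Nested element: recurse only when deep collection is requested.
      Fragment := CollectText(Child, True);

    if Fragment <> '' then
    begin
      if Result <> '' then
        Result := Result + ' ';
      Result := Result + Fragment;
    end;
    Child := Child.NextSibling;
  end;
end;
```

Calling CollectText(Node, False) and CollectText(Node, True) on the same element can return different strings, which is exactly the confusion a single built-in helper would have to resolve one way or the other.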

There is no straightforward solution here, which is why I didn't implement this. What do you think is the most logical interpretation of such a property? Maybe I can make an implementation for that.

> What do you think is the most logical interpretation of such a property?

As I understand the standard, text is not a separate element but part of its parent element (just like an attribute).

An element can contain:
text
attributes
other elements
or a mix of the above

From my point of view, it would be convenient to have a .Text property that contains all text fragments of the given element, joined by a space, for example. The current text nodes can stay for those who care about exactly where the text sits relative to other elements. Text inside other (nested) elements belongs to those elements.

I added a TXmlNode.Text property. For Text, CData and Comment nodes, it returns the same as the Value property. For Element nodes, it returns a concatenation of all direct children of type Text or CData (with spaces added if needed).
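With that property, the loop from the original question should reduce to a single expression. This is a minimal sketch that reuses the XmlDoc and SomeThing variables from the question and assumes SomeThing has a type compatible with what Text returns.

```pascal
// Text of the <Name> element: its direct Text and CData children, concatenated.
SomeThing := XmlDoc.DocumentElement.ElementByName('Name').Text;
```

Code that needs to know where each fragment sits relative to nested elements can still walk the child nodes with FirstChild and NextSibling as before.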