Determine root node

Question

Closed this issue 4 years ago · 5 comments

How can I determine the tag name of the root node?
For CPACS I know it's /cpacs, but what about arbitrary XML documents?

Answer 1 · 2020-09-02T08:13:20.000Z

Answer: Use the path "/". When concatenating paths, however, the root level needs special handling: there the child path is "/" + childName, whereas below the root it is "/cpacs" + "/" + childName.

I created a convenience wrapper in Python that accepts the empty path instead and makes the above distinction by using either / or the empty string as the separator when concatenating.
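The distinction described above can be sketched as a small join helper. This is only an illustration, not the wrapper mentioned here: the name join_xpath is hypothetical, and it assumes that the empty string (or "/") stands for the document root.

```python
def join_xpath(parent: str, child: str) -> str:
    """Join a parent path and a child step.

    Hypothetical helper: '' (or '/') is treated as the document root,
    so no second slash is inserted at the root level.
    """
    if parent in ("", "/"):
        return "/" + child
    return parent + "/" + child


# Naive concatenation doubles the slash when the parent is the root path "/".
naive = "/" + "/" + "cpacs"

print(naive)
print(join_xpath("", "cpacs"))
print(join_xpath("/cpacs", "toolspecific"))
```

With a helper like this, recursive code can always call the same join function and does not need to know whether it is currently at the root.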

Answer 2 · 2020-09-02T08:18:50.000Z

I did not completely understand what the issue is or was.

Could you please provide an example of what you tried to achieve, what did not work, and what works?

Answer 3 · 2020-09-02T08:22:50.000Z

For the XML document

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <x><a>1.2</a><b>34</b></x>
</root>

How do I determine the root node name /root? I think usually you simply need to know the name of the root node to be able to work with the file, but as described above it is possible to determine its name with TIXI (with one caveat explained above).
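For comparison, here is how the root tag of the same document can be read with Python's standard library. Note that this uses xml.etree.ElementTree, not TIXI, so it only shows the expected result and says nothing about the TIXI API itself. The names DOC and root_path are made up for this sketch.

```python
import xml.etree.ElementTree as ET

# The example document from above, passed as bytes so the encoding
# declaration in the XML prolog is honored by the parser.
DOC = """<?xml version="1.0" encoding="UTF-8"?>
<root>
  <x><a>1.2</a><b>34</b></x>
</root>"""

root = ET.fromstring(DOC.encode("utf-8"))

# ElementTree hands back the root element directly, so its tag is the
# root node name without any path query.
root_path = "/" + root.tag
print(root_path)
```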

Answer 4 · 2020-09-02T08:53:33.000Z

"but as described above it is possible to determine its name with TIXI (with one caveat explained above)."

But I did not get the caveat; that is what I meant. What workaround are you using? I am just trying to understand whether we could improve something in tixi.

Answer 5 · 2020-09-02T08:59:00.000Z

So the path I pass to the function is something like /cpacs/toolspecific, starting with a slash and ending with a tag name (or a predicate like [@uID="1"]). To get the root node, however, I cannot use the empty string; I have to use /, which arguably has a trailing slash, in contrast to all other paths I might query.
That means I cannot recursively walk the returned nodes and compose paths as parent + "/" + childName, because I would then get //cpacs/toolspecific. That's the reason why I had to make a distinction at the root level. I've come across this problem in many projects over the last years, and just noticed it today when creating a list of XPaths for all numeric values in an XML document (using tixi).

import collections


def enumerateNumericXpaths(xml, path, paths):
  r''' Return a list of all XPaths that point to a number.
      xml: open Tixi document
      path: current path prefix (including index)
      paths: list of XPaths that contain a numeral
  >>> t, r = Tixi(), []
  >>> t.openString('<?xml version="1.0" encoding="UTF-8"?>\n<root><x><a>1.2</a><b>34</b></x></root>')
  >>> paths = enumerateNumericXpaths(t, '', r); print("\n".join(r))
  /root[1]/x[1]/a[1]
  /root[1]/x[1]/b[1]
  '''
  if path == '':  # empty path means document root: query '/' and join children without a separator
    path, sep = '/', ''
  else:
    sep = '/'
  children = xml.getNumberOfChilds(path)
  counts = collections.defaultdict(int)  # map from element name -> current per-element index
  for i in range(1, children + 1):
    childName = xml.getChildNodeName(path, i)
    if childName == '#text':
      try:
        xml.getDoubleElement(path)  # xml.getIntegerElement(path) works as well for int and double
        paths.append(path)
      except Exception:  # text content is not numeric
        pass
      continue
    elif childName.startswith('#'): continue  # comment or CDATA - ignored for now
    counts[childName] += 1
    enumerateNumericXpaths(xml, path + '%s%s[%d]' % (sep, childName, counts[childName]), paths)  # recurse into children
  return paths  # the same list object that was passed in; the caller's list is filled in place
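To cross-check the expected output from the docstring without a TIXI installation, the same enumeration can be sketched with xml.etree.ElementTree from the standard library. This is a substitute technique, not the TIXI code above: the function name numeric_xpaths is made up, and float() may accept strings (for example "nan" or "inf") that TIXI's number parsing treats differently.

```python
import collections
import xml.etree.ElementTree as ET


def numeric_xpaths(element, path):
    """Return indexed XPaths of `element` and its descendants whose text parses as a float.

    `path` is the already complete XPath of `element` itself.
    """
    found = []
    text = (element.text or "").strip()
    if text:
        try:
            float(text)
            found.append(path)
        except ValueError:
            pass  # text content is not numeric
    counts = collections.defaultdict(int)  # element name -> index among same-named siblings
    for child in element:
        if not isinstance(child.tag, str):
            continue  # defensive: skip comment or processing-instruction nodes
        counts[child.tag] += 1
        child_path = "%s/%s[%d]" % (path, child.tag, counts[child.tag])
        found.extend(numeric_xpaths(child, child_path))
    return found


root = ET.fromstring(
    b'<?xml version="1.0" encoding="UTF-8"?>\n<root><x><a>1.2</a><b>34</b></x></root>'
)
# The root path is seeded explicitly, so every later join uses "/" as separator.
result = numeric_xpaths(root, "/%s[1]" % root.tag)
print("\n".join(result))
```

Because ElementTree returns the root element itself rather than a document node, the recursion starts one level lower and the special case for the root separator does not come up.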