sissaschool/elementpath

Error in getting attribute value

RabbitJackTrade opened this issue · 8 comments

Using elementpath-4.0.1 and Python 3.10.9 in Jupyter:

from lxml import etree
from elementpath import select

tei = """<?xml version='1.0' encoding='UTF8'?>
<?xml-model type="application/xml"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
        <pb n="page1"/>          
        <pb n="page2"/>
  </text>
</TEI>
"""

doc = etree.XML(tei.encode())
for p in select(doc,'//pb'):
    print(p.attrib['n'])
    print(p.xpath('./@n')[0])  # or: print(p.xpath('@n')[0])
    print(select(doc, './@n'))  # or: print(select(doc, '@n'))
    print('------')

The output returned is:

page1
page1
[]
------
page2
page2
[]
------

Hi,
lxml's xpath() lacks the concept of a document position (try the '/' expression on an Element and on an ElementTree instance).

In elementpath, the document position is taken into account even if you provide an Element instead of an ElementTree instance (currently by setting the context item to None; an alternative could be to create a dummy ElementTree instance that wraps the root element...).

So in your example, the select call has to provide the starting context item:

select(doc, './@n', item=p)

First, thanks - as usual. It works now!

Second, is that point mentioned in the documentation anywhere? If not, should it be added?

There is no mention of that. I will add a paragraph about the concepts and implementation details of XPath selectors.

Great. Thanks again.

More on this: the unexpected behavior is caused by the processing instruction (PI) <?xml-model type="application/xml"?>, which is a sibling of the root element.

To preserve it, a dummy document node is created, so the root node for XPath selection is the document node, even if you call select(p, './@n').

This does not happen with xml.etree.ElementTree, because it does not keep root siblings when parsing.
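The difference between the two parsers can be checked with a short round trip. This sketch assumes lxml is installed; it serializes each parsed document and looks for the `xml-model` PI, then reads the PI back in lxml through the root element's `getprevious()`.

```python
import io
import xml.etree.ElementTree as ET
from lxml import etree

source = b"""<?xml version="1.0" encoding="UTF-8"?>
<?xml-model type="application/xml"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0"><text/></TEI>
"""

# xml.etree.ElementTree: parse the whole document and write it back out.
# The default TreeBuilder has nowhere to store nodes outside the root element.
et_tree = ET.parse(io.BytesIO(source))
et_out = io.BytesIO()
et_tree.write(et_out)
et_keeps_pi = b'xml-model' in et_out.getvalue()

# lxml: the same round trip keeps the PI, which is also reachable from the
# root element as its preceding sibling.
lx_tree = etree.parse(io.BytesIO(source))
lx_keeps_pi = b'xml-model' in etree.tostring(lx_tree)
root_sibling = lx_tree.getroot().getprevious()

print(et_keeps_pi, lx_keeps_pi, root_sibling.target)
```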

I can change this behavior for lxml in one of these ways:

  • Do nothing (discard root siblings if an Element is provided for the root argument)
  • Create a dummy document node only if the root node has siblings and the provided Element is the root element of the tree

The first option matches the xml.etree.ElementTree behavior. The current behavior is to create a dummy document node only if the root node has siblings.
If the provided root is an ElementTree instance, the document node is created in any case.

I'm opting to create a document node only if the provided root Element is the root of the tree. This is consistent with the resolution of #54.
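The chosen rule (the second option above, plus the ElementTree case) can be sketched as a small predicate over lxml objects. `should_create_document_node` is a hypothetical name used only for illustration; it is not part of elementpath's API and does not reproduce its internal code.

```python
from lxml import etree


def should_create_document_node(root):
    """Hypothetical helper (not elementpath's API) sketching the policy:
    an ElementTree always gets a document node; an Element gets one only
    if it is the root element of its tree and that root has siblings."""
    if isinstance(root, etree._ElementTree):
        return True
    if root.getparent() is not None:
        # A descendant such as <pb>: selection starts from the element itself.
        return False
    # Root element: a document node is needed only to hold its siblings
    # (e.g. a leading <?xml-model?> PI or a trailing comment).
    return root.getprevious() is not None or root.getnext() is not None


with_pi = etree.XML(b'<?xml-model type="application/xml"?>\n'
                    b'<TEI><text><pb n="page1"/></text></TEI>')
plain = etree.XML(b'<TEI><text/></TEI>')
pb = with_pi.find('.//pb')

print(should_create_document_node(with_pi))              # True: root with a PI sibling
print(should_create_document_node(pb))                   # False: descendant element
print(should_create_document_node(plain))                # False: root without siblings
print(should_create_document_node(plain.getroottree()))  # True: ElementTree instance
```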

Hi @RabbitJackTrade,

I added a section on advanced topics to the documentation. If you want to add more or fix some parts of it, feel free to open a PR.

thanks