Error in getting attribute value

Question

RabbitJackTrade opened this issue 2 years ago · 8 comments

Using elementpath-4.0.1 and Python 3.10.9 in Jupyter:

from lxml import etree
from elementpath import select

tei = """<?xml version='1.0' encoding='UTF8'?>
<?xml-model type="application/xml"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
        <pb n="page1"/>          
        <pb n="page2"/>
  </text>
</TEI>
"""

doc = etree.XML(tei.encode())
for p in select(doc, '//pb'):
    print(p.attrib['n'])
    print(p.xpath('./@n')[0])  # or print(p.xpath('@n')[0])
    print(select(doc, './@n'))  # or print(select(doc, '@n'))
    print('------')

The output returned is:

page1
page1
[]
------
page2
page2
[]
------

Answer 1 · 2023-02-19T21:51:48.000Z

Hi,
lxml's xpath() has no notion of document position (try the '/' expression on an Element or an ElementTree instance).

In elementpath the document position is taken into account even if you provide an Element instead of an ElementTree instance (currently by setting the context item to None; an alternative could be creating a dummy ElementTree instance that wraps the root element ...).

So in your example the select call has to provide the starting context item:

select(doc, './@n', item=p)

Answer 2 · 2023-02-19T22:19:37.000Z

First, thanks - as usual. It works now!

Second, is that point mentioned in the documentation anywhere? If not, should it be added?

Answer 3 · 2023-02-20T07:27:39.000Z

> Second, is that point mentioned in the documentation anywhere? If not, should it be added?

There is no mention of that. I will add a paragraph about the concepts and implementation details of XPath selectors.

Answer 4 · 2023-02-20T12:31:03.000Z

Great. Thanks again.

Answer 5 · 2023-02-22T06:18:35.000Z

More on this: the unexpected behavior is caused by the root sibling PI <?xml-model type="application/xml"?>.

To preserve it, a dummy document node is created, so the root node for XPath selection is the document node, even if you call select(p, './@n').

This does not happen with xml.etree.ElementTree because it doesn't parse root siblings.
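
A small standard-library sketch of that last point: with the default parser, xml.etree.ElementTree silently drops a processing instruction that precedes the root element. The sample string is a trimmed, hypothetical variant of the document in the question (the XML declaration is omitted), and the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed variant of the TEI sample, with the root sibling PI kept.
xml_text = """<?xml-model type="application/xml"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <pb n="page1"/>
    <pb n="page2"/>
  </text>
</TEI>"""

root = ET.fromstring(xml_text)

# The PI before the root element is not part of the parsed tree,
# so it does not reappear when the tree is serialized.
serialized = ET.tostring(root, encoding='unicode')
has_pi = 'xml-model' in serialized

# The <pb> elements are still reachable through the TEI namespace.
ns = {'tei': 'http://www.tei-c.org/ns/1.0'}
pages = [pb.get('n') for pb in root.findall('.//tei:pb', ns)]

print(has_pi, pages)
```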

I can change this behavior for lxml in one of these ways:

1. Do nothing (discard root siblings if an Element is provided for the root argument)
2. Create a dummy document node only if the root node has siblings and the provided Element is the root element of the tree

The first option matches the xml.etree.ElementTree behavior. The current behavior is to create a dummy document node only if the root node has siblings.
If the provided root is an ElementTree instance, the document node is created in any case.

Answer 6 · 2023-02-23T08:56:09.000Z

I'm opting for creating a document node only if the provided root Element is the root of the tree. This is consistent with the resolution of #54.

Answer 7 · 2023-03-21T16:31:41.000Z

Hi @RabbitJackTrade,

I added a section on advanced topics to the documentation. If you want to add more or fix parts of it, feel free to open a PR.

thanks

Answer 8 · 2023-03-21T16:34:57.000Z

Thanks, Davide; I’ll be happy to take a look.