stchris/untangle

How do I access deeply nested children?

jakehawkes opened this issue · 5 comments

(This issue was initially in a comment for another issue, and I thought it might be best to start a new issue instead of hijacking an old one. Apologies for the confusion.)

I have an XML file with deeply nested elements, and they are all under higher level elements with the same name. An example:

<MeasurementRecords attrib="something">
    <HistoryRecords>
        <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
        <List>
            <HistoryRecord>
                <Value>60</Value>
                <State>Valid</State>
                <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
            </HistoryRecord>
        </List>
    </HistoryRecords>
    <HistoryRecords>
        <ValueItemId>100_0000100004_3788_Resource-0.customId_Specific Enthalpy (INS)</ValueItemId>
        <List>
            <HistoryRecord>
                <Value>33</Value>
                <State>Valid</State>
                <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
            </HistoryRecord>
        </List>
    </HistoryRecords>
</MeasurementRecords>

How do I access the <Value> of the Specific Enthalpy element? From other examples I assume I should loop through all the HistoryRecords elements. But when I do that, it appears the children are NOT in the object. My attempt so far:

for HistoryRecord in RSPobj.MeasurementRecords.HistoryRecords:
    if HistoryRecord.ValueItemId.cdata == "100_0000100004_3788_Resource-0.customId_Specific Enthalpy (INS)":
        pprint(HistoryRecord.ValueItemId)

Gives me:

$ python parseRSPXMLfiles.py
Element(name = ValueItemId, attributes = {}, cdata = 100_0000100004_3788_Resource-0.customId_Specific Enthalpy (INS))

Where are all the children?

I was expecting to be able to do something like this:

pprint(HistoryRecord.ValueItemId.List.HistoryRecord.Value)

But that gives me this error:

Traceback (most recent call last):
  File "parseRSPXMLfiles.py", line 17, in <module>
    pprint(HistoryRecord.ValueItemId.List.HistoryRecord.Value)
  File "/usr/lib/python2.7/site-packages/untangle.py", line 66, in __getattr__
    raise IndexError('Unknown key <%s>' % key)
IndexError: Unknown key <List>

FYI, this:

        pprint(dir(HistoryRecord.ValueItemId))

Results in [] being printed.

I think you need something along the lines of:

for HRecord in RSPobj.MeasurementRecords.HistoryRecords:
    pprint(HRecord.List.HistoryRecord.Value.cdata)

instead of

for HRecord in RSPobj.MeasurementRecords.HistoryRecords:
    pprint(HRecord.ValueItemId.List.HistoryRecord.Value.cdata)

since List is a child of HistoryRecords, not of ValueItemId, in your XML.
The first snippet prints the values, including the specific enthalpy, when I run it.

Awesome! But now I have another problem =)

But first, another question. The code you provided prints the Value tag for all ValueItemId tags. So I'm thinking there is no way to access the Value tag for a specific ValueItemId tag without looping through the tree, like this?

for HRecord in tree.MeasurementRecords.HistoryRecords:
    if(tempRE.match(HRecord.ValueItemId.cdata)):
        print("%s temperature is: %s (%s)" 
            % (HRecord.List.HistoryRecord.TimeStamp.cdata, HRecord.List.HistoryRecord.Value.cdata, HRecord.List.HistoryRecord.State.cdata))

right? Ok, fair enough. But now, I've discovered that some of the input XML files look like this:

<MeasurementRecords attrib="something">
    <HistoryRecords>
        <ValueItemId>100_0000100004_3788_Resource-0.customId_Temperature (AVG)</ValueItemId>
        <List>
            <HistoryRecord>
                <Value>12.4</Value>
                <State>Valid</State>
                <TimeStamp>2016-04-21T09:00:00Z</TimeStamp>
            </HistoryRecord>
            <HistoryRecord>
                <Value>12.3</Value>
                <State>Valid</State>
                <TimeStamp>2016-04-21T09:05:00Z</TimeStamp>
            </HistoryRecord>
        </List>
    </HistoryRecords>
</MeasurementRecords>

And now I'm getting an error like this:

Traceback (most recent call last):
  File "parseRSPXMLfilesUntangle.py", line 36, in <module>
    % (HRecord.List.HistoryRecord.TimeStamp.cdata, HRecord.List.HistoryRecord.Value.cdata, HRecord.List.HistoryRecord.State.cdata))
AttributeError: 'list' object has no attribute 'TimeStamp'

Ok, so maybe I need to check if there is a list of HistoryRecord elements? But this:

        print(len(HRecord.List.HistoryRecord))

gives me:

Traceback (most recent call last):
  File "parseRSPXMLfilesUntangle.py", line 35, in <module>
    print(len(HRecord.List.HistoryRecord))
  File "/usr/lib/python2.7/site-packages/untangle.py", line 66, in __getattr__
    raise IndexError('Unknown key <%s>' % key)
IndexError: Unknown key <__len__>

Concerning your first question: I'm also new to the whole XML thing, so it is a learning process for me :-).
But I think you do have to loop, since (at least with untangle) you can't 'search' the XML, i.e. there is no XPath-style lookup.
At least, that is what I currently think.
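
For what it's worth, if searching is the goal, the standard library's xml.etree.ElementTree (a different parser, not untangle) supports a limited XPath subset, including a [tag='text'] predicate. A minimal sketch, assuming the layout from the example in the question; the shortened ValueItemId strings and the find_values helper are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Shortened copy of the XML from the question. The ValueItemId strings
# are abbreviated placeholders, not the real IDs.
XML = """\
<MeasurementRecords attrib="something">
    <HistoryRecords>
        <ValueItemId>customId_WSx Data Precip Type</ValueItemId>
        <List>
            <HistoryRecord>
                <Value>60</Value>
                <State>Valid</State>
                <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
            </HistoryRecord>
        </List>
    </HistoryRecords>
    <HistoryRecords>
        <ValueItemId>customId_Specific Enthalpy (INS)</ValueItemId>
        <List>
            <HistoryRecord>
                <Value>33</Value>
                <State>Valid</State>
                <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
            </HistoryRecord>
        </List>
    </HistoryRecords>
</MeasurementRecords>
"""


def find_values(root, item_id):
    """Return the Value text of every HistoryRecord under the matching ValueItemId.

    Hypothetical helper; item_id must not contain a single quote, because it
    is pasted into the path expression as a quoted string.
    """
    path = "HistoryRecords[ValueItemId='%s']/List/HistoryRecord/Value" % item_id
    return [value.text for value in root.findall(path)]


root = ET.fromstring(XML)  # root is the MeasurementRecords element
print(find_values(root, "customId_Specific Enthalpy (INS)"))
```

Because findall returns every match, this should behave the same whether a List holds one HistoryRecord or several.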

I will look further into your second problem later (don't have a computer with Python handy right now).
But off the top of my head:
since one List can (and does) contain multiple HistoryRecord elements, HistoryRecord is a list there, so you can't simply use HRecord.List.HistoryRecord.TimeStamp.cdata.
Can you try replacing it with:
HRecord.List.HistoryRecord[0].TimeStamp.cdata
for example? (And likewise for Value and State.)

That should do it. You could also use try/except, but I don't really know what would be best practice here.

(@SirHooke, thanks for the help!! BTW, this reply appears after yours because I edited the post: the MD formatting didn't survive the email reply method.)

That works, but only for files that have multiple HistoryRecord elements.
For files that have only one, it breaks with this error:

Traceback (most recent call last):
  File "parseRSPXMLfilesUntangle.py", line 37, in <module>
    % (HRecord.List.HistoryRecord[0].TimeStamp.cdata,
AttributeError: 'NoneType' object has no attribute 'TimeStamp'

I suspect this is because HistoryRecord isn't always a list: as parse() traverses the XML, it only builds a list when it finds several elements with the same name under the same parent node.
Interestingly, this code:

    if type(HRecord.List.HistoryRecord) is list:
        print("List, %d elements" % len(HRecord.List.HistoryRecord))

prints "List, 2 elements".

So for now, I guess the solution is:

if type(HRecord.List.HistoryRecord) is list:
    print("List, %d elements" % len(HRecord.List.HistoryRecord))
    # TODO: put this in a loop to get all the HistoryRecord elements
    print("%s temperature is: %s (%s)"
        % (HRecord.List.HistoryRecord[0].TimeStamp.cdata,
          HRecord.List.HistoryRecord[0].Value.cdata,
          HRecord.List.HistoryRecord[0].State.cdata))
else:
    print("%s temperature is: %s (%s)"
        % (HRecord.List.HistoryRecord.TimeStamp.cdata,
          HRecord.List.HistoryRecord.Value.cdata,
          HRecord.List.HistoryRecord.State.cdata))
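
The two branches above could probably be folded into one by normalising to a list first, which would also take care of the TODO. A minimal sketch: as_list and describe are hypothetical helpers (not part of untangle), and the SimpleNamespace objects only stand in for untangle elements so the snippet runs on its own.

```python
from types import SimpleNamespace


def as_list(node):
    """Wrap a single element in a list; return an existing list unchanged."""
    return node if isinstance(node, list) else [node]


def describe(records):
    """Format one line per HistoryRecord, given either one element or a list."""
    return [
        "%s temperature is: %s (%s)"
        % (rec.TimeStamp.cdata, rec.Value.cdata, rec.State.cdata)
        for rec in as_list(records)
    ]


def fake_record(value, state, timestamp):
    # Stand-in for an untangle element: only the .cdata attributes used above.
    return SimpleNamespace(
        Value=SimpleNamespace(cdata=value),
        State=SimpleNamespace(cdata=state),
        TimeStamp=SimpleNamespace(cdata=timestamp),
    )


# One record (what a single <HistoryRecord> would look like) and a list of two.
single = fake_record("33", "Valid", "2016-04-20T12:40:00Z")
several = [
    fake_record("12.4", "Valid", "2016-04-21T09:00:00Z"),
    fake_record("12.3", "Valid", "2016-04-21T09:05:00Z"),
]

for line in describe(single) + describe(several):
    print(line)
```

With real data, the call inside the existing loop would presumably be describe(HRecord.List.HistoryRecord).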