How do I access deeply nested children?
jakehawkes opened this issue · 5 comments
(This issue was initially in a comment for another issue, and I thought it might be best to start a new issue instead of hijacking an old one. Apologies for the confusion.)
I have an XML file with deeply nested elements, and they are all under higher level elements with the same name. An example:
<MeasurementRecords attrib="something">
  <HistoryRecords>
    <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
    <List>
      <HistoryRecord>
        <Value>60</Value>
        <State>Valid</State>
        <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
      </HistoryRecord>
    </List>
  </HistoryRecords>
  <HistoryRecords>
    <ValueItemId>100_0000100004_3788_Resource-0.customId_Specific Enthalpy (INS)</ValueItemId>
    <List>
      <HistoryRecord>
        <Value>33</Value>
        <State>Valid</State>
        <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
      </HistoryRecord>
    </List>
  </HistoryRecords>
How do I access the <Value> of the Specific Enthalpy element? From other examples I assume I should loop through all the HistoryRecords elements. But when I do that, it appears the children are NOT in the object. My attempt so far:
for HistoryRecord in RSPobj.MeasurementRecords.HistoryRecords:
    if HistoryRecord.ValueItemId.cdata == "100_0000100004_3788_Resource-0.customId_Specific Enthalpy (INS)":
        pprint(HistoryRecord.ValueItemId)
Gives me:
$ python parseRSPXMLfiles.py
Element(name = ValueItemId, attributes = {}, cdata = 100_0000100004_3788_Resource-0.customId_Specific Enthalpy (INS))
Where are all the children?
I was expecting to be able to do something like this:
pprint(HistoryRecord.ValueItemId.List.HistoryRecord.Value)
But that gives me this error:
Traceback (most recent call last):
File "parseRSPXMLfiles.py", line 17, in <module>
pprint(HistoryRecord.ValueItemId.List.HistoryRecord.Value)
File "/usr/lib/python2.7/site-packages/untangle.py", line 66, in __getattr__
raise IndexError('Unknown key <%s>' % key)
IndexError: Unknown key <List>
FYI, this:
pprint(dir(HistoryRecord.ValueItemId))
results in [] being printed.
I think you need something along the lines of:
for HRecord in RSPobj.MeasurementRecords.HistoryRecords:
    pprint(HRecord.List.HistoryRecord.Value.cdata)
instead of
for HRecord in RSPobj.MeasurementRecords.HistoryRecords:
    pprint(HRecord.ValueItemId.List.HistoryRecord.Value.cdata)
since List is a child of HistoryRecords and not of ValueItemId in your XML.
The first snippet prints the Value of every HistoryRecords element, including the specific enthalpy, when I run it.
Awesome! But now I have another problem =)
But first, another question. The code you provided prints the Value tag for all ValueItemId tags. So I'm thinking there is no way to access the Value tag for a specific ValueItemId tag without looping through the tree, like this?
for HRecord in tree.MeasurementRecords.HistoryRecords:
    if(tempRE.match(HRecord.ValueItemId.cdata)):
        print("%s temperature is: %s (%s)"
              % (HRecord.List.HistoryRecord.TimeStamp.cdata, HRecord.List.HistoryRecord.Value.cdata, HRecord.List.HistoryRecord.State.cdata))
right? Ok, fair enough. But now, I've discovered that some of the input XML files look like this:
<MeasurementRecords attrib="something">
  <HistoryRecords>
    <ValueItemId>100_0000100004_3788_Resource-0.customId_Temperature (AVG)</ValueItemId>
    <List>
      <HistoryRecord>
        <Value>12.4</Value>
        <State>Valid</State>
        <TimeStamp>2016-04-21T09:00:00Z</TimeStamp>
      </HistoryRecord>
      <HistoryRecord>
        <Value>12.3</Value>
        <State>Valid</State>
        <TimeStamp>2016-04-21T09:05:00Z</TimeStamp>
      </HistoryRecord>
    </List>
  </HistoryRecords>
</MeasurementRecords>
And now I'm getting an error like this:
Traceback (most recent call last):
File "parseRSPXMLfilesUntangle.py", line 36, in <module>
% (HRecord.List.HistoryRecord.TimeStamp.cdata, HRecord.List.HistoryRecord.Value.cdata, HRecord.List.HistoryRecord.State.cdata))
AttributeError: 'list' object has no attribute 'TimeStamp'
Ok, so maybe I need to check if there is a list of HistoryRecord elements? But this:
print(len(HRecord.List.HistoryRecord))
gives me:
Traceback (most recent call last):
File "parseRSPXMLfilesUntangle.py", line 35, in <module>
print(len(HRecord.List.HistoryRecord))
File "/usr/lib/python2.7/site-packages/untangle.py", line 66, in __getattr__
raise IndexError('Unknown key <%s>' % key)
IndexError: Unknown key <__len__>
Concerning your first question, I'm also new to the whole XML thing, so it is a learning process for me :-).
But I think you do have to loop, since (at least with untangle) you can't "search" the XML (i.e. there is no XPath-style querying).
At least, that is what I currently think.
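As a point of comparison, and not something untangle itself provides: the standard library's xml.etree.ElementTree supports a limited subset of XPath through find/findall/findtext. Below is a minimal sketch, assuming the first example from this thread is available as a string. The function name values_for is made up for illustration, and the closing </MeasurementRecords> tag is added so that the snippet parses.

```python
import xml.etree.ElementTree as ET

# First example from this thread, with the root element closed (assumption).
XML = """\
<MeasurementRecords attrib="something">
  <HistoryRecords>
    <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
    <List>
      <HistoryRecord>
        <Value>60</Value>
        <State>Valid</State>
        <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
      </HistoryRecord>
    </List>
  </HistoryRecords>
  <HistoryRecords>
    <ValueItemId>100_0000100004_3788_Resource-0.customId_Specific Enthalpy (INS)</ValueItemId>
    <List>
      <HistoryRecord>
        <Value>33</Value>
        <State>Valid</State>
        <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
      </HistoryRecord>
    </List>
  </HistoryRecords>
</MeasurementRecords>
"""


def values_for(root, item_id):
    """Return (timestamp, value, state) tuples for one ValueItemId."""
    rows = []
    # root is <MeasurementRecords>; findall looks at its direct children.
    for records in root.findall("HistoryRecords"):
        if records.findtext("ValueItemId") != item_id:
            continue
        # findall always returns a list, for one HistoryRecord or many.
        for rec in records.findall("List/HistoryRecord"):
            rows.append((rec.findtext("TimeStamp"),
                         rec.findtext("Value"),
                         rec.findtext("State")))
    return rows


root = ET.fromstring(XML)
rows = values_for(
    root, "100_0000100004_3788_Resource-0.customId_Specific Enthalpy (INS)")
for timestamp, value, state in rows:
    print("%s: %s (%s)" % (timestamp, value, state))
```

This still loops over the HistoryRecords elements, but the path expression "List/HistoryRecord" does the descent in one step, and the result is a list regardless of how many records a file contains.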
I will look further into your second problem later (I don't have a computer with Python handy right now).
But off the top of my head: since one List can (and does) contain multiple HistoryRecord elements, you can't simply use HRecord.List.HistoryRecord.TimeStamp.cdata, because HistoryRecord is not unique.
Can you try replacing this with:
HRecord.List.HistoryRecord[0].TimeStamp.cdata
for example? (And likewise for Value and State.)
That should do it. You could also use try/except, but I don't really know what the best practice would be here.
(@SirHooke, thanks for the help!! BTW, this reply is coming after yours because I edited the post because the MD formatting didn't survive the email reply method.)
That works, but only for files that have multiple HistoryRecord elements.
For files that only have one, it breaks with this error:
Traceback (most recent call last):
File "parseRSPXMLfilesUntangle.py", line 37, in <module>
% (HRecord.List.HistoryRecord[0].TimeStamp.cdata,
AttributeError: 'NoneType' object has no attribute 'TimeStamp'
I suspect this is because HistoryRecord isn't always a list. That is, as parse() traverses the XML, it only builds a list when it finds several elements with the same name under the same parent node.
Interestingly, this code:
if type(HRecord.List.HistoryRecord) is list:
    print("List, %d elements" % len(HRecord.List.HistoryRecord))
prints "List, 2 elements".
So for now, I guess the solution is:
if type(HRecord.List.HistoryRecord) is list:
    print("List, %d elements" % len(HRecord.List.HistoryRecord))
    # TODO: put this in a loop to get all the HistoryRecord elements
    print("%s temperature is: %s (%s)"
          % (HRecord.List.HistoryRecord[0].TimeStamp.cdata,
             HRecord.List.HistoryRecord[0].Value.cdata,
             HRecord.List.HistoryRecord[0].State.cdata))
else:
    print("%s temperature is: %s (%s)"
          % (HRecord.List.HistoryRecord.TimeStamp.cdata,
             HRecord.List.HistoryRecord.Value.cdata,
             HRecord.List.HistoryRecord.State.cdata))
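One way to avoid the two near-identical branches, and to handle the TODO at the same time, is to normalise the value to a list first and then loop over it. This is only a minimal sketch: as_list and format_records are hypothetical helper names, and the SimpleNamespace objects are stand-ins for parsed untangle elements so that the snippet runs without an input file.

```python
from types import SimpleNamespace


def as_list(node):
    """Return node unchanged if it is already a list, else wrap it in one."""
    return node if isinstance(node, list) else [node]


def format_records(history_record):
    """Build one output line per record.

    history_record may be a single element or a list of elements.
    """
    return ["%s temperature is: %s (%s)"
            % (rec.TimeStamp.cdata, rec.Value.cdata, rec.State.cdata)
            for rec in as_list(history_record)]


def fake_record(timestamp, value, state):
    """Stand-in (assumption) that mimics the .cdata access used above."""
    return SimpleNamespace(
        TimeStamp=SimpleNamespace(cdata=timestamp),
        Value=SimpleNamespace(cdata=value),
        State=SimpleNamespace(cdata=state),
    )


# One record (not a list) and two records (a list), as in the two file types.
single = fake_record("2016-04-20T12:40:00Z", "33", "Valid")
several = [
    fake_record("2016-04-21T09:00:00Z", "12.4", "Valid"),
    fake_record("2016-04-21T09:05:00Z", "12.3", "Valid"),
]

for line in format_records(single) + format_records(several):
    print(line)
```

With real data, the call inside the existing loop would be something like `for line in format_records(HRecord.List.HistoryRecord): print(line)`.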