Parsing an `xml` file into Python
@jdherman if you have experience with reading an xml file into Python, do you know how to access these three elements for the Data Item 'LMP_PRC' from the data frame?
- INTERVAL_START_GMT
- INTERVAL_END_GMT
- VALUE
My Python code:
# This code optimizes operations of pumped-storage hydropower facilities.
# Mustafa Dogan
#10/06/2016
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import quad, dblquad, trapz, simps
from scipy.stats import lognorm
from mpl_toolkits.mplot3d import Axes3D
import xml.etree.ElementTree as ET
# import seaborn as sns
# sns.set_style('whitegrid')
tree = ET.parse('20160901_20161002_PRC_LMP_DAM_20161005_11_18_31_v1.xml')
root = tree.getroot()
# this does not work!!!
print(root[0][0].text)
for child in root:
    print(child.tag)
Wow, not the easiest format for a time series of energy prices!
What does it say in the error message? What is "root"?
If you upload or link me to the XML file I can play around with it.
Well, there is no error message yet. This code
tree = ET.parse('20160901_20161002_PRC_LMP_DAM_20161005_11_18_31_v1.xml')
reads the xml file, but I couldn't figure out how to retrieve the values that I want. I thought the result would behave like a dictionary, but it does not :) There is also a csv version. If this is too time consuming, I can just use the csv. I thought that if I could read the xml as a dictionary, it might be more useful than a csv. The files are zipped and attached. Thanks a lot!
Pumped_Storage.zip
Oh, use CSV!! Use pandas to load it, and it will behave like a dictionary. (XML is tough, you have to iterate over all of the child elements).
import pandas as pd
import matplotlib.pyplot as plt  # needed for plt.ylabel and plt.show below
import seaborn as sns
name = 'Pumped_Storage/20160901_20161002_PRC_LMP_DAM_20161005_11_19_18_v1.csv'
# assume the "start time" is the index for the dataframe
df = pd.read_csv(name, index_col=0, parse_dates=True)
df.MW.plot(style='o') # access any column name here (I assume MW is the price?)
plt.ylabel('Whatever this is')
plt.show()
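For comparison, the child-element iteration mentioned above can be sketched with the standard library alone. This is only a rough illustration, not a tested reader for the attached file: the sample document, its namespace URL, and the helper names `local_name` and `extract_records` are made up here, and the real report may nest these tags differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real report's layout and namespace may differ.
SAMPLE_XML = """<Report xmlns="http://example.com/hypothetical-namespace">
  <REPORT_DATA>
    <DATA_ITEM>LMP_PRC</DATA_ITEM>
    <INTERVAL_START_GMT>2016-09-01T07:00:00-00:00</INTERVAL_START_GMT>
    <INTERVAL_END_GMT>2016-09-01T08:00:00-00:00</INTERVAL_END_GMT>
    <VALUE>27.5</VALUE>
  </REPORT_DATA>
  <REPORT_DATA>
    <DATA_ITEM>LMP_ENE_PRC</DATA_ITEM>
    <INTERVAL_START_GMT>2016-09-01T07:00:00-00:00</INTERVAL_START_GMT>
    <INTERVAL_END_GMT>2016-09-01T08:00:00-00:00</INTERVAL_END_GMT>
    <VALUE>26.9</VALUE>
  </REPORT_DATA>
</Report>"""


def local_name(tag):
    """Drop a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit('}', 1)[-1]


def extract_records(root, data_item='LMP_PRC'):
    """Collect the three fields from every element whose DATA_ITEM child matches."""
    wanted = ('INTERVAL_START_GMT', 'INTERVAL_END_GMT', 'VALUE')
    records = []
    for elem in root.iter():  # walks the whole tree, at any depth
        fields = {local_name(child.tag): child.text for child in elem}
        if fields.get('DATA_ITEM') == data_item:
            records.append({key: fields.get(key) for key in wanted})
    return records


root = ET.fromstring(SAMPLE_XML)
records = extract_records(root)
for rec in records:
    print(rec)
```

For a file on disk, `ET.parse(filename).getroot()` would take the place of `ET.fromstring(SAMPLE_XML)`. Stripping the namespace prefix is one reason a plain `root[0][0].text` lookup can be confusing: namespaced tags show up as `{namespace}TAG` rather than `TAG`.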
Great! I will use pandas and csv instead, then. Thank you.
Sure thing. If you still want to try the nested-dict idea, there's a library called xmltodict:
https://github.com/martinblech/xmltodict
Just pip install xmltodict and then
import xmltodict
with open('whatever_file.xml') as f:
    my_dict = xmltodict.parse(f.read())
I've never used it; I only found it by googling.
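If installing another package is not an option, the nested-dict idea can also be roughed out with the standard library. To be clear, this is not how xmltodict works and it will not match xmltodict's output: `element_to_dict` is a hypothetical helper, the sample XML is made up, and the sketch ignores attributes and mixed text.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively turn an Element into nested dicts, lists, and strings."""
    children = list(elem)
    if not children:
        # Leaf element: return its text (empty string if there is none).
        return (elem.text or '').strip()
    result = {}
    for child in children:
        key = child.tag.rsplit('}', 1)[-1]  # drop any '{namespace}' prefix
        value = element_to_dict(child)
        if key in result:
            # Repeated tag: gather the values into a list.
            if not isinstance(result[key], list):
                result[key] = [result[key]]
            result[key].append(value)
        else:
            result[key] = value
    return result


# Hypothetical sample; the real report's layout may differ.
SAMPLE_XML = """<Report>
  <REPORT_DATA><DATA_ITEM>LMP_PRC</DATA_ITEM><VALUE>27.5</VALUE></REPORT_DATA>
  <REPORT_DATA><DATA_ITEM>LMP_PRC</DATA_ITEM><VALUE>30.1</VALUE></REPORT_DATA>
</Report>"""

nested = element_to_dict(ET.fromstring(SAMPLE_XML))
print(nested['REPORT_DATA'][0]['VALUE'])
```

One caveat with this shape: a tag that appears once becomes a plain value, while a repeated tag becomes a list, so code that reads the result has to handle both cases.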
I think pandas is good for now :) I really appreciate that.