e.g.: point.extensions.get("hr")
jedie opened this issue · 11 comments
It would be cool to have an easier way to get values from GPX extensions.
e.g.:
<trkpt lat="51.43788929097354412078857421875" lon="6.617012657225131988525390625">
  <ele>23.6000003814697265625</ele>
  <time>2018-02-21T14:30:50.000Z</time>
  <extensions>
    <ns3:TrackPointExtension>
      <ns3:hr>125</ns3:hr>
      <ns3:cad>75</ns3:cad>
    </ns3:TrackPointExtension>
  </extensions>
</trkpt>
>>> point.extensions.get("hr")
'125'
>>> point.extensions.get("cad")
'75'
I don't know if it is possible to get integers here. Is information about the extension types available somewhere?
I have now made this:
import collections

def get_extension_data(gpxpy_instance):
    """
    Return a dict with all extension values from all track points.
    """
    extension_data = collections.defaultdict(list)
    for track in gpxpy_instance.tracks:
        for segment in track.segments:
            for point in segment.points:
                extensions = point.extensions
                if not extensions:
                    continue  # skip points without extensions, keep the data collected so far
                # Iterate over the element directly; getchildren() is deprecated.
                for child in extensions[0]:
                    tag = child.tag.rsplit("}", 1)[-1]  # FIXME: strips the "{namespace}" prefix
                    value = child.text
                    try:
                        if "." in value:
                            value = float(value)
                        else:
                            value = int(value)
                    except ValueError:
                        pass  # not a number, keep the string
                    extension_data[tag].append(value)
    return extension_data
Any ideas on how to improve this? Is there an easier way to get the "name" of an extension?
Keep in mind that an extension can theoretically contain multiple extensions and each can be any kind of xml subtree, for example:
<extensions>
  <ns3:Ext attr="bbb">
    <ns3:hr>125</ns3:hr>
    <ns3:hr>125</ns3:hr>
    <ns3:hr>125</ns3:hr>
    <ns3:hr>125</ns3:hr>
    <ns3:hr>125</ns3:hr>
    <ns3:cad><ns3:bbb>75</ns3:bbb></ns3:cad>
  </ns3:Ext>
</extensions>
And now you need a simple and easy way to get the attr attribute, the hr values, and the cad->bbb value.
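For comparison, here is a minimal sketch of those three lookups done by hand with the standard library's xml.etree.ElementTree. It is not gpxpy API. The namespace URI is an assumption made up for this sketch, since the sample above only shows the ns3 prefix and not its declaration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI; the sample only shows the "ns3" prefix.
NS = {"ns3": "http://example.com/ns3"}

XML = """
<extensions xmlns:ns3="http://example.com/ns3">
  <ns3:Ext attr="bbb">
    <ns3:hr>125</ns3:hr>
    <ns3:hr>125</ns3:hr>
    <ns3:hr>125</ns3:hr>
    <ns3:hr>125</ns3:hr>
    <ns3:hr>125</ns3:hr>
    <ns3:cad><ns3:bbb>75</ns3:bbb></ns3:cad>
  </ns3:Ext>
</extensions>
"""

root = ET.fromstring(XML.strip())
ext = root.find("ns3:Ext", NS)

# The attr attribute of the extension element.
attr = ext.get("attr")
# Every hr value, converted to int.
hr_values = [int(e.text) for e in ext.findall("ns3:hr", NS)]
# The nested cad->bbb value, left as a string.
cad_bbb = ext.find("ns3:cad/ns3:bbb", NS).text

print(attr, hr_values, cad_bbb)
```

Each lookup needs the namespace mapping and a different call (get, findall, find), which is the friction a simpler API would have to hide.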
Any idea how to design a simple-to-use API?
Well, no, not yet :) But, now that you asked, here are a couple of ideas:
Maybe something like:
points.extensions.get("TrackPointExtension", "hr") # returns a string
points.extensions.get_float("TrackPointExtension", "hr") # returns a number
Or, let's suppose there can be multiple hr tags:
points.extensions.get("TrackPointExtension", "hr[2]")
...and hr would just be an alias for hr[0].
Or maybe:
points.extensions.get("TrackPointExtension", "hr").string()
points.extensions.get("TrackPointExtension", "hr").number()
# in case of multiple "hr" elements, get the fourth one:
points.extensions.get("TrackPointExtension", "hr", 3).number()
# set a value:
points.extensions.get("TrackPointExtension", "hr").set(100)
points.extensions.get("TrackPointExtension", "hr[2]")
This looks ugly ;)
points.extensions.get("TrackPointExtension", "hr").string()
points.extensions.get("TrackPointExtension", "hr").number()
# in case of multiple "hr" elements, get the fourth one:
points.extensions.get("TrackPointExtension", "hr", 3).number()
# set a value:
points.extensions.get("TrackPointExtension", "hr").set(100)
This looks ok... Maybe "number" -> "float" ?!?
Because there can be multiple entries: get("TrackPointExtension", "hr") is a "shortcut" for get("TrackPointExtension", "hr", 0), isn't it?
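To make the proposal concrete, here is a rough sketch of how such a get(extension, tag, index) call with string(), number() and set() could behave. Everything in it is hypothetical: get, ExtensionValue and local_name are illustration names, not gpxpy API, and a plain list of ElementTree elements with a made-up namespace stands in for point.extensions.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


class ExtensionValue:
    """Hypothetical wrapper around one extension element (not part of gpxpy)."""

    def __init__(self, element):
        self.element = element

    def string(self):
        return self.element.text

    def number(self):
        return float(self.element.text)

    def set(self, value):
        self.element.text = str(value)


def get(extensions, extension_name, tag_name, index=0):
    """Return the index-th <tag_name> child of the named extension element."""
    for extension in extensions:
        if local_name(extension.tag) != extension_name:
            continue
        matches = [c for c in extension if local_name(c.tag) == tag_name]
        return ExtensionValue(matches[index])
    raise KeyError(extension_name)


# Stand-in for point.extensions: a list of extension elements.
extensions = [ET.fromstring(
    '<x:TrackPointExtension xmlns:x="http://example.com/x">'
    "<x:hr>125</x:hr><x:hr>130</x:hr><x:cad>75</x:cad>"
    "</x:TrackPointExtension>"
)]

print(get(extensions, "TrackPointExtension", "hr").string())     # index defaults to 0
print(get(extensions, "TrackPointExtension", "hr", 1).number())  # second hr element
get(extensions, "TrackPointExtension", "hr").set(100)            # overwrite the first hr
```

With index defaulting to 0, the two-argument form is exactly the shortcut described above.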
Yes, I agree (including the "ugly" remark ;) ). Also, the API should allow for a way to retrieve attributes. Something like this:
points.extensions.get("ExtensionName", "tagName", "#attribute")
Or maybe:
points.extensions.getFloat("ExtensionName", "tagName", "#attribute")
points.extensions.getString("ExtensionName", "tagName", "#attribute")
points.extensions.get("ExtensionName", "tagName", "#attribute") #returns the DOM element
points.extensions.get_float("ExtensionName", "tagName", "#attribute")
points.extensions.get_string("ExtensionName", "tagName", "#attribute")
points.extensions.get("ExtensionName", "tagName", "#attribute") #returns the DOM element
;)
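A possible reading of the "#attribute" idea, again only as a sketch: the arguments after the extension name are treated as a path of child tag names, and a final step starting with "#" reads an attribute instead of the element text. The functions get_string and get_float here are hypothetical stand-ins that take the extension list as their first argument, and the unit attribute and namespace in the sample data are invented for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def get_string(extensions, extension_name, *path):
    """Walk child tags by local name; a '#name' step returns that attribute."""
    for extension in extensions:
        if local_name(extension.tag) == extension_name:
            node = extension
            break
    else:
        raise KeyError(extension_name)
    for step in path:
        if step.startswith("#"):
            return node.attrib[step[1:]]
        children = [c for c in node if local_name(c.tag) == step]
        if not children:
            raise KeyError(step)
        node = children[0]  # first match only; no index support in this sketch
    return node.text


def get_float(extensions, extension_name, *path):
    """Same lookup as get_string, converted to float."""
    return float(get_string(extensions, extension_name, *path))


# Stand-in for point.extensions; the unit attribute is made up for the example.
extensions = [ET.fromstring(
    '<x:Ext xmlns:x="http://example.com/x" attr="bbb">'
    '<x:hr unit="bpm">125</x:hr>'
    "<x:cad><x:bbb>75</x:bbb></x:cad>"
    "</x:Ext>"
)]

print(get_string(extensions, "Ext", "#attr"))        # attribute on the extension itself
print(get_string(extensions, "Ext", "hr", "#unit"))  # attribute on a child tag
print(get_float(extensions, "Ext", "cad", "bbb"))    # nested element text as a number
```

Using a variable-length path also covers the nested cad->bbb case from the earlier example without a separate method.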
Just curious if there is a solution here: GPX files from Strava have hr, cadence, etc. data as extensions like this:
<extensions>
  <gpxtpx:TrackPointExtension>
    <gpxtpx:hr>80</gpxtpx:hr>
    <gpxtpx:cad>0</gpxtpx:cad>
  </gpxtpx:TrackPointExtension>
</extensions>
However, as far as I can tell, this data is not being brought into the extensions attribute of points. Given my understanding of the scope here, this may be a bug. Any recommendation or advice on how to get the extensions data in practice would be greatly appreciated.
Here's a version that works with a file from Strava, @pwolfram. It's not the prettiest, but it works.
import pandas as pd
import gpxpy
from lxml import etree
from pathlib import Path

def sloppy_float(text):
    # This helper was not shown in the original post; it is assumed to
    # convert to float when possible and otherwise return the text as-is.
    try:
        return float(text)
    except (TypeError, ValueError):
        return text

def df_from_segment(segment) -> pd.DataFrame:
    seg_list = []
    for point in segment.points:
        base_data = {
            'timestamp': point.time,
            'latitude': point.latitude,
            'longitude': point.longitude,
            'elevation': point.elevation,
            'speed': point.speed
        }
        # Children of the first extension element (e.g. hr, cad),
        # keyed by tag name without the namespace.
        extension_data = {
            etree.QName(child).localname: sloppy_float(child.text)
            for child in point.extensions[0]
        }
        base_data.update(extension_data)
        seg_list.append(base_data)
    return pd.DataFrame(seg_list)

def df_from_track(track) -> pd.DataFrame:
    return pd.concat([df_from_segment(segment) for segment in track.segments])

def df_from_gpx(gpx):
    return pd.concat([df_from_track(track) for track in gpx.tracks])

gpxfile = gpxpy.parse(Path("stravafile.gpx").read_text())
gpxfile_df = df_from_gpx(gpxfile)
@andyreagan @pwolfram I would love to know whether your Strava files, including the hr data, convert properly with my gpxcsv converter (although it produces CSV, it can also easily produce a list of dicts for a DataFrame). It works well on the hr and other extension data in the Apple Watch GPX exports I have tried, but I haven't used Strava. You'd just do:
import pandas as pd
from gpxcsv import gpxtolist
df = pd.DataFrame(gpxtolist('myfile.gpx'))
@astrowonk confirmed, this works perfectly!