tkrajina/gpxpy

e.g.: point.extensions.get("hr")

jedie opened this issue · 11 comments

jedie commented

It would be nice to have an easier way to get values from GPX extensions.

e.g.:

      <trkpt lat="51.43788929097354412078857421875" lon="6.617012657225131988525390625">
        <ele>23.6000003814697265625</ele>
        <time>2018-02-21T14:30:50.000Z</time>
        <extensions>
          <ns3:TrackPointExtension>
            <ns3:hr>125</ns3:hr>
            <ns3:cad>75</ns3:cad>
          </ns3:TrackPointExtension>
        </extensions>
      </trkpt>
>>> point.extensions.get("hr")
'125'
>>> point.extensions.get("cad")
'75'

I don't know whether it is possible to get integers here. Is the information about the extension types available somewhere?

jedie commented

I have now made this:

import collections


def get_extension_data(gpxpy_instance):
    """
    Return a dict with all extension values from all track points.
    """
    extension_data = collections.defaultdict(list)

    for track in gpxpy_instance.tracks:
        for segment in track.segments:
            for point in segment.points:
                extensions = point.extensions
                if not extensions:
                    # Skip this point instead of discarding everything collected so far.
                    continue

                # Iterate the element directly; getchildren() was removed
                # from xml.etree in Python 3.9.
                for child in extensions[0]:
                    tag = child.tag.rsplit("}", 1)[-1]  # FIXME: strips the "{namespace}" prefix

                    value = child.text
                    try:
                        if "." in value:
                            value = float(value)
                        else:
                            value = int(value)
                    except (TypeError, ValueError):
                        # Keep the raw value if it is empty (None) or not numeric.
                        pass
                    extension_data[tag].append(value)

    return extension_data

Any idea how to improve this? Is there an easier way to get the "name" of the extensions?

Keep in mind that an extension can theoretically contain multiple extensions and each can be any kind of xml subtree, for example:

    <extensions>
      <ns3:Ext attr="bbb">
        <ns3:hr>125</ns3:hr>
        <ns3:hr>125</ns3:hr>
        <ns3:hr>125</ns3:hr>
        <ns3:hr>125</ns3:hr>
        <ns3:hr>125</ns3:hr>
        <ns3:cad><ns3:bbb>75</ns3:bbb></ns3:cad>
      </ns3:Ext>
    </extensions>

And now you need a simple and easy way to get the attr attribute, the hr values, and cad->bbb value.
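
For reference, here is a minimal sketch of how those three lookups could be done with plain xml.etree.ElementTree, independent of any gpxpy API. The namespace URI is made up so that the sample parses, and two different hr values are used so the list is easier to read; in gpxpy the ext element would presumably correspond to an entry of point.extensions.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, chosen only so that the sample parses.
NS = {"ns3": "http://example.com/ext/v1"}

SAMPLE = """
<extensions xmlns:ns3="http://example.com/ext/v1">
  <ns3:Ext attr="bbb">
    <ns3:hr>125</ns3:hr>
    <ns3:hr>126</ns3:hr>
    <ns3:cad><ns3:bbb>75</ns3:bbb></ns3:cad>
  </ns3:Ext>
</extensions>
"""

root = ET.fromstring(SAMPLE.strip())
ext = root.find("ns3:Ext", NS)

# The attribute on the extension element itself.
attr_value = ext.get("attr")

# All repeated hr children, converted to integers.
hr_values = [int(el.text) for el in ext.findall("ns3:hr", NS)]

# A nested value: cad -> bbb.
cad_bbb = ext.find("ns3:cad/ns3:bbb", NS).text

print(attr_value, hr_values, cad_bbb)
```

This works, but every caller has to know the namespace mapping and the tree layout, which is the friction a nicer API would need to remove.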

jedie commented

Any idea how to make a simple-to-use API?

Well, no, not yet :) But, now that you asked, here are a couple of ideas:

Maybe something like:

points.extensions.get("TrackPointExtension", "hr") # returns a string
points.extensions.get_float("TrackPointExtension", "hr") # returns a number

Or, let's suppose there can be multiple hr tags:

points.extensions.get("TrackPointExtension", "hr[2]")

...and hr would just be an alias for hr[0].

Or maybe:

points.extensions.get("TrackPointExtension", "hr").string()
points.extensions.get("TrackPointExtension", "hr").number()
# in case of multiple "hr" elements, get the fourth one:
points.extensions.get("TrackPointExtension", "hr", 3).number()
# set a value:
points.extensions.get("TrackPointExtension", "hr").set(100)

jedie commented

points.extensions.get("TrackPointExtension", "hr[2]")

This looks ugly ;)

points.extensions.get("TrackPointExtension", "hr").string()
points.extensions.get("TrackPointExtension", "hr").number()
# in case of multiple "hr" elements, get the fourth one:
points.extensions.get("TrackPointExtension", "hr", 3).number()
# set a value:
points.extensions.get("TrackPointExtension", "hr").set(100)

This looks ok... Maybe "number" -> "float" ?!?

Because there can be multiple entries, get("TrackPointExtension", "hr") is a "shortcut" for get("TrackPointExtension", "hr", 0), isn't it?
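
To make the proposal concrete, here is a rough sketch of what such an accessor could look like. None of this exists in gpxpy: the Extensions and ExtensionValue classes are hypothetical, the namespace URI is invented, and the only assumption about the input is that it is a list of ElementTree elements.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a "{namespace}" prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


class ExtensionValue:
    """Hypothetical wrapper around one matched element."""

    def __init__(self, element):
        self._element = element

    def string(self):
        return self._element.text

    def number(self):
        return float(self._element.text)

    def set(self, value):
        self._element.text = str(value)


class Extensions:
    """Hypothetical accessor over a list of extension elements."""

    def __init__(self, elements):
        self._elements = list(elements)

    def get(self, extension_name, tag_name, index=0):
        """Return the index-th tag_name child of the named extension."""
        for ext in self._elements:
            if local_name(ext.tag) != extension_name:
                continue
            matches = [c for c in ext if local_name(c.tag) == tag_name]
            if 0 <= index < len(matches):
                return ExtensionValue(matches[index])
        raise KeyError((extension_name, tag_name, index))


# Sample data with an invented namespace URI.
root = ET.fromstring(
    '<extensions xmlns:ns3="http://example.com/ext/v1">'
    "<ns3:TrackPointExtension>"
    "<ns3:hr>125</ns3:hr><ns3:hr>130</ns3:hr>"
    "</ns3:TrackPointExtension>"
    "</extensions>"
)
extensions = Extensions(root)

first_hr = extensions.get("TrackPointExtension", "hr")
second_hr = extensions.get("TrackPointExtension", "hr", 1)
print(first_hr.string(), second_hr.number())
```

With a default of index=0, the two-argument call really is just a shortcut for the three-argument one, and the "hr[2]" string syntax is not needed.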

Yes, I agree (including the "ugly" remark ;) ). Also, the API should allow for a way to retrieve attributes. Something like this:

points.extensions.get("ExtensionName", "tagName", "#attribute")

Or maybe:

points.extensions.getFloat("ExtensionName", "tagName", "#attribute")
points.extensions.getString("ExtensionName", "tagName", "#attribute")
points.extensions.get("ExtensionName", "tagName", "#attribute") #returns the DOM element

jedie commented

points.extensions.get_float("ExtensionName", "tagName", "#attribute")
points.extensions.get_string("ExtensionName", "tagName", "#attribute")
points.extensions.get("ExtensionName", "tagName", "#attribute") #returns the DOM element

;)
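
A function-style sketch of the get_string / get_float variant with the "#attribute" convention might look like the following. These are hypothetical helpers, not gpxpy functions; they take the list of extension elements as an explicit first argument, and the unit attribute and namespace URI in the sample are invented.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip a "{namespace}" prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def get_string(elements, extension_name, tag_name, attribute=None):
    """Return the text of the first matching tag, or one of its attributes.

    `attribute` uses the "#name" form suggested in this thread.
    Returns None when nothing matches.
    """
    for ext in elements:
        if _local(ext.tag) != extension_name:
            continue
        for child in ext:
            if _local(child.tag) == tag_name:
                if attribute is None:
                    return child.text
                return child.get(attribute.lstrip("#"))
    return None


def get_float(elements, extension_name, tag_name, attribute=None):
    """Like get_string, but converts the result to float (None stays None)."""
    value = get_string(elements, extension_name, tag_name, attribute)
    return None if value is None else float(value)


# Sample data; the namespace URI and the unit attribute are made up.
root = ET.fromstring(
    '<extensions xmlns:ns3="http://example.com/ext/v1">'
    '<ns3:TrackPointExtension><ns3:hr unit="bpm">125</ns3:hr>'
    "</ns3:TrackPointExtension></extensions>"
)
elements = list(root)

hr_text = get_string(elements, "TrackPointExtension", "hr")
hr_unit = get_string(elements, "TrackPointExtension", "hr", "#unit")
hr_value = get_float(elements, "TrackPointExtension", "hr")
print(hr_text, hr_unit, hr_value)
```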

Just curious whether there is a solution here. GPX files from Strava have hr, cadence, etc. data as extensions like this:

    <extensions>
     <gpxtpx:TrackPointExtension>          
      <gpxtpx:hr>80</gpxtpx:hr>
      <gpxtpx:cad>0</gpxtpx:cad>
     </gpxtpx:TrackPointExtension>
    </extensions>

However, as far as I can tell, this data is not being brought into the extensions attribute of points. Given my understanding of the scope here, this may be a bug. Any recommendation or advice on how to get the extensions data in practice would be greatly appreciated.

Here's a working version for Strava files, @pwolfram. It's not the prettiest, but it works.

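import pandas as pd
import gpxpy
import lxml.etree
from pathlib import Path


def sloppy_float(text):
    # Not defined in the original post; assumed here to convert to float
    # when possible and otherwise return the value unchanged.
    try:
        return float(text)
    except (TypeError, ValueError):
        return text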


def df_from_segment(segment) -> pd.DataFrame:
    seg_list = []

    for point in segment.points:
        base_data = {
            'timestamp': point.time,
            'latitude': point.latitude,
            'longitude': point.longitude,
            'elevation': point.elevation,
            'speed': point.speed
        }
        extension_data = {
            lxml.etree.QName(child).localname: sloppy_float(child.text)
            for child in point.extensions[0]
        }
        for k, v in extension_data.items():
            base_data[k] = v
        seg_list.append(base_data)
    return pd.DataFrame(seg_list)


def df_from_track(track) -> pd.DataFrame:
    return pd.concat([df_from_segment(segment) for segment in track.segments])


def df_from_gpx(gpx):
    return pd.concat([df_from_track(track) for track in gpx.tracks])


gpxfile = gpxpy.parse(Path("stravafile.gpx").read_text())
gpxfile_df = df_from_gpx(gpxfile)

@andyreagan @pwolfram I would love to know whether your Strava files convert properly, including hr, with my gpxcsv converter (which, while it makes CSV, can also easily make a list of dicts for a dataframe). It works well on the hr and other extension data in the Apple Watch exported GPX files I have tried, but I haven't used Strava. You'd just do:

import pandas as pd
from gpxcsv import gpxtolist

df = pd.DataFrame(gpxtolist('myfile.gpx'))

@astrowonk confirmed, this works perfectly!