/runpandas

Python Package for handling running data from GPS-enabled tracking devices.

RunPandas - Python Package for handling running data from GPS-enabled tracking devices and applications.


Introduction

RunPandas is a project to add support for data collected by GPS-enabled tracking devices and heart rate monitors to [pandas](http://pandas.pydata.org) objects. It is a Python package that provides infrastructure for importing tracking data from such devices, enabling statistical and visual analysis for running enthusiasts. Its goal is to fill the gap between the routine collection of data and its manual analysis in pandas and Python.

Documentation

Stable documentation is available on github.io. A second copy of the stable documentation is hosted on Read the Docs.

Development documentation is available for the latest changes in master.

==> Check out this Blog post for the reasoning and philosophy behind Runpandas, as well as a detailed tutorial with code examples.

==> Follow this Runpandas live book in Jupyter notebook format based on Jupyter Books.

Install

RunPandas depends on the following packages:

  • pandas
  • fitparse
  • stravalib
  • pydantic
  • pyaml
  • haversine

Runpandas was tested to work on *nix-like systems, including macOS.


Install latest release version via pip

$ pip install runpandas

Install latest release version via conda

$ conda install -c marcelcaraciolo runpandas

Install latest development version

$ pip install git+https://github.com/corriporai/runpandas.git

or

$ git clone https://github.com/corriporai/runpandas.git
$ cd runpandas
$ python setup.py install

Examples

Install using pip, then import runpandas and use one of the tracking readers. This example loads a local TCX file, sample.tcx. From the data file we get time, altitude, distance, heart rate and geographic position (latitude/longitude).

# !pip install runpandas
import runpandas as rpd
activity = rpd.read_file('./sample.tcx')
activity.head(5)
                 alt       dist    hr         lon        lat
time
00:00:00  178.942627   0.000000  62.0  -79.093187  35.951880
00:00:01  178.942627   0.000000  62.0  -79.093184  35.951880
00:00:06  178.942627   1.106947  62.0  -79.093172  35.951868
00:00:12  177.500610  13.003035  62.0  -79.093228  35.951774
00:00:16  177.500610  22.405027  60.0  -79.093141  35.951732

The data frames returned by runpandas when loading files are similar across file types. The dataframe in the above example is a subclass of pandas.DataFrame and provides some additional features. Certain columns also return specific pandas.Series subclasses, which provide useful methods:

print(type(activity))
print(type(activity.alt))

<class 'runpandas.types.frame.Activity'>
<class 'runpandas.types.columns.Altitude'>

For instance, if you want to get the base unit for the altitude alt data or the distance dist data:

print(activity.alt.base_unit)
print(activity.alt.sum())

m
65883.68151855901

print(activity.dist.base_unit)
print(activity.dist.iloc[-1])

m
4686.31103516
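
The subclassing mechanism behind such typed columns can be sketched with plain pandas. The class below is a hypothetical illustration: the name AltitudeSketch and its layout are assumptions made for this example, not runpandas' actual code. It only shows how a pandas.Series subclass can carry a unit and keep its type through arithmetic.

```python
import pandas as pd


class AltitudeSketch(pd.Series):
    """Hypothetical Series subclass that carries a unit (not runpandas code)."""

    base_unit = "m"  # class attribute shared by every instance

    @property
    def _constructor(self):
        # Tell pandas to rebuild the results of operations as AltitudeSketch
        return AltitudeSketch


alt = AltitudeSketch([178.9, 178.9, 177.5])
doubled = alt * 2  # keeps the subclass thanks to _constructor
print(type(doubled).__name__, doubled.base_unit)
```

Overriding _constructor is the hook pandas documents for subclasses; without it, most operations would fall back to a plain pandas.Series and the unit would be lost.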

The Activity dataframe also contains special properties that present some statistics from the workout, such as elapsed time, mean heart rate, moving time and the distance of the workout in meters.

#total time elapsed for the activity
print(activity.ellapsed_time)
#distance of workout in meters
print(activity.distance)
#mean heartrate
print(activity.mean_heart_rate())

0 days 00:33:11
4686.31103516
156.65274151436032

Occasionally, some observations such as speed, distance and others must be calculated from the data available in the given activity. In runpandas there are special accessors (runpandas.acessors) that compute some of these metrics. We will compute the distance and the speed per position observation: using the latitude and longitude of each record, we calculate the haversine distance in meters and the speed in meters per second.

#compute the haversine distance between each pair of consecutive latitude/longitude observations
activity['distpos']  = activity.compute.distance()
activity['distpos'].head()

time
00:00:00          NaN
00:00:01     0.333146
00:00:06     1.678792
00:00:12    11.639901
00:00:16     9.183847
Name: distpos, dtype: float64
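
For readers unfamiliar with the haversine formula, here is a minimal standalone sketch using only the standard library. The function name haversine_m and the Earth radius constant are assumptions made for this illustration; runpandas relies on the haversine package and its own internals, which may differ. Because the coordinates in the table above are rounded to six decimals, the result is close to, but not exactly, the distpos value shown.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed value)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Positions at 00:00:06 and 00:00:12, copied from the activity table above
d = haversine_m(35.951868, -79.093172, 35.951774, -79.093228)
print(round(d, 2), "m")
```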

#compute the speed from the distances between consecutive observations
activity['speed']  = activity.compute.speed(from_distances=True)
activity['speed'].head()

time
00:00:00         NaN
00:00:01    0.333146
00:00:06    0.335758
00:00:12    1.939984
00:00:16    2.295962
Name: speed, dtype: float64
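
The relation between the two columns can be checked by hand: each speed is the step distance divided by the seconds elapsed since the previous observation. The sketch below uses a hypothetical helper, speeds_from_distances, written for this illustration only (it is not the runpandas implementation), fed with the distpos values printed above.

```python
from datetime import timedelta


def speeds_from_distances(times, step_distances):
    """Speed in m/s per observation: step distance divided by elapsed seconds."""
    speeds = [None]  # the first observation has no predecessor
    for i in range(1, len(times)):
        dt = (times[i] - times[i - 1]).total_seconds()
        speeds.append(step_distances[i] / dt if dt > 0 else None)
    return speeds


# Timestamps and distpos values copied from the outputs above
times = [timedelta(seconds=s) for s in (0, 1, 6, 12, 16)]
distpos = [None, 0.333146, 1.678792, 11.639901, 9.183847]
speeds = speeds_from_distances(times, distpos)
print([None if v is None else round(v, 4) for v in speeds])
```

Up to rounding, the values agree with the speed column returned by activity.compute.speed(from_distances=True).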

Popular running metrics such as gradient, pace and vertical speed are also available through the runpandas accessors.

activity['vam'] = activity.compute.vertical_speed()
activity['vam'].head()

time
00:00:00         NaN
00:00:01    0.000000
00:00:06    0.000000
00:00:12   -0.240336
00:00:16    0.000000
Name: vam, dtype: float64
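
Vertical speed follows the same pattern with altitude in place of distance: the altitude change between consecutive observations divided by the elapsed seconds. The helper vertical_speeds below is a hypothetical sketch for illustration, not the library's code; the inputs are the first rows of the activity shown earlier.

```python
def vertical_speeds(seconds, altitudes):
    """Vertical speed in m/s: altitude change divided by elapsed seconds."""
    result = [None]  # no previous observation for the first row
    for i in range(1, len(seconds)):
        dt = seconds[i] - seconds[i - 1]
        climb = altitudes[i] - altitudes[i - 1]
        result.append(climb / dt if dt > 0 else None)
    return result


# Elapsed seconds and altitudes copied from the first rows of the activity
seconds = [0, 1, 6, 12, 16]
alt = [178.942627, 178.942627, 178.942627, 177.500610, 177.500610]
vam = vertical_speeds(seconds, alt)
print([None if v is None else round(v, 6) for v in vam])
```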

Sporadically, there will be a large time difference between consecutive observations in the same workout. This can happen when the device is paused by the athlete, or when proprietary algorithms controlling the sampling rate auto-pause the device because it detects no significant change in position. In runpandas there is an algorithm that attempts to calculate the moving time based on the GPS locations, distances, and speed of the activity.

To compute the moving time, there is a special accessor that detects the periods of inactivity and adds a boolean moving column that marks, for each observation, whether the athlete was moving (True) or stopped (False).

activity_only_moving = activity.only_moving()
print(activity_only_moving['moving'].head())

time
00:00:00    False
00:00:01    False
00:00:06    False
00:00:12     True
00:00:16     True
Name: moving, dtype: bool

Now we can compute the moving time, that is, how long the athlete was active.

activity_only_moving.moving_time

Timedelta('0 days 00:33:05')
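
The exact detection rules used by runpandas are not described here, so the following is only a sketch of one common approach: flag an observation as moving when its speed reaches a threshold, then sum the intervals that end in a moving observation. The threshold value and the helper names are assumptions chosen for this illustration.

```python
from datetime import timedelta

STOPPED_BELOW_MPS = 0.8  # hypothetical threshold, chosen only for this sketch


def moving_flags(speeds, threshold=STOPPED_BELOW_MPS):
    """Flag each observation as moving when its speed reaches the threshold."""
    return [s is not None and s >= threshold for s in speeds]


def moving_time(times, flags):
    """Sum the intervals that end in an observation flagged as moving."""
    total = timedelta(0)
    for i in range(1, len(times)):
        if flags[i]:
            total += times[i] - times[i - 1]
    return total


# Timestamps and speeds copied from the first rows of the outputs above
times = [timedelta(seconds=s) for s in (0, 1, 6, 12, 16)]
speeds = [None, 0.333146, 0.335758, 1.939984, 2.295962]
flags = moving_flags(speeds)
print(flags, moving_time(times, flags))
```

With this assumed threshold the flags happen to agree with the first five rows of the moving column above, but the library may use different criteria.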

Runpandas also provides a summary method that summarises the activity through common statistics. Such a session summary includes estimates of several of the metrics computed above in a single call.

activity_only_moving.summary()

Session                          Running: 26-12-2012 21:29:53
Total distance (meters)          4686.31
Total ellapsed time              0 days 00:33:11
Total moving time                0 days 00:33:05
Average speed (km/h)             8.47656
Average moving speed (km/h)      8.49853
Average pace (per 1 km)          0 days 00:07:04
Average pace moving (per 1 km)   0 days 00:07:03
Average cadence                  NaN
Average moving cadence           NaN
Average heart rate               156.653
Average moving heart rate        157.4
Average temperature              NaN
dtype: object
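
Pace and speed in the summary are two views of the same quantity: pace per kilometer is the reciprocal of speed in km/h. A small sketch, using a hypothetical helper pace_per_km that is not part of runpandas, converts the two average speeds above into paces.

```python
from datetime import timedelta


def pace_per_km(speed_kmh):
    """Time needed to cover one kilometer at the given speed in km/h."""
    return timedelta(hours=1 / speed_kmh)


pace = pace_per_km(8.47656)         # average speed from the summary above
moving_pace = pace_per_km(8.49853)  # average moving speed from the summary
print(pace, moving_pace)
```

Truncated to whole seconds, the results correspond to the 00:07:04 and 00:07:03 paces reported in the summary.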

Now, let’s play with the data. Let’s plot distance vs time as an example of the visualizations we can create. In this example, we use the built-in, matplotlib-based plot function.

activity[['dist']].plot()

<AxesSubplot:xlabel='time'>

image

And here is altitude versus time.

activity[['alt']].plot()

<AxesSubplot:xlabel='time'>

image

Next, let’s show the altitude vs distance profile. Here is a scatterplot that shows altitude vs distance as recorded.

activity.plot.scatter(x='dist', y='alt', c='DarkBlue')

<AxesSubplot:xlabel='dist', ylabel='alt'>

image

Finally, let’s get a glimpse of the route by plotting a 2D map of longitude vs latitude.

activity.plot(x='lon', y='lat')

<AxesSubplot:xlabel='lon'>

image

The runpandas package also comes with extra batteries, such as our runpandas.datasets package, which includes a range of example data for testing purposes. There is a dedicated repository with all the data available. An index of the data is kept here.

You can use the example data available:

example_fit = rpd.activity_examples(path='Garmin_Fenix_6S_Pro-Running.fit')
print(example_fit.summary)
print('Included metrics:', example_fit.included_data)

Synced from watch Garmin Fenix 6S

Included metrics: [<MetricsEnum.latitude: 'latitude'>, <MetricsEnum.longitude: 'longitude'>, <MetricsEnum.elevation: 'elevation'>, <MetricsEnum.heartrate: 'heartrate'>, <MetricsEnum.cadence: 'cadence'>, <MetricsEnum.distance: 'distance'>, <MetricsEnum.temperature: 'temperature'>]

rpd.read_file(example_fit.path).head()
enhanced_speed enhanced_altitude unknown_87 fractional_cadence lap session unknown_108 dist cad hr lon lat temp
time
00:00:00 0.000 254.0 0 0.0 0 0 NaN 0.00 0 101 13.843376 51.066280 8
00:00:01 0.000 254.0 0 0.0 0 0 NaN 0.00 0 101 13.843374 51.066274 8
00:00:10 1.698 254.0 0 0.0 0 1 2362.0 0.00 83 97 13.843176 51.066249 8
00:00:12 2.267 254.0 0 0.0 0 1 2362.0 3.95 84 99 13.843118 51.066250 8
00:00:21 2.127 254.6 0 0.5 0 1 2552.0 16.67 87 100 13.842940 51.066231 8

If you only want to see the activities of a specific file type, you can filter with runpandas.activity_examples, which returns a filter iterable that you can iterate over:

fit_examples = rpd.activity_examples(file_type=rpd.FileTypeEnum.FIT)
for example in fit_examples:
    #Download and play with the filtered examples
    print(example.path)

https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/Garmin_Fenix_6S_Pro-Running.fit
https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/Garmin_Fenix2_running_with_hrm.fit
https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/Garmin_Forerunner_910XT-Running.fit

Exploring sessions

The runpandas package provides utilities to import the data of a group of activities and, after careful processing, organise them into a MultiIndex Dataframe.

The pandas.MultiIndex allows you to have multiple columns acting as a row identifier and multiple rows acting as a header identifier. In our scenario, the first identifier (index level) is the timestamp at which the workout started, and the second identifier is the timedelta of each consecutive observation within the workout.

The MultiIndex Runpandas Activity Dataframe

The MultiIndex dataframe results from the function runpandas.read_dir_aggregate, which takes as input a directory of tracking data files and uses the read*() functions to build runpandas.Activity objects. Then, the resulting dataframes are sorted by their timestamps and combined into a single runpandas.Activity indexed by the two-level pandas.MultiIndex.
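
The sort-then-combine step described above can be sketched with plain pandas. This is not the code behind read_dir_aggregate; the helpers combine_activities and toy_activity and the heart rate values are made up for this illustration, which only shows how pandas.concat with keys produces a two-level (start, time) index.

```python
import pandas as pd


def combine_activities(activities):
    """Stack (start, frame) pairs into one frame indexed by (start, time)."""
    ordered = sorted(activities, key=lambda pair: pair[0])  # sort by start time
    starts = [start for start, _ in ordered]
    frames = [frame for _, frame in ordered]
    return pd.concat(frames, keys=starts, names=["start", "time"])


def toy_activity(heart_rates):
    # One made-up observation per second, indexed by elapsed time
    index = pd.to_timedelta(list(range(len(heart_rates))), unit="s")
    return pd.DataFrame({"hr": heart_rates}, index=index)


combined = combine_activities([
    (pd.Timestamp("2021-07-04 11:23:19"), toy_activity([150, 152])),
    (pd.Timestamp("2020-08-30 09:08:51"), toy_activity([140, 141, 143])),
])
print(combined)
```

Selecting a single workout is then a matter of indexing the first level, for example combined.loc[pd.Timestamp("2020-08-30 09:08:51")].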

Let’s illustrate this by loading 68 running activities recorded by a female runner between 2020 and 2021.

import warnings
warnings.filterwarnings('ignore')
import runpandas
session = runpandas.read_dir_aggregate(dirname='session/')
session
                                               alt     hr         lon        lat
start                    time
2020-08-30 09:08:51.012  00:00:00              NaN    NaN  -34.893609  -8.045055
                         00:00:01.091000       NaN    NaN  -34.893624  -8.045054
                         00:00:02.091000       NaN    NaN  -34.893641  -8.045061
                         00:00:03.098000       NaN    NaN  -34.893655  -8.045063
                         00:00:04.098000       NaN    NaN  -34.893655  -8.045065
...                      ...                   ...    ...         ...        ...
2021-07-04 11:23:19.418  00:52:39.582000  0.050001  189.0  -34.894534  -8.046602
                         00:52:43.582000       NaN    NaN  -34.894465  -8.046533
                         00:52:44.582000       NaN    NaN  -34.894443  -8.046515
                         00:52:45.582000       NaN    NaN  -34.894429  -8.046494
                         00:52:49.582000       NaN  190.0  -34.894395  -8.046398

48794 rows × 4 columns

Now let’s see how many activities are available for analysis. For this, there is an accessor, runpandas.types.acessors.session._SessionAcessor, that holds several methods for computing the basic running metrics and some summary statistics across all the activities in this kind of frame.

#count the number of activities in the session
print('Total Activities:', session.session.count())

Total Activities: 68
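
Accessors of this kind are built on the extension mechanism that pandas itself offers. The sketch below registers a hypothetical accessor named activities_sketch (the name, the class and its count method are assumptions for illustration and are unrelated to the real _SessionAcessor) that counts activities as the number of distinct start timestamps in the first index level.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("activities_sketch")
class ActivitiesSketchAccessor:
    """Hypothetical accessor; the name and method are illustrative only."""

    def __init__(self, frame):
        self._frame = frame

    def count(self):
        # Each distinct value in the first index level is one activity
        return self._frame.index.get_level_values(0).nunique()


index = pd.MultiIndex.from_tuples(
    [
        (pd.Timestamp("2020-08-30 09:08:51"), pd.Timedelta(seconds=0)),
        (pd.Timestamp("2020-08-30 09:08:51"), pd.Timedelta(seconds=1)),
        (pd.Timestamp("2021-07-04 11:23:19"), pd.Timedelta(seconds=0)),
    ],
    names=["start", "time"],
)
frame = pd.DataFrame({"hr": [140, 141, 150]}, index=index)
print("Total Activities:", frame.activities_sketch.count())
```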

We can compute the main running metrics (speed, pace, moving, etc.) using the session accessor's methods, which mirror the ones available in runpandas.types.metrics.MetricsAcessor. Internally, each session method calls the corresponding metric method, applying it to each activity separately.

#In this example we compute the distance and the distance per position across all workouts
session = session.session.distance()
session
                                               alt     hr         lon        lat    distpos         dist
start                    time
2020-08-30 09:08:51.012  00:00:00              NaN    NaN  -34.893609  -8.045055        NaN          NaN
                         00:00:01.091000       NaN    NaN  -34.893624  -8.045054   1.690587     1.690587
                         00:00:02.091000       NaN    NaN  -34.893641  -8.045061   2.095596     3.786183
                         00:00:03.098000       NaN    NaN  -34.893655  -8.045063   1.594298     5.380481
                         00:00:04.098000       NaN    NaN  -34.893655  -8.045065   0.163334     5.543815
...                      ...                   ...    ...         ...        ...        ...          ...
2021-07-04 11:23:19.418  00:52:39.582000  0.050001  189.0  -34.894534  -8.046602  12.015437  8220.018885
                         00:52:43.582000       NaN    NaN  -34.894465  -8.046533  10.749779  8230.768664
                         00:52:44.582000       NaN    NaN  -34.894443  -8.046515   3.163638  8233.932302
                         00:52:45.582000       NaN    NaN  -34.894429  -8.046494   2.851535  8236.783837
                         00:52:49.582000       NaN  190.0  -34.894395  -8.046398  11.300740  8248.084577

48794 rows × 6 columns
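
In the output above, dist is the running total of distpos within each activity, restarting at every new workout. That per-activity accumulation can be sketched with a grouped cumulative sum in plain pandas. The first activity below reuses the distpos values shown above; the second activity's values are made up, and the snippet is an illustration rather than the library's implementation.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [
        pd.to_datetime(["2020-08-30 09:08:51", "2021-07-04 11:23:19"]),
        pd.to_timedelta([0, 1, 2], unit="s"),
    ],
    names=["start", "time"],
)
# Step distances in meters; the first observation of each activity has none
distpos = pd.Series(
    [None, 1.690587, 2.095596, None, 12.0, 10.5], index=index, dtype="float64"
)

# Accumulate within each activity so totals never leak across workouts
dist = distpos.groupby(level="start").cumsum()
print(dist)
```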

#compute the speed for each activity
session = session.session.speed(from_distances=True)
#compute the pace for each activity
session = session.session.pace()
#compute the inactivity periods for each activity
session = session.session.only_moving()

With all the computation done, let’s move on to the next step: exploring the data and getting some descriptive statistics.

With the activities loaded and their metrics computed, let’s look further into the data and get the basic summaries of the session: time spent, total distance, mean speed and other insightful statistics for each running activity. We can accomplish this by calling the method runpandas.types.session._SessionAcessor.summarize. It returns a DataFrame with all the aggregated statistics per activity from the session frame.

summary = session.session.summarize()
summary
moving_time mean_speed max_speed mean_pace max_pace mean_moving_speed mean_moving_pace mean_cadence max_cadence mean_moving_cadence mean_heart_rate max_heart_rate mean_moving_heart_rate mean_temperature min_temperature max_temperature total_distance ellapsed_time
start
2020-07-03 09:50:53.162 00:25:29.838000 2.642051 4.879655 00:06:18 00:03:24 2.665008 00:06:15 NaN NaN NaN 178.819923 188.0 178.872587 NaN NaN NaN 4089.467333 00:25:47.838000
2020-07-05 09:33:20.999 00:05:04.999000 2.227637 6.998021 00:07:28 00:02:22 3.072098 00:05:25 NaN NaN NaN 168.345455 176.0 168.900000 NaN NaN NaN 980.162640 00:07:20.001000
2020-07-05 09:41:59.999 00:18:19 1.918949 6.563570 00:08:41 00:02:32 2.729788 00:06:06 NaN NaN NaN 173.894180 185.0 174.577143 NaN NaN NaN 3139.401118 00:27:16
2020-07-13 09:13:58.718 00:40:21.281000 2.509703 8.520387 00:06:38 00:01:57 2.573151 00:06:28 NaN NaN NaN 170.808176 185.0 170.795527 NaN NaN NaN 6282.491059 00:41:43.281000
2020-07-17 09:33:02.308 00:32:07.691000 2.643278 8.365431 00:06:18 00:01:59 2.643278 00:06:18 NaN NaN NaN 176.436242 186.0 176.436242 NaN NaN NaN 5095.423045 00:32:07.691000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2021-06-13 09:22:30.985 01:32:33.018000 2.612872 23.583956 00:06:22 00:00:42 2.810855 00:05:55 NaN NaN NaN 169.340812 183.0 169.655879 NaN NaN NaN 15706.017295 01:40:11.016000
2021-06-20 09:16:55.163 00:59:44.512000 2.492640 6.065895 00:06:41 00:02:44 2.749453 00:06:03 NaN NaN NaN 170.539809 190.0 171.231392 NaN NaN NaN 9965.168311 01:06:37.837000
2021-06-23 09:37:44.000 00:26:49.001000 2.501796 5.641343 00:06:39 00:02:57 2.568947 00:06:29 NaN NaN NaN 156.864865 171.0 156.957031 NaN NaN NaN 4165.492241 00:27:45.001000
2021-06-27 09:50:08.664 00:31:42.336000 2.646493 32.734124 00:06:17 00:00:30 2.661853 00:06:15 NaN NaN NaN 166.642857 176.0 166.721116 NaN NaN NaN 5074.217061 00:31:57.336000
2021-07-04 11:23:19.418 00:47:47.583000 2.602263 4.212320 00:06:24 00:03:57 2.856801 00:05:50 NaN NaN NaN 177.821862 192.0 177.956967 NaN NaN NaN 8248.084577 00:52:49.582000

68 rows × 18 columns

print('Session Interval:', (summary.index.to_series().max() - summary.index.to_series().min()).days, 'days')
print('Total Workouts:', len(summary), 'runs')
print('Total KM Distance:', summary['total_distance'].sum() / 1000)
print('Average Pace (all runs):', summary.mean_pace.mean())
print('Average Moving Pace (all runs):', summary.mean_moving_pace.mean())
print('Average KM Distance (all runs):', round(summary.total_distance.mean()/ 1000,2))

Session Interval: 366 days
Total Workouts: 68 runs
Total KM Distance: 491.77377537338896
Average Pace (all runs): 0 days 00:07:18.411764
Average Moving Pace (all runs): 0 days 00:06:02.147058
Average KM Distance (all runs): 7.23

At this point, we have the summary data needed to start some powerful visualization and analysis. The charts below illustrate the evolution of her pace and distance over time.

import matplotlib.pyplot as plt
import datetime

#let's convert the pace to float number in minutes
summary['mean_moving_pace_float'] = summary['mean_moving_pace'] / datetime.timedelta(minutes=1)
summary['pace_moving_all_mean'] = summary.mean_moving_pace.mean()
summary['pace_moving_all_mean_float'] = summary['pace_moving_all_mean'] / datetime.timedelta(minutes=1)

plt.subplots(figsize=(8, 5))

plt.plot(summary.index, summary.mean_moving_pace_float, color='silver')
plt.plot(summary.pace_moving_all_mean_float, color='purple', linestyle='dashed', label='average')
plt.title("Pace Evolution")
plt.xlabel("Runnings")
plt.ylabel("Pace")
plt.legend()

<matplotlib.legend.Legend at 0x7f82d8d83cd0>

image
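
The pace conversion used for the chart relies on a general property of time differences: dividing one timedelta by another yields a plain number. A minimal standard-library sketch, using two of the mean_moving_pace values from the summary table above:

```python
from datetime import timedelta

# Two mean_moving_pace values taken from the summary table (00:06:15, 00:05:25)
paces = [timedelta(minutes=6, seconds=15), timedelta(minutes=5, seconds=25)]

# Dividing by a one-minute timedelta expresses each pace as float minutes
pace_minutes = [p / timedelta(minutes=1) for p in paces]
print(pace_minutes)
```

The same division works element-wise on a pandas timedelta column, which is what the plotting code above does.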

plt.subplots(figsize=(8, 5))

summary['distance_all_mean'] = round(summary.total_distance.mean()/1000,2)

plt.plot(summary.index, summary.total_distance / 1000, color='silver')
plt.plot(summary.distance_all_mean, color='purple', linestyle='dashed', label='average')
plt.title("Distance Evolution")
plt.xlabel("Runs")
plt.ylabel("distance")
plt.legend()


plt.show()

image

Get in touch

I'm very interested in your experience with runpandas. Please drop me a note with any feedback you have.

Contributions welcome!

- Marcel Caraciolo

License

Runpandas is licensed under the MIT License, a copy of which is included in LICENSE.