RunPandas - Python Package for handling running data from GPS-enabled tracking devices and applications.
RunPandas is a project to add support for data collected by GPS-enabled tracking devices and heart rate monitors to [pandas](http://pandas.pydata.org) objects. It is a Python package that provides infrastructure for importing tracking data from such devices, enabling statistical and visual analysis for running enthusiasts. Its goal is to fill the gap between the routine collection of data and its manual analysis in pandas and Python.
Stable documentation is available on github.io. A second copy of the stable documentation is hosted on Read the Docs.
Development documentation is available for the latest changes in master.
==> Check out this Blog post for the reasoning and philosophy behind Runpandas, as well as a detailed tutorial with code examples.
==> Follow this Runpandas live book in Jupyter notebook format based on Jupyter Books.
RunPandas depends on the following packages:
pandas
fitparse
stravalib
pydantic
pyaml
haversine
Runpandas was tested to work on *nix-like systems, including macOS.
$ pip install runpandas
Install latest release version via conda
$ conda install -c marcelcaraciolo runpandas
$ pip install git+https://github.com/corriporai/runpandas.git
or
$ git clone https://github.com/corriporai/runpandas.git
$ cd runpandas
$ python setup.py install
Install using pip and then import and use one of the tracking readers. This example loads a local TCX file. From the data file, we get time, altitude, distance, heart rate and geo position (lat/long).
# !pip install runpandas
import runpandas as rpd
activity = rpd.read_file('./sample.tcx')
activity.head(5)
time | alt | dist | hr | lon | lat |
---|---|---|---|---|---|
00:00:00 | 178.942627 | 0.000000 | 62.0 | -79.093187 | 35.951880 |
00:00:01 | 178.942627 | 0.000000 | 62.0 | -79.093184 | 35.951880 |
00:00:06 | 178.942627 | 1.106947 | 62.0 | -79.093172 | 35.951868 |
00:00:12 | 177.500610 | 13.003035 | 62.0 | -79.093228 | 35.951774 |
00:00:16 | 177.500610 | 22.405027 | 60.0 | -79.093141 | 35.951732 |
The data frames returned by runpandas when loading files are similar across file types. The dataframe in the above example is a subclass of pandas.DataFrame and provides some additional features. Certain columns also return specific pandas.Series subclasses, which provide useful methods:
print(type(activity))
print(type(activity.alt))
<class 'runpandas.types.frame.Activity'>
<class 'runpandas.types.columns.Altitude'>
For instance, if you want to get the base unit for the altitude alt data or the distance dist data:
print(activity.alt.base_unit)
print(activity.alt.sum())
m
65883.68151855901
print(activity.dist.base_unit)
print(activity.dist.iloc[-1])
m
4686.31103516
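To make the idea of unit-aware column subclasses more concrete, here is a minimal sketch of how a pandas.Series subclass can carry a base_unit attribute. The class names UnitSeries and AltitudeSketch are hypothetical and this is not the runpandas implementation; it only illustrates the general pandas subclassing pattern.

```python
import pandas as pd


class UnitSeries(pd.Series):
    """Hypothetical Series subclass that carries a unit label."""

    # Subclasses override this class attribute with their own unit.
    base_unit = None

    @property
    def _constructor(self):
        # Make slicing and similar operations return the same subclass
        # instead of falling back to a plain pandas.Series.
        return type(self)


class AltitudeSketch(UnitSeries):
    """Hypothetical altitude column measured in meters."""

    base_unit = "m"


alt = AltitudeSketch([178.9, 178.9, 177.5], name="alt")

print(type(alt.head(2)).__name__)  # AltitudeSketch
print(alt.base_unit)               # m
print(alt.sum())                   # regular Series methods keep working
```

Because the unit lives on the class, every slice of the column reports the same unit without any extra bookkeeping.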
The Activity dataframe also contains special properties that present some statistics from the workout, such as the elapsed time, the mean heart rate, the moving time and the distance of the workout in meters.
#total time elapsed for the activity
print(activity.ellapsed_time)
#distance of workout in meters
print(activity.distance)
#mean heartrate
print(activity.mean_heart_rate())
0 days 00:33:11
4686.31103516
156.65274151436032
Occasionally, some observations such as speed, distance and others must be calculated from the data available in the given activity. In runpandas there are special accessors (runpandas.acessors) that compute some of these metrics. We will compute the speed and the distance per position observations using the latitude and longitude of each record, calculating the haversine distance in meters and the speed in meters per second.
# compute the haversine distance between consecutive latitude/longitude observations
activity['distpos'] = activity.compute.distance()
activity['distpos'].head()
time
00:00:00          NaN
00:00:01     0.333146
00:00:06     1.678792
00:00:12    11.639901
00:00:16     9.183847
Name: distpos, dtype: float64
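For readers who want to see what a haversine distance per position involves, the following is a minimal standard-library sketch. The helper names and the Earth radius constant are assumptions made for illustration; runpandas lists the haversine package among its dependencies, and its exact computation may differ.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (assumed value)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def consecutive_distances(points):
    """Distance from each point to the previous one; the first has no predecessor."""
    out = [float("nan")]
    for (lat1, lon1), (lat2, lon2) in zip(points, points[1:]):
        out.append(haversine_m(lat1, lon1, lat2, lon2))
    return out


# (lat, lon) pairs as printed in the first table (rounded to six decimals)
points = [(35.951880, -79.093187), (35.951880, -79.093184), (35.951868, -79.093172)]
distpos = consecutive_distances(points)
print(distpos)
```

Since the coordinates printed in the table are rounded, the distances from this sketch will only approximate the values shown in the output above.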
# compute the speed from the distances between consecutive observations
activity['speed'] = activity.compute.speed(from_distances=True)
activity['speed'].head()
time
00:00:00         NaN
00:00:01    0.333146
00:00:06    0.335758
00:00:12    1.939984
00:00:16    2.295962
Name: speed, dtype: float64
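The speed values above are consistent with dividing each distance per position by the time elapsed since the previous record. The sketch below reproduces that arithmetic with plain pandas, using the distpos values printed earlier; it is an illustration of the calculation, not the library's code.

```python
import pandas as pd

# Timestamps and distances per position (meters) copied from the distpos output.
index = pd.to_timedelta(["00:00:00", "00:00:01", "00:00:06", "00:00:12", "00:00:16"])
distpos = pd.Series([float("nan"), 0.333146, 1.678792, 11.639901, 9.183847], index=index)

# Seconds elapsed since the previous observation.
seconds = index.to_series().diff().dt.total_seconds()

# Speed in m/s: distance covered in the interval divided by its duration.
speed = distpos / seconds
print(speed)
```

For example, the third record covers 1.678792 m in 5 s, which gives roughly 0.3358 m/s.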
Popular running metrics such as gradient, pace and vertical speed are also available through the runpandas accessors.
activity['vam'] = activity.compute.vertical_speed()
activity['vam'].head()
time
00:00:00         NaN
00:00:01    0.000000
00:00:06    0.000000
00:00:12   -0.240336
00:00:16    0.000000
Name: vam, dtype: float64
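The vertical speed shown above appears to be the change in altitude divided by the elapsed time between records, in meters per second. A small sketch under that assumption, using the altitude values from the first table:

```python
import pandas as pd

# Timestamps and altitudes (meters) copied from the first table.
index = pd.to_timedelta(["00:00:00", "00:00:01", "00:00:06", "00:00:12", "00:00:16"])
alt = pd.Series([178.942627, 178.942627, 178.942627, 177.500610, 177.500610], index=index)

# Seconds elapsed since the previous observation.
seconds = index.to_series().diff().dt.total_seconds()

# Vertical speed in m/s: negative values mean the runner is descending.
vam = alt.diff() / seconds
print(vam)
```

The fourth record drops about 1.44 m in 6 s, which matches the -0.240336 reported by the accessor.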
Sporadically, there will be a large time difference between consecutive observations in the same workout. This can happen when the device is paused by the athlete, or when proprietary algorithms controlling the device's sampling rate auto-pause recording because no significant change in position is detected. In runpandas there is an algorithm that attempts to calculate the moving time based on the GPS locations, distances, and speed of the activity.
To compute the moving time, there is a special accessor that detects the periods of inactivity and returns the activity with a moving column flagging each observation as moving (True) or stopped (False).
activity_only_moving = activity.only_moving()
print(activity_only_moving['moving'].head())
time
00:00:00    False
00:00:01    False
00:00:06    False
00:00:12     True
00:00:16     True
Name: moving, dtype: bool
Now we can compute the moving time, that is, how long the user was active.
activity_only_moving.moving_time
Timedelta('0 days 00:33:05')
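As a rough illustration of how a moving time can be derived, the sketch below flags a record as moving when its speed exceeds a threshold and sums the intervals that end at moving records. The threshold STOPPED_SPEED_MS is a hypothetical value chosen only for this example; the algorithm used by runpandas also considers GPS locations and distances and is not reproduced here.

```python
import pandas as pd

STOPPED_SPEED_MS = 0.5  # hypothetical threshold in m/s, chosen only for illustration

# Timestamps and speeds (m/s) copied from the speed output.
index = pd.to_timedelta(["00:00:00", "00:00:01", "00:00:06", "00:00:12", "00:00:16"])
speed = pd.Series([float("nan"), 0.333146, 0.335758, 1.939984, 2.295962], index=index)

# An observation counts as moving when its speed exceeds the threshold.
# NaN compares as False, so the first record is treated as stopped.
moving = speed > STOPPED_SPEED_MS

# Duration of the interval that ends at each observation.
intervals = index.to_series().diff().fillna(pd.Timedelta(0))

# Moving time is the sum of the intervals flagged as moving.
moving_time = intervals[moving].sum()
print(moving_time)  # 0 days 00:00:10
```

For these five records, the assumed threshold happens to give the same flags as the moving column shown above.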
Runpandas also provides a summary method for summarising the activity through common statistics. Such a session summary includes estimates of several of the metrics computed above with a single call.
activity_only_moving.summary()
Session                           Running: 26-12-2012 21:29:53
Total distance (meters)                                4686.31
Total ellapsed time                            0 days 00:33:11
Total moving time                              0 days 00:33:05
Average speed (km/h)                                   8.47656
Average moving speed (km/h)                            8.49853
Average pace (per 1 km)                        0 days 00:07:04
Average pace moving (per 1 km)                 0 days 00:07:03
Average cadence                                            NaN
Average moving cadence                                     NaN
Average heart rate                                     156.653
Average moving heart rate                                157.4
Average temperature                                        NaN
dtype: object
Now, let’s play with the data. Let’s show distance vs time as an example of what visualizations we can create and how. In this example, we will use the built-in, matplotlib-based plot function.
activity[['dist']].plot()
<AxesSubplot:xlabel='time'>
And here is altitude versus time.
activity[['alt']].plot()
<AxesSubplot:xlabel='time'>
Next, let’s show the altitude vs distance profile. Here is a scatterplot that shows altitude vs distance as recorded.
activity.plot.scatter(x='dist', y='alt', c='DarkBlue')
<AxesSubplot:xlabel='dist', ylabel='alt'>
Finally, let’s get a glimpse of the route by plotting a 2D map using longitude vs latitude.
activity.plot(x='lon', y='lat')
<AxesSubplot:xlabel='lon'>
The runpandas package also comes with extra batteries, such as our runpandas.datasets package, which includes a range of example data for testing purposes. There is a dedicated repository with all the data available. An index of the data is kept here.
You can use the example data available:
example_fit = rpd.activity_examples(path='Garmin_Fenix_6S_Pro-Running.fit')
print(example_fit.summary)
print('Included metrics:', example_fit.included_data)
Synced from watch Garmin Fenix 6S
Included metrics: [<MetricsEnum.latitude: 'latitude'>, <MetricsEnum.longitude: 'longitude'>, <MetricsEnum.elevation: 'elevation'>, <MetricsEnum.heartrate: 'heartrate'>, <MetricsEnum.cadence: 'cadence'>, <MetricsEnum.distance: 'distance'>, <MetricsEnum.temperature: 'temperature'>]
rpd.read_file(example_fit.path).head()
time | enhanced_speed | enhanced_altitude | unknown_87 | fractional_cadence | lap | session | unknown_108 | dist | cad | hr | lon | lat | temp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
00:00:00 | 0.000 | 254.0 | 0 | 0.0 | 0 | 0 | NaN | 0.00 | 0 | 101 | 13.843376 | 51.066280 | 8 |
00:00:01 | 0.000 | 254.0 | 0 | 0.0 | 0 | 0 | NaN | 0.00 | 0 | 101 | 13.843374 | 51.066274 | 8 |
00:00:10 | 1.698 | 254.0 | 0 | 0.0 | 0 | 1 | 2362.0 | 0.00 | 83 | 97 | 13.843176 | 51.066249 | 8 |
00:00:12 | 2.267 | 254.0 | 0 | 0.0 | 0 | 1 | 2362.0 | 3.95 | 84 | 99 | 13.843118 | 51.066250 | 8 |
00:00:21 | 2.127 | 254.6 | 0 | 0.5 | 0 | 1 | 2552.0 | 16.67 | 87 | 100 | 13.842940 | 51.066231 | 8 |
If you only want to see the activities of a specific file type, you can filter runpandas.activity_examples, which returns a filter iterable that you can iterate over:
fit_examples = rpd.activity_examples(file_type=rpd.FileTypeEnum.FIT)
for example in fit_examples:
    # Download and play with the filtered examples
    print(example.path)
https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/Garmin_Fenix_6S_Pro-Running.fit
https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/Garmin_Fenix2_running_with_hrm.fit
https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/Garmin_Forerunner_910XT-Running.fit
The runpandas package provides utilities to import a group of activities and, after careful processing, organises them into a MultiIndex DataFrame.
The pandas.MultiIndex allows you to have multiple columns acting as a row identifier and multiple rows acting as a header identifier. In our scenario, the first identifier (index level) is the timestamp at which the workout started, and the second is the timedelta of the consecutive observations within the workout.
The MultiIndex dataframe results from the function runpandas.read_dir_aggregate, which takes as input a directory of tracking data files and uses the read*() functions to build runpandas.Activity objects. Then, the resulting dataframes are sorted by their timestamps and combined into a single runpandas.Activity indexed by the two-level pandas.MultiIndex.
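The sort-and-combine step can be pictured with plain pandas. The sketch below is not the read_dir_aggregate implementation; the make_activity helper, the start timestamps and the coordinates are made up for illustration, and pandas.concat with keys is just one way to obtain such a two-level index.

```python
import pandas as pd


def make_activity(lats):
    """Build a tiny stand-in activity indexed by elapsed time (one record per second)."""
    index = pd.to_timedelta(list(range(len(lats))), unit="s")
    return pd.DataFrame({"lat": lats}, index=index)


# Hypothetical start timestamps mapped to their activities, deliberately unsorted.
activities = {
    pd.Timestamp("2021-07-04 11:23:19"): make_activity([-8.0466, -8.0465]),
    pd.Timestamp("2020-08-30 09:08:51"): make_activity([-8.0450, -8.0451, -8.0452]),
}

# Sort by start time, then stack the frames under a two-level (start, time) index.
starts = sorted(activities)
session = pd.concat([activities[s] for s in starts], keys=starts, names=["start", "time"])

print(session.index.names)  # the two index levels: start and time
print(len(session))         # 5
```

The outer level identifies the workout and the inner level keeps the elapsed time of each observation within it.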
Let’s illustrate this by loading 68 running activities recorded by a female runner between 2020 and 2021.
import warnings
warnings.filterwarnings('ignore')
import runpandas
session = runpandas.read_dir_aggregate(dirname='session/')
session
start | time | alt | hr | lon | lat |
---|---|---|---|---|---|
2020-08-30 09:08:51.012 | 00:00:00 | NaN | NaN | -34.893609 | -8.045055 |
2020-08-30 09:08:51.012 | 00:00:01.091000 | NaN | NaN | -34.893624 | -8.045054 |
2020-08-30 09:08:51.012 | 00:00:02.091000 | NaN | NaN | -34.893641 | -8.045061 |
2020-08-30 09:08:51.012 | 00:00:03.098000 | NaN | NaN | -34.893655 | -8.045063 |
2020-08-30 09:08:51.012 | 00:00:04.098000 | NaN | NaN | -34.893655 | -8.045065 |
... | ... | ... | ... | ... | ... |
2021-07-04 11:23:19.418 | 00:52:39.582000 | 0.050001 | 189.0 | -34.894534 | -8.046602 |
2021-07-04 11:23:19.418 | 00:52:43.582000 | NaN | NaN | -34.894465 | -8.046533 |
2021-07-04 11:23:19.418 | 00:52:44.582000 | NaN | NaN | -34.894443 | -8.046515 |
2021-07-04 11:23:19.418 | 00:52:45.582000 | NaN | NaN | -34.894429 | -8.046494 |
2021-07-04 11:23:19.418 | 00:52:49.582000 | NaN | 190.0 | -34.894395 | -8.046398 |
48794 rows × 4 columns
Now let’s see how many activities are available for analysis. For this question, we also have an accessor, runpandas.types.acessors.session._SessionAcessor, that holds several methods for computing the basic running metrics and some summary statistics across all the activities in this kind of frame.
# count the number of activities in the session
print('Total Activities:', session.session.count())
Total Activities: 68
We can compute the main running metrics (speed, pace, moving, etc.) using the session accessor methods, just like the ones available in runpandas.types.metrics.MetricsAcessor. Under the hood, the session methods call those same metric methods, applying them to each activity separately.
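Applying a metric to each activity separately maps naturally onto a pandas groupby over the first index level. The sketch below uses made-up distances and a cumulative sum as a stand-in metric; it is an assumption about the general approach, not the accessor's actual code.

```python
import pandas as pd

# Two hypothetical activities with per-position distances in meters.
first = pd.Timestamp("2020-08-30 09:08:51")
second = pd.Timestamp("2021-07-04 11:23:19")
index = pd.MultiIndex.from_tuples(
    [
        (first, pd.Timedelta(seconds=0)),
        (first, pd.Timedelta(seconds=1)),
        (first, pd.Timedelta(seconds=2)),
        (second, pd.Timedelta(seconds=0)),
        (second, pd.Timedelta(seconds=1)),
    ],
    names=["start", "time"],
)
session = pd.DataFrame({"distpos": [float("nan"), 1.5, 2.0, float("nan"), 3.0]}, index=index)

# Grouping on the first index level restarts the running total for every
# activity, so distances never leak from one workout into the next.
session["dist"] = session.groupby(level="start")["distpos"].cumsum()

print(session.index.get_level_values("start").nunique())  # 2
print(session["dist"].tolist())  # [nan, 1.5, 3.5, nan, 3.0]
```

Counting the unique values of the start level is also a simple way to cross-check the number of activities in a session frame.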
#In this example we compute the distance and the distance per position across all workouts
session = session.session.distance()
session
start | time | alt | hr | lon | lat | distpos | dist |
---|---|---|---|---|---|---|---|
2020-08-30 09:08:51.012 | 00:00:00 | NaN | NaN | -34.893609 | -8.045055 | NaN | NaN |
2020-08-30 09:08:51.012 | 00:00:01.091000 | NaN | NaN | -34.893624 | -8.045054 | 1.690587 | 1.690587 |
2020-08-30 09:08:51.012 | 00:00:02.091000 | NaN | NaN | -34.893641 | -8.045061 | 2.095596 | 3.786183 |
2020-08-30 09:08:51.012 | 00:00:03.098000 | NaN | NaN | -34.893655 | -8.045063 | 1.594298 | 5.380481 |
2020-08-30 09:08:51.012 | 00:00:04.098000 | NaN | NaN | -34.893655 | -8.045065 | 0.163334 | 5.543815 |
... | ... | ... | ... | ... | ... | ... | ... |
2021-07-04 11:23:19.418 | 00:52:39.582000 | 0.050001 | 189.0 | -34.894534 | -8.046602 | 12.015437 | 8220.018885 |
2021-07-04 11:23:19.418 | 00:52:43.582000 | NaN | NaN | -34.894465 | -8.046533 | 10.749779 | 8230.768664 |
2021-07-04 11:23:19.418 | 00:52:44.582000 | NaN | NaN | -34.894443 | -8.046515 | 3.163638 | 8233.932302 |
2021-07-04 11:23:19.418 | 00:52:45.582000 | NaN | NaN | -34.894429 | -8.046494 | 2.851535 | 8236.783837 |
2021-07-04 11:23:19.418 | 00:52:49.582000 | NaN | 190.0 | -34.894395 | -8.046398 | 11.300740 | 8248.084577 |
48794 rows × 6 columns
#compute the speed for each activity
session = session.session.speed(from_distances=True)
#compute the pace for each activity
session = session.session.pace()
#compute the inactivity periods for each activity
session = session.session.only_moving()
With all the computation done, let’s move on to the next step: exploring the data and getting some descriptive statistics.
With all the activities loaded and their metrics computed, let’s look further into the data and get basic summaries about the session: time spent, total distance, mean speed and other insightful statistics for each running activity. We can accomplish this by calling the method runpandas.types.session._SessionAcessor.summarize. It returns a basic DataFrame including all the aggregated statistics per activity from the session frame.
summary = session.session.summarize()
summary
start | moving_time | mean_speed | max_speed | mean_pace | max_pace | mean_moving_speed | mean_moving_pace | mean_cadence | max_cadence | mean_moving_cadence | mean_heart_rate | max_heart_rate | mean_moving_heart_rate | mean_temperature | min_temperature | max_temperature | total_distance | ellapsed_time |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2020-07-03 09:50:53.162 | 00:25:29.838000 | 2.642051 | 4.879655 | 00:06:18 | 00:03:24 | 2.665008 | 00:06:15 | NaN | NaN | NaN | 178.819923 | 188.0 | 178.872587 | NaN | NaN | NaN | 4089.467333 | 00:25:47.838000 |
2020-07-05 09:33:20.999 | 00:05:04.999000 | 2.227637 | 6.998021 | 00:07:28 | 00:02:22 | 3.072098 | 00:05:25 | NaN | NaN | NaN | 168.345455 | 176.0 | 168.900000 | NaN | NaN | NaN | 980.162640 | 00:07:20.001000 |
2020-07-05 09:41:59.999 | 00:18:19 | 1.918949 | 6.563570 | 00:08:41 | 00:02:32 | 2.729788 | 00:06:06 | NaN | NaN | NaN | 173.894180 | 185.0 | 174.577143 | NaN | NaN | NaN | 3139.401118 | 00:27:16 |
2020-07-13 09:13:58.718 | 00:40:21.281000 | 2.509703 | 8.520387 | 00:06:38 | 00:01:57 | 2.573151 | 00:06:28 | NaN | NaN | NaN | 170.808176 | 185.0 | 170.795527 | NaN | NaN | NaN | 6282.491059 | 00:41:43.281000 |
2020-07-17 09:33:02.308 | 00:32:07.691000 | 2.643278 | 8.365431 | 00:06:18 | 00:01:59 | 2.643278 | 00:06:18 | NaN | NaN | NaN | 176.436242 | 186.0 | 176.436242 | NaN | NaN | NaN | 5095.423045 | 00:32:07.691000 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2021-06-13 09:22:30.985 | 01:32:33.018000 | 2.612872 | 23.583956 | 00:06:22 | 00:00:42 | 2.810855 | 00:05:55 | NaN | NaN | NaN | 169.340812 | 183.0 | 169.655879 | NaN | NaN | NaN | 15706.017295 | 01:40:11.016000 |
2021-06-20 09:16:55.163 | 00:59:44.512000 | 2.492640 | 6.065895 | 00:06:41 | 00:02:44 | 2.749453 | 00:06:03 | NaN | NaN | NaN | 170.539809 | 190.0 | 171.231392 | NaN | NaN | NaN | 9965.168311 | 01:06:37.837000 |
2021-06-23 09:37:44.000 | 00:26:49.001000 | 2.501796 | 5.641343 | 00:06:39 | 00:02:57 | 2.568947 | 00:06:29 | NaN | NaN | NaN | 156.864865 | 171.0 | 156.957031 | NaN | NaN | NaN | 4165.492241 | 00:27:45.001000 |
2021-06-27 09:50:08.664 | 00:31:42.336000 | 2.646493 | 32.734124 | 00:06:17 | 00:00:30 | 2.661853 | 00:06:15 | NaN | NaN | NaN | 166.642857 | 176.0 | 166.721116 | NaN | NaN | NaN | 5074.217061 | 00:31:57.336000 |
2021-07-04 11:23:19.418 | 00:47:47.583000 | 2.602263 | 4.212320 | 00:06:24 | 00:03:57 | 2.856801 | 00:05:50 | NaN | NaN | NaN | 177.821862 | 192.0 | 177.956967 | NaN | NaN | NaN | 8248.084577 | 00:52:49.582000 |
68 rows × 18 columns
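A per-activity summary like the one above can be approximated with a named aggregation over the start level. The following sketch uses invented numbers and covers only a few of the columns; it is an illustration under those assumptions and does not reproduce the summarize method.

```python
import pandas as pd

start_a = pd.Timestamp("2020-08-30 09:08:51")
start_b = pd.Timestamp("2021-07-04 11:23:19")
index = pd.MultiIndex.from_arrays(
    [
        [start_a, start_a, start_a, start_b, start_b],
        pd.to_timedelta([0, 1, 2, 0, 4], unit="s"),
    ],
    names=["start", "time"],
)
# Hypothetical cumulative distances (meters) and heart rates (bpm).
session = pd.DataFrame(
    {"dist": [0.0, 1.5, 3.5, 0.0, 12.0], "hr": [150.0, 152.0, 154.0, 180.0, 190.0]},
    index=index,
)

# Expose the elapsed time as a regular column so it can be aggregated too.
flat = session.reset_index(level="time")

summary = flat.groupby(level="start").agg(
    total_distance=("dist", "max"),   # distance is cumulative, so the max is the total
    mean_heart_rate=("hr", "mean"),
    max_heart_rate=("hr", "max"),
    ellapsed_time=("time", "max"),    # spelling follows the column name used above
)
print(summary)
```

Each row of the result corresponds to one activity, keyed by its start timestamp.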
print('Session Interval:', (summary.index.to_series().max() - summary.index.to_series().min()).days, 'days')
print('Total Workouts:', len(summary), 'runnings')
print('Total KM Distance:', summary['total_distance'].sum() / 1000)
print('Average Pace (all runs):', summary.mean_pace.mean())
print('Average Moving Pace (all runs):', summary.mean_moving_pace.mean())
print('Average KM Distance (all runs):', round(summary.total_distance.mean()/ 1000,2))
Session Interval: 366 days
Total Workouts: 68 runnings
Total KM Distance: 491.77377537338896
Average Pace (all runs): 0 days 00:07:18.411764
Average Moving Pace (all runs): 0 days 00:06:02.147058
Average KM Distance (all runs): 7.23
At this point, I have the summary data to start some powerful visualization and analysis. The charts below illustrate her pace and distance evolution over time.
import matplotlib.pyplot as plt
import datetime
#let's convert the pace to float number in minutes
summary['mean_moving_pace_float'] = summary['mean_moving_pace'] / datetime.timedelta(minutes=1)
summary['pace_moving_all_mean'] = summary.mean_moving_pace.mean()
summary['pace_moving_all_mean_float'] = summary['pace_moving_all_mean'] / datetime.timedelta(minutes=1)
plt.subplots(figsize=(8, 5))
plt.plot(summary.index, summary.mean_moving_pace_float, color='silver')
plt.plot(summary.pace_moving_all_mean_float, color='purple', linestyle='dashed', label='average')
plt.title("Pace Evolution")
plt.xlabel("Runnings")
plt.ylabel("Pace")
plt.legend()
<matplotlib.legend.Legend at 0x7f82d8d83cd0>
plt.subplots(figsize=(8, 5))
summary['distance_all_mean'] = round(summary.total_distance.mean()/1000,2)
plt.plot(summary.index, summary.total_distance / 1000, color='silver')
plt.plot(summary.distance_all_mean, color='purple', linestyle='dashed', label='average')
plt.title("Distance Evolution")
plt.xlabel("Runs")
plt.ylabel("distance")
plt.legend()
plt.show()
- Report bugs, suggest features or view the source code [on GitHub](https://github.com/corriporai/runpandas).
I'm very interested in your experience with runpandas. Please drop me a note with any feedback you have.
Contributions welcome!
- Marcel Caraciolo
Runpandas is licensed under the MIT License. A copy of the license is included in LICENSE.