RunPandas - Python Package for handling running data from GPS-enabled tracking devices and applications.

Introduction

RunPandas is a project to add support for data collected by GPS-enabled tracking devices and heart rate monitors to [pandas](http://pandas.pydata.org) objects. It is a Python package that provides infrastructure for importing tracking data from such devices, enabling statistical and visual analysis for running enthusiasts. Its goal is to fill the gap between the routine collection of data and their manual analysis in pandas and Python.

Documentation

Stable documentation is available on github.io. A second copy of the stable documentation is hosted on Read the Docs for more details.

Development documentation is available for the latest changes in master.

==> Check out this Blog post for the reasoning and philosophy behind Runpandas, as well as a detailed tutorial with code examples.

==> Follow this Runpandas live book in Jupyter notebook format, built with Jupyter Book.

Install

RunPandas depends on the following packages:

pandas
fitparse
stravalib
pydantic
pyaml
haversine

Runpandas was tested to work on *nix-like systems, including macOS.

Install latest release version via pip

$ pip install runpandas

Install latest release version via conda

$ conda install -c marcelcaraciolo runpandas

Install latest development version

$ pip install git+https://github.com/corriporai/runpandas.git

or

$ git clone https://github.com/corriporai/runpandas.git
$ cd runpandas
$ python setup.py install

Examples

Install using pip, then import runpandas and use one of the tracking readers. This example loads a local .tcx file (sample.tcx). From the data file we get time, altitude, distance, heart rate and geo position (lat/long).

# !pip install runpandas
import runpandas as rpd
activity = rpd.read_file('./sample.tcx')

activity.head(5)


	alt	dist	hr	lon	lat
time
00:00:00	178.942627	0.000000	62.0	-79.093187	35.951880
00:00:01	178.942627	0.000000	62.0	-79.093184	35.951880
00:00:06	178.942627	1.106947	62.0	-79.093172	35.951868
00:00:12	177.500610	13.003035	62.0	-79.093228	35.951774
00:00:16	177.500610	22.405027	60.0	-79.093141	35.951732

The dataframes returned by runpandas when loading files are similar across file types. The dataframe in the above example is a subclass of pandas.DataFrame and provides some additional features. Certain columns also return specific pandas.Series subclasses, which provide useful methods:

print (type(activity))
print(type(activity.alt))
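To see how a custom class can come back from plain pandas operations, here is a minimal sketch of the standard pandas subclassing pattern (`_constructor`, `_constructor_sliced`). The class names `ActivityFrame` and `UnitSeries` and the `base_unit` attribute are hypothetical; runpandas' real classes may be built differently.

```python
import pandas as pd


class UnitSeries(pd.Series):
    """Hypothetical Series subclass that carries a base unit."""

    base_unit = "m"  # illustrative class-level unit

    @property
    def _constructor(self):
        # Operations on a UnitSeries keep returning UnitSeries
        return UnitSeries

    @property
    def _constructor_expanddim(self):
        # Expanding to 2-D (e.g. to_frame) returns the frame subclass
        return ActivityFrame


class ActivityFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass standing in for an activity."""

    @property
    def _constructor(self):
        # Operations on an ActivityFrame keep returning ActivityFrame
        return ActivityFrame

    @property
    def _constructor_sliced(self):
        # Selecting a single column returns a UnitSeries
        return UnitSeries


frame = ActivityFrame({"alt": [178.942627, 177.500610], "dist": [0.0, 13.003035]})
print(type(frame).__name__)         # ActivityFrame
print(type(frame["alt"]).__name__)  # UnitSeries
print(frame["alt"].base_unit)       # m
```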

For instance, to get the base unit of the altitude (alt) or distance (dist) data, along with a quick aggregate:

print(activity.alt.base_unit)
print(activity.alt.sum())

m
65883.68151855901

print(activity.dist.base_unit)
print(activity.dist.iloc[-1])

m
4686.31103516

The Activity dataframe also contains special properties that present some statistics from the workout, such as the elapsed time, the mean heart rate, the moving time and the distance of the workout in meters.

#total time elapsed for the activity
print(activity.ellapsed_time)
#distance of workout in meters
print(activity.distance)
#mean heartrate
print(activity.mean_heart_rate())

0 days 00:33:11
4686.31103516
156.65274151436032
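As a rough cross-check, the same three statistics can be approximated with plain pandas on the first five rows shown earlier. This sketch assumes that elapsed time is the span of the time index, distance is the last cumulative dist value and mean heart rate is the plain average of hr; runpandas may treat gaps or missing values differently.

```python
import pandas as pd

# First five rows of the sample activity: time index, cumulative distance (m), heart rate (bpm)
frame = pd.DataFrame(
    {
        "dist": [0.000000, 0.000000, 1.106947, 13.003035, 22.405027],
        "hr": [62.0, 62.0, 62.0, 62.0, 60.0],
    },
    index=pd.to_timedelta([0, 1, 6, 12, 16], unit="s"),
)

elapsed = frame.index[-1] - frame.index[0]  # span of the time index
distance = frame["dist"].iloc[-1]           # last cumulative distance value
mean_hr = frame["hr"].mean()                # plain average heart rate

print(elapsed)   # 0 days 00:00:16
print(distance)  # 22.405027
print(mean_hr)
```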

Occasionally, some observations such as speed, distance and others must be calculated from the data available in the given activity. In runpandas there are special accessors (runpandas.acessors) that compute some of these metrics. We will compute the speed and the distance per position observation, using the latitude and longitude of each record to calculate the haversine distance in meters and the speed in meters per second.

#compute the distance using haversine formula between two consecutive latitude, longitudes observations.
activity['distpos']  = activity.compute.distance()
activity['distpos'].head()

time
00:00:00          NaN
00:00:01     0.333146
00:00:06     1.678792
00:00:12    11.639901
00:00:16     9.183847
Name: distpos, dtype: float64
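The haversine step can be sketched in vectorised pandas/NumPy as below. This is an illustration, not runpandas' code: the function name `haversine_steps` is hypothetical and the mean Earth radius of 6,371,008.8 m is an assumed constant, so results may differ slightly from the library's.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_008.8  # assumed mean Earth radius in metres


def haversine_steps(lat: pd.Series, lon: pd.Series) -> pd.Series:
    """Great-circle distance in metres between each fix and the previous one."""
    lat_rad = np.radians(lat)
    lon_rad = np.radians(lon)
    dlat = lat_rad.diff()  # NaN for the first fix
    dlon = lon_rad.diff()
    a = (
        np.sin(dlat / 2) ** 2
        + np.cos(lat_rad.shift()) * np.cos(lat_rad) * np.sin(dlon / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


# Sanity check: one degree of latitude along a meridian is roughly 111.2 km
steps = haversine_steps(pd.Series([0.0, 1.0]), pd.Series([0.0, 0.0]))
print(steps.tolist())
```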

#compute the speed (meters per second) from the distances between consecutive positions
activity['speed']  = activity.compute.speed(from_distances=True)
activity['speed'].head()

time
00:00:00         NaN
00:00:01    0.333146
00:00:06    0.335758
00:00:12    1.939984
00:00:16    2.295962
Name: speed, dtype: float64
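Dividing each step distance by the seconds elapsed since the previous fix reproduces the speed values above from the distpos values (for example 1.678792 m over 5 s gives 0.335758 m/s). The helper name `speed_from_distances` is hypothetical; it only illustrates the arithmetic, not the accessor's implementation.

```python
import numpy as np
import pandas as pd

# distpos values from the output above, indexed by elapsed time
distpos = pd.Series(
    [np.nan, 0.333146, 1.678792, 11.639901, 9.183847],
    index=pd.to_timedelta([0, 1, 6, 12, 16], unit="s"),
    name="distpos",
)


def speed_from_distances(step_m: pd.Series) -> pd.Series:
    """Speed in m/s: metres since the previous fix / seconds since the previous fix."""
    seconds = step_m.index.to_series().diff().dt.total_seconds()
    return (step_m / seconds).rename("speed")


speed = speed_from_distances(distpos)
print(speed.tolist())
```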

Popular running metrics such as gradient, pace and vertical speed are also available through the runpandas accessors.

activity['vam'] = activity.compute.vertical_speed()
activity['vam'].head()

time
00:00:00         NaN
00:00:01    0.000000
00:00:06    0.000000
00:00:12   -0.240336
00:00:16    0.000000
Name: vam, dtype: float64
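The values above are consistent with vertical speed being the change in altitude divided by the elapsed seconds: the 1.442017 m drop between 00:00:06 and 00:00:12 over 6 s gives -0.240336 m/s. A sketch under that assumption, using the hypothetical helper `vertical_speed`:

```python
import pandas as pd

# Altitude (m) of the first five records, indexed by elapsed time
alt = pd.Series(
    [178.942627, 178.942627, 178.942627, 177.500610, 177.500610],
    index=pd.to_timedelta([0, 1, 6, 12, 16], unit="s"),
)


def vertical_speed(altitude: pd.Series) -> pd.Series:
    """Metres climbed (negative when descending) per second since the previous fix."""
    seconds = altitude.index.to_series().diff().dt.total_seconds()
    return altitude.diff() / seconds


vam = vertical_speed(alt)
print(vam.tolist())
```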

Sporadically, there will be a large time difference between consecutive observations in the same workout. This can happen when the device is paused by the athlete, or when proprietary algorithms controlling the device's sampling rate auto-pause recording because no significant change in position is detected. Runpandas includes an algorithm that attempts to calculate the moving time based on the GPS locations, distances and speed of the activity.

To compute the moving time, the only_moving() method detects the periods of inactivity and returns the activity with an extra boolean moving column, which marks each observation as moving (True) or stopped (False).

activity_only_moving = activity.only_moving()
print(activity_only_moving['moving'].head())

time
00:00:00    False
00:00:01    False
00:00:06    False
00:00:12     True
00:00:16     True
Name: moving, dtype: bool
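The detection algorithm itself is not described here. As a much simpler illustration of the idea, a fixed speed threshold already separates the five records above; the 1.0 m/s cut-off and the helper `flag_moving` are assumptions, not runpandas' actual rule.

```python
import numpy as np
import pandas as pd

# Speeds (m/s) from the earlier output, indexed by elapsed time
speed = pd.Series(
    [np.nan, 0.333146, 0.335758, 1.939984, 2.295962],
    index=pd.to_timedelta([0, 1, 6, 12, 16], unit="s"),
)


def flag_moving(speed_ms: pd.Series, min_speed: float = 1.0) -> pd.Series:
    """True where speed exceeds a hypothetical threshold; NaN counts as stopped."""
    return (speed_ms > min_speed).rename("moving")


print(flag_moving(speed).tolist())  # [False, False, False, True, True]
```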

Now we can compute the moving time, i.e. how long the runner was actually moving.

activity_only_moving.moving_time

Timedelta('0 days 00:33:05')
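One plausible way to turn such a flag into a moving time is to add up the intervals that end at a moving record. This is a sketch under that assumption, using the hypothetical helper `moving_time` on the first five records; runpandas may attribute intervals differently.

```python
import pandas as pd

# Moving flags for the first five records, indexed by elapsed time
moving = pd.Series(
    [False, False, False, True, True],
    index=pd.to_timedelta([0, 1, 6, 12, 16], unit="s"),
    name="moving",
)


def moving_time(flags: pd.Series) -> pd.Timedelta:
    """Sum of the time steps that end at a record flagged as moving."""
    steps = flags.index.to_series().diff()  # NaT for the first record
    return steps[flags].sum()


print(moving_time(moving))  # 0 days 00:00:10
```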

Runpandas also provides a summary() method for summarising the activity through common statistics. The session summary gathers several of the metrics computed above in a single call.

activity_only_moving.summary()

Session                           Running: 26-12-2012 21:29:53
Total distance (meters)                                4686.31
Total ellapsed time                            0 days 00:33:11
Total moving time                              0 days 00:33:05
Average speed (km/h)                                   8.47656
Average moving speed (km/h)                            8.49853
Average pace (per 1 km)                        0 days 00:07:04
Average pace moving (per 1 km)                 0 days 00:07:03
Average cadence                                            NaN
Average moving cadence                                     NaN
Average heart rate                                     156.653
Average moving heart rate                                157.4
Average temperature                                        NaN
dtype: object
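The pace figures follow from the speeds: at v km/h, one kilometre takes 3600 / v seconds. Truncating to whole seconds, as assumed in the hypothetical helper `pace_per_km` below, reproduces both pace values in the summary (3600 / 8.47656 ≈ 424.7 s, i.e. 7:04).

```python
import pandas as pd


def pace_per_km(speed_kmh: float) -> pd.Timedelta:
    """Time to cover 1 km at a constant speed in km/h, truncated to whole seconds."""
    return pd.Timedelta(seconds=int(3600.0 / speed_kmh))


print(pace_per_km(8.47656))  # 0 days 00:07:04
print(pace_per_km(8.49853))  # 0 days 00:07:03
```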

Now, let’s play with the data. Let’s plot distance vs. time as an example of what visualizations we can create, and how. In this example, we will use the built-in, matplotlib-based plot function.

activity[['dist']].plot()


<AxesSubplot:xlabel='time'>

And here is altitude versus time.

activity[['alt']].plot()

<AxesSubplot:xlabel='time'>

Next, let’s show the altitude vs. distance profile. Here is a scatterplot of altitude vs. distance as recorded.

activity.plot.scatter(x='dist', y='alt', c='DarkBlue')

<AxesSubplot:xlabel='dist', ylabel='alt'>

Finally, let’s get a glimpse of the route by plotting longitude vs. latitude as a 2D map.

activity.plot(x='lon', y='lat')

<AxesSubplot:xlabel='lon'>

The runpandas package also comes with extra batteries, such as our runpandas.datasets package, which includes a range of example data for testing purposes. There is a dedicated repository with all the data available. An index of the data is kept here.

You can use the example data available:

example_fit = rpd.activity_examples(path='Garmin_Fenix_6S_Pro-Running.fit')
print(example_fit.summary)
print('Included metrics:', example_fit.included_data)

Synced from watch Garmin Fenix 6S

Included metrics: [<MetricsEnum.latitude: 'latitude'>, <MetricsEnum.longitude: 'longitude'>, <MetricsEnum.elevation: 'elevation'>, <MetricsEnum.heartrate: 'heartrate'>, <MetricsEnum.cadence: 'cadence'>, <MetricsEnum.distance: 'distance'>, <MetricsEnum.temperature: 'temperature'>]

rpd.read_file(example_fit.path).head()


	enhanced_speed	enhanced_altitude	unknown_87	fractional_cadence	lap	session	unknown_108	dist	cad	hr	lon	lat	temp
time
00:00:00	0.000	254.0	0	0.0	0	0	NaN	0.00	0	101	13.843376	51.066280	8
00:00:01	0.000	254.0	0	0.0	0	0	NaN	0.00	0	101	13.843374	51.066274	8
00:00:10	1.698	254.0	0	0.0	0	1	2362.0	0.00	83	97	13.843176	51.066249	8
00:00:12	2.267	254.0	0	0.0	0	1	2362.0	3.95	84	99	13.843118	51.066250	8
00:00:21	2.127	254.6	0	0.5	0	1	2552.0	16.67	87	100	13.842940	51.066231	8

If you only want to see the activities of a specific file type, you can filter runpandas.activity_examples, which returns a filter iterable that you can iterate over:

fit_examples = rpd.activity_examples(file_type=rpd.FileTypeEnum.FIT)
for example in fit_examples:
    #Download and play with the filtered examples
    print(example.path)

https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/Garmin_Fenix_6S_Pro-Running.fit
https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/Garmin_Fenix2_running_with_hrm.fit
https://raw.githubusercontent.com/corriporai/runpandas-data/master/activities/Garmin_Forerunner_910XT-Running.fit
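If the returned object is a standard Python filter, as the text suggests, it is a one-shot iterator: a second loop over it yields nothing, so wrap it in list() if you need to iterate more than once. The sketch below shows this with plain Python; the `Example` record and `FileType` enum are hypothetical stand-ins, not runpandas classes.

```python
from dataclasses import dataclass
from enum import Enum


class FileType(Enum):
    """Hypothetical stand-in for rpd.FileTypeEnum."""
    FIT = "fit"
    TCX = "tcx"


@dataclass
class Example:
    """Hypothetical stand-in for one entry of the examples index."""
    path: str
    file_type: FileType


catalog = [
    Example("Garmin_Fenix_6S_Pro-Running.fit", FileType.FIT),
    Example("sample.tcx", FileType.TCX),
    Example("Garmin_Fenix2_running_with_hrm.fit", FileType.FIT),
]

fit_examples = filter(lambda ex: ex.file_type is FileType.FIT, catalog)
first_pass = [ex.path for ex in fit_examples]   # consumes the iterator
second_pass = [ex.path for ex in fit_examples]  # already exhausted, so empty

fit_list = [ex for ex in catalog if ex.file_type is FileType.FIT]  # reusable list
print(first_pass)
print(second_pass)  # []
```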

Exploring sessions

The runpandas package provides utilities to import the data of a group of activities and, after careful processing, organise them into a MultiIndex DataFrame.

A pandas.MultiIndex lets several index levels act together as the row identifier (and, likewise, several header rows act as the column identifier). In our scenario, the first identifier (level) is the timestamp at which the workout started, and the second identifier is the timedelta of each consecutive observation within that workout.
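A minimal plain-pandas sketch of such a two-level (start, time) index, showing how to select a single workout and how to aggregate per workout. The heart-rate values are made up for illustration.

```python
import pandas as pd

runs = [pd.Timestamp("2020-08-30 09:08:51"), pd.Timestamp("2021-07-04 11:23:19")]
index = pd.MultiIndex.from_tuples(
    [
        (runs[0], pd.Timedelta(seconds=0)),
        (runs[0], pd.Timedelta(seconds=1)),
        (runs[1], pd.Timedelta(seconds=0)),
    ],
    names=["start", "time"],
)
frame = pd.DataFrame({"hr": [150.0, 152.0, 170.0]}, index=index)  # illustrative values

one_run = frame.loc[runs[0]]                               # rows of one workout, indexed by time
per_run_mean = frame.groupby(level="start")["hr"].mean()   # one value per workout
print(one_run)
print(per_run_mean)
```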

The MultiIndex Runpandas Activity Dataframe

The MultiIndex dataframe results from the function runpandas.read_dir_aggregate, which takes as input a directory of tracking data files and uses the read*() functions to build runpandas.Activity objects. Then the resulting dataframes are sorted by their start timestamps and combined into a single runpandas.Activity indexed by a two-level pandas.MultiIndex.
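The sort-and-combine step can be sketched with pd.concat and its keys argument. This assumes the per-activity frames are already loaded into a dict keyed by start timestamp (runpandas does the reading itself); the helper `aggregate_activities` and the data are hypothetical.

```python
from typing import Mapping

import pandas as pd


def aggregate_activities(activities: Mapping[pd.Timestamp, pd.DataFrame]) -> pd.DataFrame:
    """Stack per-activity frames into one frame indexed by (start, time)."""
    starts = sorted(activities)  # chronological order of the workouts
    return pd.concat(
        [activities[start] for start in starts], keys=starts, names=["start", "time"]
    )


# Two hypothetical workouts, passed in out of chronological order
later = pd.DataFrame({"hr": [170.0]}, index=pd.to_timedelta([0], unit="s"))
earlier = pd.DataFrame({"hr": [150.0, 152.0]}, index=pd.to_timedelta([0, 1], unit="s"))
session = aggregate_activities(
    {pd.Timestamp("2021-07-04 11:23:19"): later, pd.Timestamp("2020-08-30 09:08:51"): earlier}
)
print(session)
```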

Let’s illustrate this by loading 68 running activities recorded by a female runner between 2020 and 2021.

import warnings
warnings.filterwarnings('ignore')

import runpandas
session = runpandas.read_dir_aggregate(dirname='session/')

session


		alt	hr	lon	lat
start	time
2020-08-30 09:08:51.012	00:00:00	NaN	NaN	-34.893609	-8.045055
	00:00:01.091000	NaN	NaN	-34.893624	-8.045054
	00:00:02.091000	NaN	NaN	-34.893641	-8.045061
	00:00:03.098000	NaN	NaN	-34.893655	-8.045063
	00:00:04.098000	NaN	NaN	-34.893655	-8.045065
...	...	...	...	...	...
2021-07-04 11:23:19.418	00:52:39.582000	0.050001	189.0	-34.894534	-8.046602
	00:52:43.582000	NaN	NaN	-34.894465	-8.046533
	00:52:44.582000	NaN	NaN	-34.894443	-8.046515
	00:52:45.582000	NaN	NaN	-34.894429	-8.046494
	00:52:49.582000	NaN	190.0	-34.894395	-8.046398

48794 rows × 4 columns

Now let’s see how many activities are available for analysis. For this, we also have an accessor, runpandas.types.acessors.session._SessionAcessor, that holds several methods for computing the basic running metrics across all the activities in this kind of frame, as well as some summary statistics.

#count the number of activities in the session
print ('Total Activities:', session.session.count())

Total Activities: 68
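Counting activities in a (start, time) frame amounts to counting distinct start values. A plain-pandas sketch, assuming the first level is named start as in the table above; the helper `count_activities` is hypothetical and not necessarily how session.count() works.

```python
import pandas as pd


def count_activities(frame: pd.DataFrame) -> int:
    """Number of distinct workouts, i.e. distinct values of the 'start' level."""
    return frame.index.get_level_values("start").nunique()


index = pd.MultiIndex.from_arrays(
    [
        [pd.Timestamp("2020-08-30 09:08:51")] * 2 + [pd.Timestamp("2021-07-04 11:23:19")],
        pd.to_timedelta([0, 1, 0], unit="s"),
    ],
    names=["start", "time"],
)
frame = pd.DataFrame({"hr": [150.0, 152.0, 170.0]}, index=index)
print(count_activities(frame))  # 2
```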

We can compute the main running metrics (speed, pace, moving time, etc.) with the session accessor methods, which mirror those available in runpandas.types.metrics.MetricsAcessor. Under the hood, each session method calls the corresponding metric method, applying it to each activity separately.

#In this example we compute the distance and the distance per position across all workouts
session = session.session.distance()
session


		alt	hr	lon	lat	distpos	dist
start	time
2020-08-30 09:08:51.012	00:00:00	NaN	NaN	-34.893609	-8.045055	NaN	NaN
	00:00:01.091000	NaN	NaN	-34.893624	-8.045054	1.690587	1.690587
	00:00:02.091000	NaN	NaN	-34.893641	-8.045061	2.095596	3.786183
	00:00:03.098000	NaN	NaN	-34.893655	-8.045063	1.594298	5.380481
	00:00:04.098000	NaN	NaN	-34.893655	-8.045065	0.163334	5.543815
...	...	...	...	...	...	...	...
2021-07-04 11:23:19.418	00:52:39.582000	0.050001	189.0	-34.894534	-8.046602	12.015437	8220.018885
	00:52:43.582000	NaN	NaN	-34.894465	-8.046533	10.749779	8230.768664
	00:52:44.582000	NaN	NaN	-34.894443	-8.046515	3.163638	8233.932302
	00:52:45.582000	NaN	NaN	-34.894429	-8.046494	2.851535	8236.783837
	00:52:49.582000	NaN	190.0	-34.894395	-8.046398	11.300740	8248.084577

48794 rows × 6 columns
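The dist column above is the running total of distpos within each workout (1.690587 + 2.095596 = 3.786183, and so on), restarting for every new start value. A grouped cumulative sum reproduces that pattern; this is a sketch of the per-activity idea, not the accessor's actual code, and the second workout's value is made up.

```python
import numpy as np
import pandas as pd

starts = [pd.Timestamp("2020-08-30 09:08:51.012")] * 5 + [pd.Timestamp("2021-07-04 11:23:19.418")] * 2
times = pd.to_timedelta([0, 1.091, 2.091, 3.098, 4.098, 0, 1], unit="s")
index = pd.MultiIndex.from_arrays([starts, times], names=["start", "time"])

session = pd.DataFrame(
    {"distpos": [np.nan, 1.690587, 2.095596, 1.594298, 0.163334, np.nan, 5.0]},
    index=index,
)

# Running total of step distances, restarted for each workout (NaN stays NaN)
session["dist"] = session.groupby(level="start")["distpos"].cumsum()
print(session)
```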

#compute the speed for each activity
session = session.session.speed(from_distances=True)
#compute the pace for each activity
session = session.session.pace()
#compute the inactivity periods for each activity
session = session.session.only_moving()

With the loading and metrics computation done for all the activities, let’s move on to exploring the data and getting some descriptive statistics: basic summaries of the session such as time spent, total distance, mean speed and other insightful statistics for each running activity. We can get these by calling the method runpandas.types.acessors.session._SessionAcessor.summarize, which returns a DataFrame with the aggregated statistics per activity in the session frame.

summary = session.session.summarize()
summary


	moving_time	mean_speed	max_speed	mean_pace	max_pace	mean_moving_speed	mean_moving_pace	mean_cadence	max_cadence	mean_moving_cadence	mean_heart_rate	max_heart_rate	mean_moving_heart_rate	mean_temperature	min_temperature	max_temperature	total_distance	ellapsed_time
start
2020-07-03 09:50:53.162	00:25:29.838000	2.642051	4.879655	00:06:18	00:03:24	2.665008	00:06:15	NaN	NaN	NaN	178.819923	188.0	178.872587	NaN	NaN	NaN	4089.467333	00:25:47.838000
2020-07-05 09:33:20.999	00:05:04.999000	2.227637	6.998021	00:07:28	00:02:22	3.072098	00:05:25	NaN	NaN	NaN	168.345455	176.0	168.900000	NaN	NaN	NaN	980.162640	00:07:20.001000
2020-07-05 09:41:59.999	00:18:19	1.918949	6.563570	00:08:41	00:02:32	2.729788	00:06:06	NaN	NaN	NaN	173.894180	185.0	174.577143	NaN	NaN	NaN	3139.401118	00:27:16
2020-07-13 09:13:58.718	00:40:21.281000	2.509703	8.520387	00:06:38	00:01:57	2.573151	00:06:28	NaN	NaN	NaN	170.808176	185.0	170.795527	NaN	NaN	NaN	6282.491059	00:41:43.281000
2020-07-17 09:33:02.308	00:32:07.691000	2.643278	8.365431	00:06:18	00:01:59	2.643278	00:06:18	NaN	NaN	NaN	176.436242	186.0	176.436242	NaN	NaN	NaN	5095.423045	00:32:07.691000
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
2021-06-13 09:22:30.985	01:32:33.018000	2.612872	23.583956	00:06:22	00:00:42	2.810855	00:05:55	NaN	NaN	NaN	169.340812	183.0	169.655879	NaN	NaN	NaN	15706.017295	01:40:11.016000
2021-06-20 09:16:55.163	00:59:44.512000	2.492640	6.065895	00:06:41	00:02:44	2.749453	00:06:03	NaN	NaN	NaN	170.539809	190.0	171.231392	NaN	NaN	NaN	9965.168311	01:06:37.837000
2021-06-23 09:37:44.000	00:26:49.001000	2.501796	5.641343	00:06:39	00:02:57	2.568947	00:06:29	NaN	NaN	NaN	156.864865	171.0	156.957031	NaN	NaN	NaN	4165.492241	00:27:45.001000
2021-06-27 09:50:08.664	00:31:42.336000	2.646493	32.734124	00:06:17	00:00:30	2.661853	00:06:15	NaN	NaN	NaN	166.642857	176.0	166.721116	NaN	NaN	NaN	5074.217061	00:31:57.336000
2021-07-04 11:23:19.418	00:47:47.583000	2.602263	4.212320	00:06:24	00:03:57	2.856801	00:05:50	NaN	NaN	NaN	177.821862	192.0	177.956967	NaN	NaN	NaN	8248.084577	00:52:49.582000

68 rows × 18 columns
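A few of these per-activity columns can be approximated with a grouped, named aggregation over the start level. This is an illustration under assumptions (total distance as the last cumulative dist, elapsed time as the last time offset), not how summarize() is implemented, and the data is made up.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [
        [pd.Timestamp("2020-08-30 09:08:51")] * 3 + [pd.Timestamp("2021-07-04 11:23:19")] * 2,
        pd.to_timedelta([0, 60, 120, 0, 90], unit="s"),
    ],
    names=["start", "time"],
)
session = pd.DataFrame(
    {"dist": [0.0, 180.0, 350.0, 0.0, 250.0], "hr": [150.0, 160.0, 170.0, 165.0, 175.0]},
    index=index,
)

# Named aggregation: one output row per workout
summary = session.groupby(level="start").agg(
    total_distance=("dist", "last"),
    mean_heart_rate=("hr", "mean"),
    max_heart_rate=("hr", "max"),
)
# Elapsed time: the largest time offset within each workout
summary["ellapsed_time"] = session.reset_index(level="time").groupby(level="start")["time"].max()
print(summary)
```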

print('Session Interval:', (summary.index.to_series().max() - summary.index.to_series().min()).days, 'days')
print('Total Workouts:', len(summary), 'runs')
print('Total KM Distance:', summary['total_distance'].sum() / 1000)
print('Average Pace (all runs):', summary.mean_pace.mean())
print('Average Moving Pace (all runs):', summary.mean_moving_pace.mean())
print('Average KM Distance (all runs):', round(summary.total_distance.mean()/ 1000,2))

Session Interval: 366 days
Total Workouts: 68 runs
Total KM Distance: 491.77377537338896
Average Pace (all runs): 0 days 00:07:18.411764
Average Moving Pace (all runs): 0 days 00:06:02.147058
Average KM Distance (all runs): 7.23

With the summary data in hand, we can start some powerful visualization and analysis. The charts below illustrate her pace and distance evolution over time.

import matplotlib.pyplot as plt
import datetime

#let's convert the pace to float number in minutes
summary['mean_moving_pace_float'] = summary['mean_moving_pace'] / datetime.timedelta(minutes=1)
summary['pace_moving_all_mean'] = summary.mean_moving_pace.mean()
summary['pace_moving_all_mean_float'] = summary['pace_moving_all_mean'] / datetime.timedelta(minutes=1)

plt.subplots(figsize=(8, 5))

plt.plot(summary.index, summary.mean_moving_pace_float, color='silver')
plt.plot(summary.pace_moving_all_mean_float, color='purple', linestyle='dashed', label='average')
plt.title("Pace Evolution")
plt.xlabel("Runs")
plt.ylabel("Pace (min/km)")
plt.legend()

<matplotlib.legend.Legend at 0x7f82d8d83cd0>

plt.subplots(figsize=(8, 5))

summary['distance_all_mean'] = round(summary.total_distance.mean()/1000,2)

plt.plot(summary.index, summary.total_distance / 1000, color='silver')
plt.plot(summary.distance_all_mean, color='purple', linestyle='dashed', label='average')
plt.title("Distance Evolution")
plt.xlabel("Runs")
plt.ylabel("Distance (km)")
plt.legend()


plt.show()

Get in touch

Report bugs, suggest features or view the source code [on GitHub](https://github.com/corriporai/runpandas).

I'm very interested in your experience with runpandas. Please drop me a note with any feedback you have.

Contributions welcome!

- Marcel Caraciolo

License

Runpandas is licensed under the MIT License. A copy of the license is included in LICENSE.
