Slides and sample code for Time Series Data Analysis, Visualization, Modeling and Forecasting with Python for Health and Self
The talk provides code for time series analysis and modeling in general, then applies it to quantified-self and fitness-tracking data from Fitbit, Apple Watch, or Oura.
Contents
How to understand human health across time or an individual self over a lifetime?
In this presentation and code, we look at time series analysis in Python, a field of statistics that increasingly overlaps with machine learning and deep learning, and how it can be applied to tracking data like sleep and exercise from a Fitbit, Apple Watch, or Oura.
While most often applied to financial, sales, and weather data, time series analysis is also important when we think about health and self data, because before we can start modeling we need to be sure we adjust for any temporal components (like trends, seasonality, or serial correlation).
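As a rough illustration of the temporal components mentioned above, the following sketch checks a daily series for day-of-week seasonality and serial correlation using only pandas. The data is synthetic and the variable names (`sleep_hours`, `weekend_boost`) are hypothetical stand-ins, not taken from the talk's notebooks.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a year of nightly sleep data (hypothetical, not real tracking data).
rng = np.random.default_rng(7)
dates = pd.date_range("2023-01-02", periods=52 * 7, freq="D")  # 2023-01-02 is a Monday

# Assume one extra hour of sleep on Saturdays and Sundays (dayofweek 5 and 6).
weekend_boost = np.where(dates.dayofweek >= 5, 1.0, 0.0)
sleep_hours = pd.Series(
    7.0 + weekend_boost + rng.normal(0, 0.3, len(dates)),
    index=dates,
    name="sleep_hours",
)

# Seasonality check: average value per day of week (0 = Monday, 6 = Sunday).
by_weekday = sleep_hours.groupby(sleep_hours.index.dayofweek).mean()

# Serial correlation check: correlation of the series with itself shifted by k days.
lag7 = sleep_hours.autocorr(lag=7)  # same weekday one week earlier
lag3 = sleep_hours.autocorr(lag=3)  # a lag that does not line up with the weekly cycle

print(by_weekday.round(2))
print(f"lag-7 autocorrelation: {lag7:.2f}, lag-3 autocorrelation: {lag3:.2f}")
```

A clearly higher autocorrelation at lag 7 than at other lags is one hint that a weekly pattern should be accounted for before modeling.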
This talk and the included code provide a high-level yet actionable overview of time series analysis in Python. We look at tests for temporal patterns (like autocorrelation plots and the Augmented Dickey-Fuller, or ADF, test) and at techniques for normalizing or detrending time series data in certain situations. We examine classic statistical time series modeling using Box-Jenkins or ARIMA models, how to set their parameters, and whether they are helpful for our health and self data. Finally, we look at Facebook's Prophet for forecasting time series data, including health and fitness data.
How to understand a self across time? Time series analysis allows us to look at non-stationary data like personal data, and translate that data into stationary data when needed. We can then look for patterns and meaning. It enables us to find relevant variables, plot recurring patterns and even make forecasts about trends in our health or productivity. Ultimately TS analysis becomes a powerful tool in the health analysis space for looking at how interventions (like a lifestyle change or treatment) have an impact on an individual (n=1) level.
If we want to go beyond generic advice and personalize our medicine and healthy habits, we need to consider the temporal component, and time series data analysis can help.
- View the most up-to-date slides here. NOTE: Press "S" to view it in speaker mode and see additional talk notes and references.
- View Slides text in markdown.
- All Code: https://github.com/markwk/ts4health/tree/master/code
- Data Visualization with Python on Health and Self Time Series - Time series data visualization using Fitbit or Apple Health data for sleep and steps, with a primary focus on looking for trends and seasonality (especially day of week) and on plotting the autocorrelation function (i.e. lag).
- Also see: Time Series Data Visualization with Python - a standard example of time series data analysis and visualization using temperature data.
- Tests and Techniques for Time Series Analysis on Health and Self Data - Tests to check if there is a lag or other seasonality and various techniques to time adjust the data.
- Also see: Tests and Techniques for Time Series Data - more general-purpose code and examples for testing and adjusting time series.
- Time Series Statistical Modeling for Health and Self - applying Box-Jenkins or ARIMA time series modeling to our Fitbit and Apple Health data to see if it helps.
- Also see: Time Series Statistical Modeling Forecasting - more generic code using an ARIMA model for time series modeling of weather data.
- Prophet TS Analysis for Health and Self - Simple example showing modeling of exercise and sleep data using Facebook's Prophet.
- Advanced TS Modeling with Facebook's Prophet - generic code for time series modeling using Prophet.
- TODO: Final Analysis and Modeling for Health and Self Data: Having looked at temporal patterns and made the necessary adjustments, what can we learn from our data and how might we model it now?
Data collection was done on a combination of wearables (Apple Watch, Fitbit, and Oura). Data aggregation was done using QS Ledger, an open source Python project for collecting and visualizing self-tracking data (Fitbit, Apple Health, Oura, etc.). Each data set was then processed and aggregated into a standardized format. For additional information, refer to QS Ledger or see my previous talk, Python For Self-Trackers, for a walkthrough.
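The exact standardized format is not described here, so the following is only a sketch of one common approach: rolling raw, irregularly timed records up to one row per calendar day with pandas. The column names (`timestamp`, `steps`) and the values are hypothetical and do not reflect QS Ledger's actual output.

```python
import pandas as pd

# Hypothetical raw export: several step-count records per day, and one day with none.
raw = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2023-03-01 08:15", "2023-03-01 18:40",
        "2023-03-02 09:05",
        "2023-03-04 07:30",
    ]),
    "steps": [3200, 5100, 7400, 6100],
})

# Aggregate to a regular daily index. min_count=1 keeps days without any
# records as NaN rather than reporting a misleading total of 0.
daily = (
    raw.set_index("timestamp")["steps"]
       .resample("D")
       .sum(min_count=1)
)

print(daily)
```

A regular daily index with explicit gaps makes it easier to decide later whether missing days should be interpolated, dropped, or treated as zeros.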
Sample data is not being provided openly at this time. Please contact the author if you are in need of reference data or are interested in further data or analysis collaboration.
- Box & Jenkins. (2015). Time Series Analysis. John Wiley & Sons. (esp Ch 1-4)
- Pal. (2017). Practical Time Series Analysis. Packt Publishing Ltd. (esp Ch. 1-4)
- Downey, A. (2015). Think Stats (2nd Edition). O’Reilly Media, Inc. (esp Ch 12)
- Velicer. (2012). Time series analysis for psychological research. Handbook of Psychology, Second Edition. (Thorough introduction to ts for social scientists)
- Aigner (2011). Visualization of Time-Oriented Data. Springer Science & Business Media.
- Time Series Data Visualization with Python - Code and examples of data visualization for time series
- A comprehensive beginner’s guide to create a Time Series Forecast - nice walkthrough of techniques for time series analysis and transformations
- ARIMA Model – Complete Guide to Time Series Forecasting in Python - good example of ARIMA modeling with step-by-step code from analysis and parameter setting in model to forecasting and model accuracy metrics.
- Time Series Analysis with Pandas - uses Open Power Systems Data with some good examples
- Pandas Time Series - pandas example using sunspots data
- Playing with time series data in python - focuses on energy trends data and more deep learning methods
- Working with Time Series from Python Data Science Handbook
Mark Koester is a tech entrepreneur, writer, and technologist. His current work is at the intersection of data technologies and human health and optimization. As a data scientist and web and mobile app developer, he is the creator of PhotoStats.io (a photo tracking and analytics app), PodcastTracker for podcast listening logging, Biomarker Tracker (a health analytics service to better understand blood test results) and QS Ledger (the most extensive, open source, personal data collection and analysis tool). He is a former Regional Lead in Greater China at Techstars, a seed-stage accelerator, and a former program coordinator at Startup Next (powered by Google for Entrepreneurs). He runs a boutique dev shop (Int3c.com) and is an active open source contributor. He regularly writes at www.markwk.com.