A Python package for the analysis of biopsychological data.
The package bundles the following functionality:
- Data processing pipelines for various physiological signals (ECG, EEG, Respiration, Motion, ...).
- Algorithms and data processing pipelines for sleep/wake prediction and computation of sleep endpoints based on activity or IMU data.
- Functions to import and process data from sleep trackers (e.g., Withings Sleep Analyzer).
- Functions for processing and analysis of salivary biomarker data (cortisol, amylase).
- Implementation of various psychological and HCI-related questionnaires.
- Implementation of classes representing different psychological protocols (e.g., TSST, MIST, Cortisol Awakening Response Assessment).
- Functions for easily setting up statistical analysis pipelines.
- Functions for setting up and evaluating machine learning pipelines.
- Plotting wrappers optimized for displaying biopsychological data.
BioPsyKit provides a whole ECG data processing pipeline, consisting of:
- Loading ECG data from:
  - Generic .csv files
  - NilsPod binary (.bin) files (requires NilsPodLib)
  - Other sensor types (coming soon)
- Splitting data into single study parts (based on time intervals) that will be analyzed separately
- Performing ECG processing, including:
  - R peak detection (using Neurokit)
  - R peak outlier removal and interpolation
  - HRV feature computation
  - ECG-derived respiration (EDR) estimation for respiration rate and respiratory sinus arrhythmia (RSA) (experimental)
  - Instantaneous heart rate resampling
- Computing aggregated results (e.g., mean and standard error) per study part
- Creating plots for visualizing processing results
from biopsykit.signals.ecg import EcgProcessor
from biopsykit.example_data import get_ecg_example
ecg_data, sampling_rate = get_ecg_example()
ep = EcgProcessor(ecg_data, sampling_rate)
ep.ecg_process()
print(ep.ecg_result)
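To make the individual pipeline steps more concrete, the following is a minimal, standard-library-only sketch of three of them: deriving instantaneous heart rate from R peak positions, splitting the result into study parts by time interval, and aggregating mean and standard error per part. It is not BioPsyKit's implementation; the function names, the R peak indices, and the study part intervals are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def heart_rate_from_r_peaks(r_peaks, sampling_rate):
    """Return (time_s, bpm) pairs, one per RR interval, timestamped at the later R peak."""
    result = []
    for prev, cur in zip(r_peaks, r_peaks[1:]):
        rr_s = (cur - prev) / sampling_rate  # RR interval in seconds
        result.append((cur / sampling_rate, 60.0 / rr_s))
    return result


def split_by_intervals(samples, intervals):
    """Assign (time_s, value) samples to study parts given {name: (start_s, end_s)}."""
    return {
        name: [value for t, value in samples if start <= t < end]
        for name, (start, end) in intervals.items()
    }


def mean_and_se(values):
    """Mean and standard error of the mean (sample standard deviation / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical R peak positions (sample indices) recorded at 256 Hz
sampling_rate = 256
r_peaks = [0, 256, 512, 640, 832]

heart_rate = heart_rate_from_r_peaks(r_peaks, sampling_rate)
parts = split_by_intervals(heart_rate, {"Baseline": (0.0, 2.25), "Stress": (2.25, 4.0)})

for name, values in parts.items():
    part_mean, part_se = mean_and_se(values)
    print(f"{name}: mean={part_mean:.1f} bpm, se={part_se:.1f} bpm")
```

A real pipeline would additionally remove and interpolate R peak outliers before computing heart rate; that step is omitted here for brevity.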
... more biosignals coming soon!
BioPsyKit can process sleep data collected from IMU or activity sensors (e.g., actigraphs). This includes:
- Detection of wear periods
- Detection of time spent in bed
- Detection of sleep and wake phases
- Computation of sleep endpoints (e.g., sleep and wake onset, net sleep duration, wake after sleep onset)
import biopsykit as bp
from biopsykit.example_data import get_sleep_imu_example
imu_data, sampling_rate = get_sleep_imu_example()
sleep_results = bp.sleep.sleep_processing_pipeline.predict_pipeline_acceleration(imu_data, sampling_rate)
sleep_endpoints = sleep_results["sleep_endpoints"]
print(sleep_endpoints)
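As an illustration of what "sleep endpoints" means, the sketch below derives a few of them from a per-epoch sleep/wake sequence. It uses deliberately simplified definitions and is not the algorithm used by BioPsyKit; the function name, the dictionary keys, and the example sequence are assumptions. In particular, it assumes that the sequence starts at bedtime, so the index of the first sleep epoch doubles as the sleep onset latency.

```python
def sleep_endpoints(sleep_wake, epoch_length_min=1.0):
    """Derive simple sleep endpoints from a sleep/wake sequence (1 = sleep, 0 = wake)."""
    sleep_idx = [i for i, state in enumerate(sleep_wake) if state == 1]
    if not sleep_idx:
        return None  # no sleep detected at all

    sleep_onset = sleep_idx[0]       # first epoch scored as sleep
    wake_onset = sleep_idx[-1] + 1   # first epoch after the final sleep epoch
    net_sleep_epochs = len(sleep_idx)
    # wake epochs that lie between sleep onset and wake onset
    waso_epochs = (wake_onset - sleep_onset) - net_sleep_epochs

    return {
        "sleep_onset_epoch": sleep_onset,
        "wake_onset_epoch": wake_onset,
        "sleep_onset_latency_min": sleep_onset * epoch_length_min,
        "net_sleep_duration_min": net_sleep_epochs * epoch_length_min,
        "wake_after_sleep_onset_min": waso_epochs * epoch_length_min,
    }


# Hypothetical sleep/wake prediction with one epoch per minute
example = [0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0]
endpoints = sleep_endpoints(example)
print(endpoints)
```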
BioPsyKit provides several methods for the analysis of salivary biomarkers (e.g., cortisol and amylase), such as:
- Importing data from Excel and CSV files into a standardized format
- Computing standard features (maximum increase, slope, area under the curve, mean, standard deviation, ...)
import biopsykit as bp
from biopsykit.example_data import get_saliva_example
saliva_data = get_saliva_example(sample_times=[-20, 0, 10, 20, 30, 40, 50])
max_inc = bp.saliva.max_increase(saliva_data)
# exclude the first saliva sample (t=-20) when computing the AUC
auc = bp.saliva.auc(saliva_data, remove_s0=True)
print(max_inc)
print(auc)
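For readers unfamiliar with these features, the following standard-library sketch shows one common way to compute them for a single subject: the area under the curve with respect to ground and with respect to increase, both via the trapezoidal rule, and the maximum increase relative to the first retained sample. The function names and cortisol values are hypothetical, and the exact definitions used by BioPsyKit may differ in detail.

```python
def auc_ground(values, times):
    """Area under the curve with respect to ground (trapezoidal rule)."""
    return sum(
        (values[i] + values[i + 1]) / 2 * (times[i + 1] - times[i])
        for i in range(len(values) - 1)
    )


def auc_increase(values, times):
    """Area under the curve with respect to the first sample."""
    return auc_ground(values, times) - values[0] * (times[-1] - times[0])


def max_increase(values):
    """Largest later sample minus the first sample."""
    return max(values[1:]) - values[0]


# Hypothetical cortisol values for one subject
sample_times = [-20, 0, 10, 20, 30, 40, 50]
cortisol = [5.0, 4.0, 6.0, 9.0, 8.0, 6.0, 5.0]

# Drop the first sample (t=-20), mirroring the idea behind remove_s0=True
values, times = cortisol[1:], sample_times[1:]

print("AUC_G:", auc_ground(values, times))
print("AUC_I:", auc_increase(values, times))
print("max increase:", max_increase(values))
```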
BioPsyKit implements various established psychological (state and trait) questionnaires, such as:
- Perceived Stress Scale (PSS)
- Positive and Negative Affect Schedule (PANAS)
- Self-Compassion Scale (SCS)
- Big Five Inventory (BFI)
- State Trait Depression and Anxiety Questionnaire (STADI)
- Trier Inventory for Chronic Stress (TICS)
- Primary Appraisal Secondary Appraisal Scale (PASA)
- ...
import biopsykit as bp
from biopsykit.example_data import get_questionnaire_example
data = get_questionnaire_example()
pss_data = data.filter(like="PSS")
pss_result = bp.questionnaires.pss(pss_data)
print(pss_result)
import biopsykit as bp
print(bp.questionnaires.utils.get_supported_questionnaires())
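To show what a questionnaire score computation involves, here is a minimal sketch for the total score of the 10-item Perceived Stress Scale. It assumes item responses coded from 0 to 4 and the usual reverse scoring of the positively worded items 4, 5, 7, and 8. The function name and the example answers are hypothetical, and BioPsyKit's own function may expect a different input format or return additional subscales.

```python
PSS_REVERSED_ITEMS = (4, 5, 7, 8)  # 1-based numbers of the positively worded items
SCORE_RANGE = (0, 4)               # assumed response coding


def pss_total(item_scores):
    """Compute the PSS-10 total score from ten item responses coded 0..4."""
    if len(item_scores) != 10:
        raise ValueError("PSS-10 requires exactly 10 item scores")
    low, high = SCORE_RANGE
    total = 0
    for number, score in enumerate(item_scores, start=1):
        if not low <= score <= high:
            raise ValueError(f"item {number} is outside the range {SCORE_RANGE}")
        # reverse-scored items are mirrored within the score range
        total += (low + high - score) if number in PSS_REVERSED_ITEMS else score
    return total


answers = [2, 1, 3, 4, 3, 2, 1, 0, 2, 3]  # hypothetical responses of one participant
print(pss_total(answers))
```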
BioPsyKit implements methods for easy handling and analysis of data recorded with several established psychological protocols, such as:
- Montreal Imaging Stress Task (MIST)
- Trier Social Stress Test (TSST)
- Cortisol Awakening Response Assessment (CAR)
- ...
from biopsykit.protocols import TSST
from biopsykit.example_data import get_saliva_example
from biopsykit.example_data import get_hr_subject_data_dict_example
# specify TSST structure and the durations of the single phases
structure = {
    "Pre": None,
    "TSST": {
        "Preparation": 300,
        "Talk": 300,
        "Math": 300
    },
    "Post": None
}
tsst = TSST(name="TSST", structure=structure)
saliva_data = get_saliva_example(sample_times=[-20, 0, 10, 20, 30, 40, 50])
hr_subject_data_dict = get_hr_subject_data_dict_example()
# add saliva data collected during the whole TSST procedure
tsst.add_saliva_data(saliva_data, saliva_type="cortisol")
# add heart rate data collected during the "TSST" study part
tsst.add_hr_data(hr_subject_data_dict, study_part="TSST")
# compute heart rate results: normalize heart rate data relative to the "Preparation" phase; afterwards, use data from
# the "Talk" and "Math" phases and compute the average heart rate for each subject and study phase, respectively
tsst.compute_hr_results(
    result_id="hr_mean",
    study_part="TSST",
    normalize_to=True,
    select_phases=True,
    mean_per_subject=True,
    params={
        "normalize_to": "Preparation",
        "select_phases": ["Talk", "Math"]
    }
)
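The three processing steps named in the comment above (normalize, select phases, average) can be sketched for a single subject as follows. This is an illustration only: it assumes that normalization means the percentage change relative to the mean of the baseline phase, which may not match BioPsyKit's definition exactly, and the helper names and heart rate values are hypothetical.

```python
from statistics import mean


def normalize_to_phase(hr_by_phase, baseline_phase):
    """Express every heart rate sample as percent change relative to the baseline mean."""
    baseline = mean(hr_by_phase[baseline_phase])
    return {
        phase: [(value - baseline) / baseline * 100 for value in values]
        for phase, values in hr_by_phase.items()
    }


def mean_per_phase(hr_by_phase, phases):
    """Average the samples of the selected phases only."""
    return {phase: mean(hr_by_phase[phase]) for phase in phases}


# Hypothetical heart rate samples (bpm) of one subject during the "TSST" study part
hr = {
    "Preparation": [78.0, 82.0],
    "Talk": [96.0, 104.0],
    "Math": [88.0, 92.0],
}

normalized = normalize_to_phase(hr, "Preparation")
hr_mean = mean_per_phase(normalized, ["Talk", "Math"])
print({phase: round(value, 1) for phase, value in hr_mean.items()})
```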
BioPsyKit implements methods for simplified statistical analysis of biopsychological data by offering an object-oriented interface for setting up statistical analysis pipelines, displaying the results, and adding statistical significance brackets to plots.
import matplotlib.pyplot as plt
from biopsykit.stats import StatsPipeline
from biopsykit.plotting import multi_feature_boxplot
from biopsykit.example_data import get_stats_example
data = get_stats_example()
# configure statistical analysis pipeline which consists of checking for normal distribution and performing paired
# t-tests (within-variable: time) on each questionnaire subscale separately (grouping data by subscale).
pipeline = StatsPipeline(
    steps=[("prep", "normality"), ("test", "pairwise_ttests")],
    params={"dv": "PANAS", "groupby": "subscale", "subject": "subject", "within": "time"}
)
# apply statistics pipeline on data
pipeline.apply(data)
# plot data and add statistical significance brackets from statistical analysis pipeline
fig, axs = plt.subplots(ncols=3)
features = ["NegativeAffect", "PositiveAffect", "Total"]
# generate statistical significance brackets
box_pairs, pvalues = pipeline.sig_brackets(
    "test", stats_effect_type="within", plot_type="single", x="time", features=features, subplots=True
)
# plot data
multi_feature_boxplot(
    data=data, x="time", y="PANAS", features=features, group="subscale", order=["pre", "post"],
    stats_kwargs={"box_pairs": box_pairs, "pvalues": pvalues}, ax=axs
)
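The core of the "test" step, a paired t-test computed separately for each subscale, can be written out by hand as a rough sketch. The code below only computes the t statistic and the degrees of freedom, because the Python standard library has no t distribution for deriving p-values. The data and the function name are hypothetical, and the statistics backend behind StatsPipeline is not reproduced here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(pre, post):
    """Return (t, degrees of freedom) of a paired t-test on two equally long samples."""
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical questionnaire scores of five subjects, grouped by subscale
scores = {
    "NegativeAffect": {"pre": [20, 22, 19, 24, 21], "post": [23, 25, 20, 27, 23]},
    "PositiveAffect": {"pre": [30, 28, 33, 31, 29], "post": [28, 27, 30, 30, 26]},
}

# one test per subscale, mirroring the idea of grouping the data before testing
for subscale, values in scores.items():
    t, dof = paired_t_statistic(values["pre"], values["post"])
    print(f"{subscale}: t({dof}) = {t:.2f}")
```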
BioPsyKit implements methods for simplified and systematic evaluation of different machine learning pipelines.
# Utils
from sklearn.datasets import load_breast_cancer
# Preprocessing & Feature Selection
from sklearn.feature_selection import SelectKBest
from sklearn.preprocessing import MinMaxScaler, StandardScaler
# Classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
# Cross-Validation
from sklearn.model_selection import KFold
from biopsykit.classification.model_selection import SklearnPipelinePermuter
# load example dataset
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target
# specify estimator combinations
model_dict = {
    "scaler": {
        "StandardScaler": StandardScaler(),
        "MinMaxScaler": MinMaxScaler()
    },
    "reduce_dim": {
        "SelectKBest": SelectKBest(),
    },
    "clf": {
        "KNeighborsClassifier": KNeighborsClassifier(),
        "DecisionTreeClassifier": DecisionTreeClassifier(),
    }
}
# specify hyperparameters for grid search
params_dict = {
    "StandardScaler": None,
    "MinMaxScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
    "DecisionTreeClassifier": {"criterion": ["gini", "entropy"], "max_depth": [2, 4]},
}
pipeline_permuter = SklearnPipelinePermuter(model_dict, params_dict)
pipeline_permuter.fit(X, y, outer_cv=KFold(5), inner_cv=KFold(5))
# print mean performance scores for each pipeline and parameter combination, averaged over all outer CV folds
print(pipeline_permuter.mean_pipeline_score_results())
# print overall best-performing pipeline and the performances over all outer CV folds
print(pipeline_permuter.best_pipeline())
# print summary of all relevant metrics for the best pipeline for each evaluated pipeline combination
print(pipeline_permuter.metric_summary())
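To get a feeling for how many candidates such a configuration produces, the sketch below enumerates the pipeline combinations and counts the hyperparameter grid points for each, using only the estimator names from the example above. It assumes that one estimator is picked per pipeline step and that the grid of a pipeline is the Cartesian product of the grids of its estimators. This is an illustration of the combinatorics, not the internal logic of SklearnPipelinePermuter.

```python
from itertools import product

# estimator names per pipeline step (the estimator objects are not needed for counting)
model_names = {
    "scaler": ["StandardScaler", "MinMaxScaler"],
    "reduce_dim": ["SelectKBest"],
    "clf": ["KNeighborsClassifier", "DecisionTreeClassifier"],
}
param_grids = {
    "StandardScaler": None,
    "MinMaxScaler": None,
    "SelectKBest": {"k": [2, 4, "all"]},
    "KNeighborsClassifier": {"n_neighbors": [2, 4], "weights": ["uniform", "distance"]},
    "DecisionTreeClassifier": {"criterion": ["gini", "entropy"], "max_depth": [2, 4]},
}


def grid_size(estimator_names):
    """Number of hyperparameter combinations for one pipeline."""
    size = 1
    for name in estimator_names:
        for candidates in (param_grids[name] or {}).values():
            size *= len(candidates)
    return size


# one estimator per step, in every possible combination
pipelines = list(product(*model_names.values()))
for pipeline in pipelines:
    print(" -> ".join(pipeline), "|", grid_size(pipeline), "parameter combinations")

total = sum(grid_size(pipeline) for pipeline in pipelines)
print("total candidates per inner cross-validation:", total)
```

With nested cross-validation, every candidate is fitted once per inner fold within each outer fold, so the total number of model fits grows quickly as estimators or parameters are added.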
BioPsyKit requires Python >=3.8. First, install a compatible version of Python. Then install BioPsyKit via pip.
Installation from PyPI:
pip install biopsykit
Installation from PyPI with extras (e.g., jupyter to directly install all dependencies required for use with Jupyter Lab):
pip install "biopsykit[jupyter]"
Installation from local repository copy:
git clone https://github.com/mad-lab-fau/BioPsyKit.git
cd BioPsyKit
pip install .
If you are a developer and want to contribute to BioPsyKit, you can install an editable version of the package from a local copy of the repository.
BioPsyKit uses poetry to manage dependencies and packaging. Once you have installed poetry, run the following commands to clone the repository, initialize a virtual environment, and install all development dependencies:
git clone https://github.com/mad-lab-fau/BioPsyKit.git
cd BioPsyKit
poetry install
To additionally install the optional extras mne and jupyter, run instead:
git clone https://github.com/mad-lab-fau/BioPsyKit.git
cd BioPsyKit
poetry install -E mne -E jupyter
To run any of the tools required for the development workflow, use the poe commands of the poethepoet task runner:
$ poe
docs                 Build the html docs using Sphinx.
format               Reformat all files using black.
format_check         Check, but not change, formatting using black.
lint                 Lint all files with Prospector.
test                 Run Pytest with coverage.
update_version       Bump the version in pyproject.toml and biopsykit.__init__.
register_ipykernel   Register a new IPython kernel named `biopsykit` linked to the virtual environment.
remove_ipykernel     Remove the associated IPython kernel.
- The poe commands are only available if you are in the virtual environment associated with this project. You can either activate the virtual environment manually (e.g., source .venv/bin/activate) or use the poetry shell command to spawn a new shell with the virtual environment activated.
- In order to use Jupyter notebooks with the project, you need to register a new IPython kernel associated with the venv of the project (poe register_ipykernel, see the command list above). When creating a notebook, make sure to select this kernel (top right corner of the notebook).
- In order to build the documentation, you need to additionally install pandoc.
See the Contributing Guidelines for further information.
See the Examples Gallery for examples of how to use BioPsyKit.
If you use BioPsyKit in your work, please report the version you used in the text. Additionally, please cite the corresponding paper:
Richer et al., (2021). BioPsyKit: A Python package for the analysis of biopsychological data. Journal of Open Source Software, 6(66), 3702, https://doi.org/10.21105/joss.03702
If you use a specific algorithm, please also make sure to cite the original paper of the algorithm! We recommend the following citation style:
We used the algorithm proposed by Author et al. [paper-citation], implemented by the BioPsyKit package [biopsykit-citation].
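To find out which version to report, one option is to query the metadata of the installed distribution with the standard library (available from Python 3.8 on). The helper name below is hypothetical; the sketch returns None when the package is not installed in the current environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


biopsykit_version = installed_version("biopsykit")
if biopsykit_version is None:
    print("biopsykit is not installed in this environment")
else:
    print(f"biopsykit version: {biopsykit_version}")
```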