health-reference-design-public-data

PPG-DaLiA Data Processing Pipeline. See our Health Reference Design Documentation for more information.

This repository is a reference design for an end-to-end machine learning workflow using Edge Impulse to process the PPG-DaLiA dataset, and assumes that the data is available and the transformation blocks have been set up. It demonstrates how to:

  • Process raw sensor data (accelerometer and PPG) from multiple subjects.
  • Extract and attach metadata to each subject's data.
  • Combine all processed data into a single dataset suitable for machine learning tasks like heart rate variability (HRV) analysis and activity classification.

This reference design includes:

Transformation Blocks:

  • DataProcessor: Processes raw data files for each subject.
  • MetadataGenerator: Extracts metadata from questionnaire files and attaches it to the data.
  • DataCombiner: Combines all processed data into a single dataset.

Edge Impulse Pipeline:

  • Automates the data processing workflow by chaining the transformation blocks together.

Overview

The PPG-DaLiA dataset consists of data collected from 15 subjects performing various activities while wearing a wristband equipped with sensors. The dataset includes:

  • Accelerometer data (ACC.csv)
  • Photoplethysmography (PPG) data (BVP.csv)
  • Heart rate data (HR.csv)
  • Electrodermal activity (EDA.csv)
  • Skin temperature (TEMP.csv)
  • Activity labels (S*_activity.csv)
  • Subject metadata (S*_quest.csv)

This repository provides a workflow to process this data using Edge Impulse transformation blocks, culminating in a combined dataset ready for machine learning projects.
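
For orientation, the sketch below shows one way to inspect a subject's raw files locally with pandas. It is a minimal sketch, not part of the pipeline: the S1_E4 path and the E4-style CSV layout (first row = recording start UNIX timestamp, second row = sample rate in Hz, remaining rows = samples) are assumptions to verify against your copy of the dataset.

    # Minimal sketch: inspect one subject's raw sensor files locally.
    # Assumes the E4-style CSV layout described above.
    import pandas as pd

    def load_e4_csv(path, columns):
        raw = pd.read_csv(path, header=None)
        start_ts = raw.iloc[0, 0]        # recording start (UNIX timestamp)
        sample_rate = raw.iloc[1, 0]     # samples per second
        data = raw.iloc[2:].reset_index(drop=True)
        data.columns = columns
        # Derive a per-sample timestamp from start time and sample rate
        data.index = start_ts + data.index / sample_rate
        return data, sample_rate

    acc, acc_hz = load_e4_csv("S1_E4/ACC.csv", ["acc_x", "acc_y", "acc_z"])
    bvp, bvp_hz = load_e4_csv("S1_E4/BVP.csv", ["bvp"])
    print(f"ACC at {acc_hz} Hz, BVP at {bvp_hz} Hz")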

Prerequisites

  • Edge Impulse Account: You need an Edge Impulse account with access to create custom transformation blocks and pipelines.
  • Edge Impulse CLI: Install the Edge Impulse CLI (edge-impulse-cli) version 1.21.1 or higher (install command below).
  • Python: Python 3.7 or higher.
  • Docker: Required for building and pushing transformation blocks.
  • Git: For version control and repository management.
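
The Edge Impulse CLI is distributed via npm, so with Node.js installed it can typically be installed with:

    npm install -g edge-impulse-cli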

Repository Structure

health-reference-design-public-data/
├── DataProcessor/
│   ├── transform.py
│   ├── parameters.json
│   ├── requirements.txt
│   └── Dockerfile
├── MetadataGenerator/
│   ├── transform.py
│   ├── parameters.json
│   ├── requirements.txt
│   └── Dockerfile
├── DataCombiner/
│   ├── transform.py
│   ├── parameters.json
│   ├── requirements.txt
│   └── Dockerfile
├── README.md
└── LICENSE
  • DataProcessor/: Contains the transformation block for processing raw data.
  • MetadataGenerator/: Contains the transformation block for extracting and attaching metadata.
  • DataCombiner/: Contains the transformation block for combining all processed data.
  • README.md: Documentation and instructions.
  • LICENSE: License information.

Setting Up the Repository

Clone the Repository:

git clone <repository-url>
cd health-reference-design-public-data

Navigate to Each Transformation Block:

The repository contains separate folders for each transformation block. You'll need to set up each one individually.

Transformation Blocks

1. DataProcessor

Processes raw sensor data for each subject. A sketch of this kind of processing follows the file list below.

Files:

  • transform.py: Script to process raw data files.
  • parameters.json: Defines parameters for the transformation block.
  • requirements.txt: Python dependencies.
  • Dockerfile: Docker configuration for the block.
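
To make the block's role concrete, here is a heavily simplified sketch of this kind of per-subject processing. It is illustrative only: the --in-directory argument mirrors parameters.json, but the file names, output name, and the lack of resampling are assumptions; the real logic lives in transform.py.

    # Illustrative sketch only -- see DataProcessor/transform.py for the real logic.
    import argparse
    import os
    import pandas as pd

    parser = argparse.ArgumentParser()
    parser.add_argument("--in-directory", required=True)
    parser.add_argument("--out-directory", default=".")
    args = parser.parse_args()

    # Skip the two E4 header rows (start timestamp, sample rate)
    acc = pd.read_csv(os.path.join(args.in_directory, "ACC.csv"),
                      header=None, skiprows=2, names=["acc_x", "acc_y", "acc_z"])
    bvp = pd.read_csv(os.path.join(args.in_directory, "BVP.csv"),
                      header=None, skiprows=2, names=["bvp"])

    # A real pipeline would resample both channels to a common rate;
    # here they are aligned by row index purely for brevity.
    combined = pd.concat([acc, bvp], axis=1)

    os.makedirs(args.out_directory, exist_ok=True)
    combined.to_parquet(os.path.join(args.out_directory, "processed.parquet"))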

Steps:

  1. Navigate to the DataProcessor directory:

    cd DataProcessor
  2. Initialize the Transformation Block:

    edge-impulse-blocks init --clean
    • Select Transformation block when prompted.
    • Provide a name and description.
  3. Push the Block to Edge Impulse:

    edge-impulse-blocks push
  4. Return to the main directory, then repeat the same steps for the next block:

    cd ..

2. MetadataGenerator

Extracts metadata from questionnaire files and attaches it to the data. A sketch follows the file list below.

Files:

  • transform.py
  • parameters.json
  • requirements.txt
  • Dockerfile
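
As an illustration, the sketch below makes two assumptions worth verifying: that the S*_quest.csv files store subject attributes as "# KEY VALUE" comment lines, and that a transformation block attaches metadata by writing an ei-metadata.json file to its output directory (check the current Edge Impulse docs for the exact schema). The real logic lives in transform.py.

    # Illustrative sketch only -- see MetadataGenerator/transform.py for the real logic.
    import json

    def parse_quest(path):
        # Collect '# KEY VALUE' comment lines into a metadata dict
        metadata = {}
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line.startswith("#"):
                    continue
                parts = line.lstrip("#").strip().split(None, 1)
                if len(parts) == 2:
                    metadata[parts[0].lower()] = parts[1]
        return metadata

    meta = parse_quest("S1_quest.csv")  # e.g. {'subject_id': 'S1', 'age': '30'}

    # Assumed metadata-update mechanism -- verify the schema in the EI docs
    with open("ei-metadata.json", "w") as f:
        json.dump({"version": 1, "action": "add", "metadata": meta}, f)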

Steps:

  1. Navigate to the MetadataGenerator directory:

    cd MetadataGenerator
  2. Initialize the Transformation Block:

    edge-impulse-blocks init --clean
    • Select Transformation block when prompted.
    • Provide a name and description.
  3. Push the Block to Edge Impulse:

    edge-impulse-blocks push
  4. Return to the main directory:

    cd ..

3. DataCombiner

Combines all processed data into a single dataset. A sketch follows the file list below.

Files:

  • transform.py
  • parameters.json
  • requirements.txt
  • Dockerfile
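
To illustrate the combining step, here is a hedged sketch that concatenates per-subject parquet files with pandas. The directory layout and the subject-from-path heuristic are assumptions; the real logic lives in transform.py.

    # Illustrative sketch only -- see DataCombiner/transform.py for the real logic.
    import glob
    import os
    import pandas as pd

    frames = []
    for path in sorted(glob.glob("processed-dataset/**/*.parquet", recursive=True)):
        df = pd.read_parquet(path)
        # Keep provenance: record which subject folder each row came from
        df["subject"] = os.path.basename(os.path.dirname(path))
        frames.append(df)

    combined = pd.concat(frames, ignore_index=True)
    combined.to_parquet("ppg_dalia_combined.parquet")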

Steps:

  1. Navigate to the DataCombiner directory:

    cd DataCombiner
  2. Initialize the Transformation Block:

    edge-impulse-blocks init --clean
    • Select Transformation block when prompted.
    • Provide a name and description.
  3. Push the Block to Edge Impulse:

    edge-impulse-blocks push
  4. Return to the main directory:

    cd ..

Creating the Pipeline in Edge Impulse

Now that all transformation blocks are pushed to Edge Impulse, you can create a pipeline to chain them together.

Steps:

Access Pipelines:

  1. In Edge Impulse Studio, navigate to your organization.
  2. Go to Data > Pipelines.

Add a New Pipeline:

  1. Click on + Add a new pipeline.
  2. Name: PPG-DaLiA Data Processing Pipeline
  3. Description: Processes PPG-DaLiA data from raw files to a combined dataset.
  4. Output Dataset: combined-dataset

Configure Pipeline Steps:

Step 1: Process Subject Data

  • Transformation Block: DataProcessor
  • Filter: name LIKE '%S%_E4%' (selects subjects S1_E4 to S15_E4)
  • Input Dataset: raw-dataset (replace with your dataset name)
  • Output Dataset: processed-dataset
  • Parameters:

    {
      "in-directory": "."
    }

Step 2: Generate Metadata

  • Transformation Block: MetadataGenerator
  • Filter: Same as Step 1
  • Input Dataset: processed-dataset
  • Output Dataset: processed-dataset (updated in place)
  • Parameters:

    {
      "in-directory": "."
    }

Step 3: Combine Processed Data

  • Transformation Block: DataCombiner
  • Filter: name LIKE '%' (selects all data items)
  • Input Dataset: processed-dataset
  • Output Dataset: combined-dataset
  • Parameters:

    {
      "dataset-name": "ppg_dalia_combined.parquet"
    }

Save the Pipeline.

Running the Pipeline

Run the Pipeline:

  1. In the pipeline list, click on the ⋮ (ellipsis) next to your pipeline.
  2. Select Run pipeline now.

Monitor Execution:

  • Check the pipeline logs to ensure each step runs successfully.
  • Address any errors that may occur.

Verify Output:

  • After completion, verify that the datasets (processed-dataset and combined-dataset) have been created and populated.

Using the Combined Dataset

With the combined dataset (ppg_dalia_combined.parquet), you can:

Import the Data into an Edge Impulse Project:

  1. Create a new project in Edge Impulse.
  2. Use the Data Acquisition tab to upload the combined dataset.
  3. Ensure data is correctly labeled and metadata is intact.
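
Before uploading, a quick local sanity check of the combined file can catch problems early. This assumes pandas with a parquet engine such as pyarrow installed; the column names depend on what DataCombiner actually produced.

    # Quick sanity check of the combined dataset
    import pandas as pd

    df = pd.read_parquet("ppg_dalia_combined.parquet")
    print(df.shape)               # rows x columns
    print(df.columns.tolist())    # available channels / labels
    print(df.head())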

Develop Machine Learning Models:

  • Build models for HRV analysis and activity classification.
  • Utilize Edge Impulse's tools for data exploration, model training, and evaluation.