PhysBench

Please start with Tutorial/Noob Heart.ipynb to learn how to use this framework.

Although I personally prefer TensorFlow, PhysBench is not tied to any specific deep learning framework. PyTorch and JAX users can refer to Tutorial/Noob Heart (Pytorch).ipynb and Tutorial/Noob Heart (JAX).ipynb.

Environments

First, create a new environment for PhysBench.

conda create -n physbench python=3.9
conda activate physbench
pip install -r requirements.txt

Then install the deep learning frameworks you need. If you need more than one framework, we recommend creating a separate environment for each.
To install TensorFlow:

conda install -c conda-forge tensorflow-gpu keras

To install PyTorch:

conda install pytorch torchvision torchaudio pytorch-cuda -c pytorch -c nvidia

Inference on a single video

To extract the BVP signal from a video you recorded yourself, run:

python inference.py --video input_face.avi --model seq

Currently supported models are seq, tscan, deepphys, efficientphys, physnet, chrom, pos, and ica.
Use --out path_to_bvp.csv to specify where the output BVP waveform is saved;
use --show-wave to visualize the output;
use --weights path_to_weights.h5 to specify the weights file (if omitted, the weights trained on RLAP are used automatically).
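When processing many videos, it can help to build the inference.py command line programmatically. The following is a minimal sketch that uses only the flags documented above; the helper names build_inference_cmd and run_inference are hypothetical, and the sketch assumes --show-wave is a switch that takes no value.

```python
import shlex
import subprocess

# Model names accepted by --model, as listed above.
SUPPORTED_MODELS = ("seq", "tscan", "deepphys", "efficientphys",
                    "physnet", "chrom", "pos", "ica")


def build_inference_cmd(video, model="seq", out=None, show_wave=False, weights=None):
    """Hypothetical helper: assemble the argv list for inference.py."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model {model!r}; choose from {SUPPORTED_MODELS}")
    cmd = ["python", "inference.py", "--video", str(video), "--model", model]
    if out is not None:
        cmd += ["--out", str(out)]          # where to save the BVP waveform (CSV)
    if show_wave:
        cmd.append("--show-wave")           # assumed to be a value-less switch
    if weights is not None:
        cmd += ["--weights", str(weights)]  # otherwise the RLAP-trained weights are used
    return cmd


def run_inference(video, **kwargs):
    """Hypothetical helper: run inference.py from the PhysBench repository root."""
    cmd = build_inference_cmd(video, **kwargs)
    print(shlex.join(cmd))                  # log the exact command being run
    subprocess.run(cmd, check=True)         # raises CalledProcessError on failure
```

For example, looping over pathlib.Path("videos").glob("*.avi") and calling run_inference(v, model="tscan", out=v.with_suffix(".csv")) would write one BVP file next to each video.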

Models

We implemented 7 neural models and 3 unsupervised models: DeepPhys, TS-CAN, EfficientPhys, PhysNet, PhysFormer, Seq-rPPG (1D CNN), and NoobHeart, plus Chrom, ICA, and POS. Among them, Seq-rPPG is a new model we proposed; it uses only one-dimensional convolutions, keeping computational complexity minimal while achieving high performance. NoobHeart is the toy model used in the tutorial: it has only 361 parameters and a simple structure of two 3D convolution layers, yet its performance is decent, which makes it a suitable entry-level model. Chrom, ICA, and POS are the three unsupervised models. Among the neural models, PhysFormer is implemented in PyTorch, while the others use TensorFlow.

For unsupervised methods, see unsupervised_methods.py; for methods implemented in TensorFlow, see models.py; for methods implemented in PyTorch, see models_torch.py. The framework itself does not depend on a specific deep learning framework: configure the environment you need and install the required packages from requirements.txt.

Model	Publication	Resolution	Params	Frame FLOPs	Input	Output	Type
DeepPhys	ECCV 18	36x36	532K	52M	Diff+RGB	Diff	2D CNN
TS-CAN	NeurIPS 20	36x36	532K	52M	Diff+RGB	Diff	2D CNN
EfficientPhys	WACV 23	72x72	2.16M	230M	Std RGB	Diff	2D CNN
PhysNet	BMVC 19	32x32	770K	54M	RGB	Wave	3D CNN
PhysFormer	CVPR 22	128x128	7.03M	324M	RGB	Wave	Transformer
Seq-rPPG	This paper	8x8	196K	261K	RGB	Wave	1D CNN
NoobHeart	This paper	8x8	361	5790	RGB	Wave	3D CNN
Chrom	TBME 13	-	-	-	-	-	Unsupervised
ICA	TBME 11	-	-	-	-	-	Unsupervised
POS	TBME 16	-	-	-	-	-	Unsupervised
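To illustrate what an unsupervised method in this table does, here is a minimal NumPy sketch of the POS (plane-orthogonal-to-skin) projection. It is not the code in unsupervised_methods.py, which may differ in preprocessing, window length, and filtering. The function name pos_bvp, the 1.6 s window, and the assumption that the input is a (T, 3) trace of spatially averaged R, G, B face values are choices made for this example.

```python
import numpy as np


def pos_bvp(rgb, fs=30.0, win_sec=1.6):
    """Sketch of POS. rgb: (T, 3) spatially averaged R, G, B face values per frame
    (must be positive). Returns a (T,) pulse signal built by overlap-adding windows."""
    rgb = np.asarray(rgb, dtype=np.float64)
    n_frames = rgb.shape[0]
    win = int(round(win_sec * fs))                 # window length in frames
    if n_frames < win:
        raise ValueError("signal shorter than one POS window")
    proj = np.array([[0.0, 1.0, -1.0],             # S1 = G - B
                     [-2.0, 1.0, 1.0]])            # S2 = -2R + G + B
    h_sum = np.zeros(n_frames)
    for end in range(win, n_frames + 1):           # sliding window with stride 1
        start = end - win
        seg = rgb[start:end]
        cn = seg / seg.mean(axis=0)                # temporal normalization per channel
        s = cn @ proj.T                            # (win, 2) projected signals
        std2 = s[:, 1].std()
        alpha = s[:, 0].std() / std2 if std2 > 0 else 0.0
        h = s[:, 0] + alpha * s[:, 1]              # alpha-tuned combination of S1 and S2
        h_sum[start:end] += h - h.mean()           # overlap-add of zero-mean segments
    return h_sum
```

The projection removes the intensity component shared by all three channels, and the alpha term balances the two projected signals so that their common pulse component adds up.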

Add new models (supervised or unsupervised)

Whether a model is written in TensorFlow, PyTorch, or plain NumPy, its input is a facial video clip and its output is the corresponding physiological signal. All you need to do is wrap the algorithm in a function that takes video frames as input and returns BVP signals or heart rate.

def model(frames):
    # frames: (Batch, Depth, H, W, C) array containing only the face region
    x = preprocess(frames)    # your preprocessing, if necessary (placeholder)
    bvp = algorithm(x)        # your rPPG algorithm (placeholder)
    return bvp                # (Batch, Depth)

# Evaluate the model on a standard HDF5 dataset.
# depth: number of frames per clip; (H, W): input resolution expected by the model.
eval_on_dataset('test_set.h5', model, depth, (H, W), save='results/my_result.h5')

# Obtain HR metrics
hr_metrics = get_metrics('results/my_result.h5')

# Obtain HRV metrics
hrv_metrics = get_metrics_HRV('results/my_result.h5')
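To make the model(frames) contract concrete, here is a deliberately simple, hypothetical example: a "model" that returns the normalized spatial mean of the green channel per frame, together with a helper that cuts a long video into clips of depth frames and stitches the outputs back together. Both functions are illustrations only; the RGB channel order is our assumption, and eval_on_dataset may clip and merge videos differently (for example, with overlapping windows).

```python
import numpy as np


def green_channel_model(frames):
    """Toy model following the contract above.

    frames: (Batch, Depth, H, W, C) face crops, RGB channel order assumed.
    returns: (Batch, Depth) pseudo-BVP signal.
    """
    frames = np.asarray(frames, dtype=np.float32)
    green = frames[..., 1].mean(axis=(2, 3))             # (Batch, Depth) spatial mean of G
    green = green - green.mean(axis=1, keepdims=True)    # remove the DC level of each clip
    std = green.std(axis=1, keepdims=True)
    return green / np.maximum(std, 1e-8)                 # unit variance per clip


def run_on_video(model, video, depth):
    """Split a (T, H, W, C) video into non-overlapping clips of `depth` frames,
    run `model` on them as one batch, and concatenate the outputs.
    Trailing frames that do not fill a whole clip are dropped."""
    video = np.asarray(video)
    n_clips = video.shape[0] // depth
    if n_clips == 0:
        raise ValueError("video is shorter than one clip")
    clips = video[: n_clips * depth].reshape(n_clips, depth, *video.shape[1:])
    bvp = model(clips)                                    # (n_clips, depth)
    return np.asarray(bvp).reshape(-1)                    # (n_clips * depth,)
```

Because the model only sees fixed-length clips, the same function can be passed to an evaluation routine or called on a single video through run_on_video.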

Then open the visualization webpage, where you can find my_result.h5 and view the waveform of each video:

python visualization.py

Datasets

Adding a dataset is simple: write a loader and provide an index file (usually only about 20 lines of code). Currently supported loaders are RLAP (i.e., CCNU), UBFC-rPPG2, UBFC-Phys, MMPD, PURE, COHFACE, and SCAMPS. You can record your own datasets with our recording program PhysRecorder https://github.com/KegangWangCCNU/PhysRecorder: all you need is a webcam and a Contec CMS50E pulse oximeter to collect strictly synchronized datasets in a lossless format, which can be used directly with the RLAP loader.
We recommend training on datasets with good synchronicity, since most models are highly sensitive to the synchronicity of the training set. Not every video in UBFC-rPPG is out of sync, and in our experience some models with a Temporal Shift Module (TSM), such as TS-CAN and EfficientPhys, can adapt to it; even so, they still perform worse than when trained on highly synchronized datasets.
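Synchronicity problems show up as a time offset between the video-derived pulse and the reference PPG. One simple way to check a recording, assuming you already have a predicted BVP from a trusted model and the ground-truth BVP at the same frame rate, is to find the lag that maximizes their correlation. This is a hypothetical diagnostic (estimate_lag is not a PhysBench function); a best lag far from zero suggests the pair is out of sync.

```python
import numpy as np


def estimate_lag(pred, gt, fs=30.0, max_lag_sec=2.0):
    """Return (lag_seconds, best_r). A positive lag means gt lags behind pred,
    i.e. gt[t + k] best matches pred[t] for k = lag_seconds * fs."""
    n = min(len(pred), len(gt))
    pred = np.asarray(pred[:n], dtype=np.float64)
    gt = np.asarray(gt[:n], dtype=np.float64)
    max_k = int(round(max_lag_sec * fs))
    best_k, best_r = 0, -np.inf
    for k in range(-max_k, max_k + 1):
        if k >= 0:
            a, b = pred[: n - k], gt[k:]     # compare pred[t] with gt[t + k]
        else:
            a, b = pred[-k:], gt[: n + k]    # compare pred[t - k] with gt[t]
        r = np.corrcoef(a, b)[0, 1]          # Pearson correlation at this lag
        if r > best_r:
            best_k, best_r = k, r
    return best_k / fs, best_r
```

Scanning lags one sample at a time is slow for long recordings but easy to verify; an FFT-based cross-correlation would be the usual optimization.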

Dataset	Participants	Frames	Lossless	Synchronicity
RLAP	58	3.53M	MJPG	Good
RLAP-rPPG	58	781K	YES	Good
PURE	10	106K	YES	Good
UBFC-rPPG	42	75K	YES	Bad
UBFC-Phys	56	1.06M	MJPG	-
MMPD	33	1.15M	H.264	-
COHFACE	40	192K	MPEG-4	Good
SCAMPS	2800	1.68M	Synthetics	Good

Each dataset needs an index file, and PhysBench provides official versions of these files. Usually you do not need to change a dataset's folder structure to use it. See the csv files in the datasets folder.

PURE
Stricker, R., Müller, S., Gross, H.-M. "Non-contact Video-based Pulse Rate Measurement on a Mobile Service Robot", in: Proc. 23rd IEEE Int. Symposium on Robot and Human Interactive Communication (Ro-Man 2014), Edinburgh, Scotland, UK, pp. 1056-1062, IEEE 2014
UBFC-rPPG
S. Bobbia, R. Macwan, Y. Benezeth, A. Mansouri, J. Dubois, "Unsupervised skin tissue segmentation for remote photoplethysmography", Pattern Recognition Letters, 2017.
UBFC-Phys
Sabour, R. M., Benezeth, Y., De Oliveira, P., Chappe, J., & Yang, F. (2021). UBFC-Phys: A multimodal database for psychophysiological studies of social stress. IEEE Transactions on Affective Computing.
MMPD
Jiankai Tang, Kequan Chen, Yuntao Wang, Yuanchun Shi, Shwetak Patel, Daniel McDuff, Xin Liu, "MMPD: Multi-Domain Mobile Video Physiology Dataset", IEEE EMBC, 2023
COHFACE
Guillaume Heusch, André Anjos, Sébastien Marcel, "A reproducible study on remote heart rate measurement", arXiv, 2016.
SCAMPS
D. McDuff, M. Wander, X. Liu, B. Hill, J. Hernandez, J. Lester, T. Baltrusaitis, "SCAMPS: Synthetics for Camera Measurement of Physiological Signals", NeurIPS, 2022

Note: our framework supports UBFC-Phys, but because of large-amplitude motion its ground truth is very noisy and test results on it may be unreliable, so they are not listed. Inaccurate ground-truth signals would need to be filtered out before results can be released.

Add new datasets

To add a new dataset, you need to prepare two things: a Loader and a file index.
Using MMPD as an example:

import numpy as np
import scipy.io

class LoaderMMPD(Loader):                               # Loader: PhysBench's loader base class

    def __call__(self, vid):                            # vid is the relative path of the video file
        path = f"{self.base}{vid}"                      # absolute path = dataset root + relative path
        f = scipy.io.loadmat(path)
        bvp = f['GT_ppg'][0]                            # (Depth,)
        ts = np.arange(bvp.shape[0]) / 30               # (Depth,) timestamps in seconds; MMPD is 30 fps
        frames = (f['video'] * 255).astype(np.uint8)    # (Depth, H, W, C)
        return frames, bvp, ts                          # video frames, BVP, timestamps

loader_mmpd = LoaderMMPD(mmpd_root)  # Initialize the loader with the MMPD dataset root directory.

# Use the loader to package the raw MMPD dataset into a standard HDF5 dataset, which can be used for testing models.
dump_dataset("mmpd_dataset.h5", files_mmpd, loader_mmpd, labels=labels_list)
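The MMPD loader can build timestamps from a fixed frame rate because its ground-truth PPG has one sample per video frame (both are (Depth,) above). For datasets whose pulse oximeter is sampled at a different rate or timestamped separately, a loader could resample the ground truth onto the frame timestamps before returning it. Here is a minimal sketch with np.interp; resample_bvp_to_frames is a hypothetical helper, not part of PhysBench, and linear interpolation is just one reasonable choice.

```python
import numpy as np


def resample_bvp_to_frames(sensor_ts, sensor_bvp, frame_ts):
    """Hypothetical helper: linearly interpolate a PPG waveform onto frame timestamps.

    sensor_ts, frame_ts: timestamps in seconds on a shared clock.
    Frames outside the sensor's time range receive the first/last sensor value,
    because np.interp clamps at the ends.
    """
    sensor_ts = np.asarray(sensor_ts, dtype=np.float64)
    sensor_bvp = np.asarray(sensor_bvp, dtype=np.float64)
    frame_ts = np.asarray(frame_ts, dtype=np.float64)
    order = np.argsort(sensor_ts)  # np.interp expects increasing x-coordinates
    return np.interp(frame_ts, sensor_ts[order], sensor_bvp[order])
```

Inside a loader's __call__, the return value would then be frames, resample_bvp_to_frames(ppg_ts, ppg, frame_ts), frame_ts, so that BVP and timestamps both have one entry per frame.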

Train and Test

For training on our RLAP dataset, see the benchmark_RLAP folder; for training on the SCAMPS dataset, see the benchmark_SCAMPS folder. For the ablation experiments and training on PURE and UBFC, see benchmark_addition. All code is provided as Jupyter notebooks that include our own replication runs; if you have read the tutorial, reproducing the results should be easy.

Training evaluation on RLAP

RLAP is well suited as a training set, and we split it into training, validation, and test sets. In addition, we also tested on the entire UBFC and PURE datasets. For code and results, see benchmark_RLAP.
Testing on RLAP and RLAP-rPPG differs from the other datasets: because RLAP videos are longer, heart rate is predicted over a 30 s moving window rather than over the entire video. For the other datasets, heart rate is predicted from the entire 1-minute video.
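PhysBench derives heart rate from the BVP waveform with the Welch method. The 30 s moving window described above can be sketched with scipy.signal.welch as follows. The 0.75 to 2.5 Hz search band (45 to 150 bpm), the 8 s Welch segment length, the zero-padded FFT size, and the 1 s window step are assumptions for this example, not PhysBench's settings.

```python
import numpy as np
from scipy.signal import welch


def welch_hr(bvp, fs=30.0, low_hz=0.75, high_hz=2.5):
    """Heart rate (bpm) at the peak of the Welch PSD inside [low_hz, high_hz]."""
    bvp = np.asarray(bvp, dtype=np.float64)
    nperseg = min(len(bvp), int(8 * fs))          # assumed 8 s Welch segments
    nfft = max(4096, nperseg)                     # zero-padding for a finer frequency grid
    freqs, psd = welch(bvp, fs=fs, nperseg=nperseg, nfft=nfft)
    band = (freqs >= low_hz) & (freqs <= high_hz)
    return float(freqs[band][np.argmax(psd[band])] * 60.0)


def sliding_window_hr(bvp, fs=30.0, win_sec=30.0, step_sec=1.0):
    """One HR estimate per 30 s window; the 1 s step is an assumption."""
    win, step = int(win_sec * fs), int(step_sec * fs)
    if len(bvp) < win:                            # short video: use it whole
        return np.array([welch_hr(bvp, fs)])
    starts = range(0, len(bvp) - win + 1, step)
    return np.array([welch_hr(bvp[s:s + win], fs) for s in starts])
```

The per-window estimates from the predicted and ground-truth BVP can then be compared pairwise to compute the error metrics.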

Intra-dataset testing on RLAP

Model	MAE	RMSE	Pearson Coef.
DeepPhys	1.52	4.40	0.906
TS-CAN	1.23	3.59	0.937
EfficientPhys	1.05	3.41	0.943
PhysNet	1.12	4.13	0.916
PhysFormer	1.56	6.28	0.803
Seq-rPPG	1.07	4.15	0.917
NoobHeart	1.79	5.85	0.832
Chrom	6.90	16.0	0.341
ICA	6.05	13.3	0.380
POS	4.25	12.1	0.501
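The HR tables report mean absolute error (MAE) and root-mean-square error (RMSE), in beats per minute, and the Pearson correlation coefficient between predicted and ground-truth heart rates. For reference, here is a minimal NumPy sketch of these standard definitions; hr_metrics is a hypothetical name, and PhysBench's get_metrics may differ in details such as how predictions are paired with the ground truth per video or per window.

```python
import numpy as np


def hr_metrics(hr_pred, hr_gt):
    """MAE and RMSE (same unit as the inputs, e.g. bpm) and Pearson r."""
    p = np.asarray(hr_pred, dtype=np.float64)
    g = np.asarray(hr_gt, dtype=np.float64)
    err = p - g
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "Pearson": float(np.corrcoef(p, g)[0, 1]),
    }
```

Because RMSE squares the errors, a few failed videos inflate it much more than MAE, which is why the two can diverge strongly in the tables.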

Intra-dataset testing on RLAP-rPPG

Model	HR MAE	HR RMSE	HR Pearson Coef.	HRV-SDNN MAE	HRV-SDNN RMSE	HRV-SDNN Pearson Coef.
DeepPhys	1.76	4.87	0.877	57.6	64.2	0.338
TS-CAN	1.23	3.82	0.922	50.1	59.3	0.395
EfficientPhys	1.00	3.39	0.939	43.7	53.7	0.356
PhysNet	1.04	3.80	0.923	36.4	43.8	0.306
PhysFormer	0.78	2.83	0.957	28.8	34.4	0.450
Seq-rPPG	0.81	2.97	0.953	14.4	22.1	0.424
NoobHeart	1.57	4.71	0.883	52.3	57.3	0.488
Chrom	5.88	14.1	0.451	63.7	69.8	0.267
ICA	4.56	9.91	0.569	74.7	77.7	0.408
POS	3.60	10.1	0.634	70.6	75.8	0.267
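HRV-SDNN is the standard deviation of the intervals between successive heartbeats. A simplified way to estimate it from a BVP waveform is to detect systolic peaks and take the standard deviation of the inter-beat intervals, as in this sketch. It is an illustration under stated assumptions, not PhysBench's get_metrics_HRV: it uses scipy.signal.find_peaks with a minimum peak spacing derived from an assumed 180 bpm upper bound, skips the band-pass filtering and artifact rejection that HRV pipelines usually apply, and uses the sample standard deviation (ddof=1).

```python
import numpy as np
from scipy.signal import find_peaks


def sdnn_ms(bvp, fs=30.0, max_hr_bpm=180.0):
    """Simplified SDNN (ms) from peak-to-peak intervals of a BVP waveform."""
    bvp = np.asarray(bvp, dtype=np.float64)
    min_dist = max(1, int(fs * 60.0 / max_hr_bpm))  # minimum samples between beats
    peaks, _ = find_peaks(bvp, distance=min_dist)
    if len(peaks) < 3:
        raise ValueError("need at least 3 peaks to compute SDNN")
    ibi_ms = np.diff(peaks) / fs * 1000.0            # inter-beat intervals in ms
    return float(np.std(ibi_ms, ddof=1))
```

At 30 fps each interval is quantized to about 33 ms, so sub-sample peak interpolation or a higher frame rate matters much more for SDNN than for average heart rate.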

Cross-dataset testing on UBFC-rPPG

The videos and physiological signals of UBFC-rPPG are not strictly synchronized, which introduces a fixed error between the heart rate extracted by an rPPG algorithm and the ground truth. As a result, the achievable Pearson coefficient on UBFC-rPPG saturates at about 0.997, and further improvements in model accuracy will not yield better metrics.

Model	HR MAE	HR RMSE	HR Pearson Coef.	HRV-SDNN MAE	HRV-SDNN RMSE	HRV-SDNN Pearson Coef.
DeepPhys	1.06	1.51	0.997	30.0	37.8	0.648
TS-CAN	0.99	1.44	0.997	25.6	31.8	0.588
EfficientPhys	1.03	1.45	0.997	10.1	15.4	0.827
PhysNet	0.92	1.46	0.997	12.2	14.9	0.887
PhysFormer	1.06	1.53	0.997	8.37	11.1	0.921
Seq-rPPG	0.87	1.40	0.997	4.73	8.25	0.911
NoobHeart	1.14	1.69	0.996	33.1	36.5	0.697
Chrom	3.82	12.3	0.830	23.7	28.6	0.672
ICA	1.58	2.55	0.990	33.3	42.0	0.604
POS	2.45	8.56	0.900	30.5	37.6	0.513

Cross-dataset testing on PURE

Unsupervised methods are usually sensitive to preprocessing and postprocessing, and many parameters affect their performance. PhysBench tunes these extra steps as far as possible so that each model can show its full performance. Surprisingly, POS outperforms most supervised methods in HR estimation on the PURE dataset; we verified this result carefully and it is genuine.

Model	HR MAE	HR RMSE	HR Pearson Coef.	HRV-SDNN MAE	HRV-SDNN RMSE	HRV-SDNN Pearson Coef.
DeepPhys	2.80	8.31	0.937	86.0	92.0	0.297
TS-CAN	2.12	6.67	0.960	61.4	74.1	0.293
EfficientPhys	1.33	5.97	0.968	28.0	44.0	0.468
PhysNet	0.51	0.91	0.999	22.5	35.7	0.560
PhysFormer	1.63	9.45	0.941	21.6	32.0	0.576
Seq-rPPG	0.37	0.63	1.000	9.51	15.8	0.872
NoobHeart	0.45	0.70	1.000	50.8	58.1	0.657
Chrom	2.08	12.3	0.856	40.4	56.2	0.418
ICA	1.12	3.97	0.986	67.5	76.5	0.376
POS	0.39	0.66	1.000	56.1	69.2	0.467

Cross-dataset testing on MMPD-Simplest

Following https://github.com/McJackTang/MMPD_rPPG_dataset, we tested all models on the simplest scenario, which contains only light-skin samples and no head movement. MMPD is heavily compressed with H.264 encoding, which may hurt compression-sensitive models.
The simplest scenario is defined as: motion='Stationary', skin_color='3', light=['LED-high', 'LED-low', 'Incandescent']
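If MMPD's per-video metadata is held as a list of dicts (a hypothetical structure; the actual index format is defined by the csv files in the datasets folder), the simplest scenario can be selected with a small filter like this sketch. Note that skin_color is compared as the string '3', matching the definition above.

```python
# Simplest-scenario definition from above; each key maps to the accepted values.
SIMPLEST = {
    "motion": {"Stationary"},
    "skin_color": {"3"},
    "light": {"LED-high", "LED-low", "Incandescent"},
}


def select_scenario(records, scenario=SIMPLEST):
    """Keep records whose metadata matches every field of the scenario.
    A missing field yields None, which never matches."""
    return [r for r in records
            if all(r.get(key) in allowed for key, allowed in scenario.items())]
```

Keeping the scenario as data rather than hard-coded conditions makes it easy to define other MMPD subsets the same way.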

Model	MAE	RMSE	Pearson Coef.
DeepPhys	1.03	1.46	0.987
TS-CAN	0.95	1.40	0.989
EfficientPhys	1.57	5.40	0.821
PhysNet	0.97	1.45	0.988
PhysFormer	1.70	4.13	0.890
Seq-rPPG	1.52	3.93	0.915
NoobHeart	2.78	6.31	0.763
Chrom	12.2	19.2	0.151
ICA	4.08	9.45	0.642
POS	4.30	10.8	0.426

Cross-dataset testing on COHFACE

COHFACE uses MPEG-4 compression with a very high compression ratio (no video exceeds 2 MB), which causes most rPPG algorithms to fail on it. However, some architectures are robust to heavy compression, such as DeepPhys-like structures that take the difference between video frames as input and output the difference of the BVP. Moreover, the poorly performing algorithms are not entirely without merit: their errors are dominated by the videos on which prediction fails completely, so this part of the error says little about them, and more appropriate metrics are needed to measure performance.

Model	MAE	RMSE	Pearson Coef.
DeepPhys	2.75	8.63	0.733
TS-CAN	2.28	7.81	0.774
EfficientPhys	3.94	12.0	0.528
PhysNet	19.6	26.9	-0.45
PhysFormer	20.0	26.1	-0.37
Seq-rPPG	16.1	25.7	-0.12
NoobHeart	25.0	29.5	-0.36
Chrom	27.4	32.4	-0.32
ICA	7.91	16.1	0.282
POS	22.3	29.9	-0.32

Training evaluation on SCAMPS

Training on synthetic datasets is difficult: we observed that models overfit easily, and many measures are needed to prevent it, such as controlling the learning rate and adding extra regularization. Smaller models may be less prone to overfitting. NoobHeart is an example: we froze its LayerNormalization layer at the initial parameters and trained for 5 epochs, achieving performance similar to training on real datasets. This could be a first step toward training on synthetic datasets.

Following https://github.com/remotebiosensing/rppg and rPPG-Toolbox, we train DeepPhys with a OneCycle learning-rate schedule and the AdamW optimizer to mitigate overfitting. For details, see https://github.com/KegangWangCCNU/PhysBench/blob/main/benchmark_SCAMPS/DeepPhys.ipynb
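A OneCycle schedule warms the learning rate up to a peak and then anneals it to a very small value. The following framework-agnostic sketch computes the per-step learning rate with cosine warm-up and annealing. Its defaults (pct_start=0.3, div_factor=25, final_div_factor=1e4) mirror those of PyTorch's torch.optim.lr_scheduler.OneCycleLR, but the function name one_cycle_lr and its exact step boundaries are this example's own and may not match the DeepPhys notebook or PyTorch step for step. AdamW itself is available as torch.optim.AdamW in PyTorch and tf.keras.optimizers.AdamW in recent TensorFlow releases.

```python
import math


def _cos_anneal(start, end, pct):
    """Cosine interpolation from start (pct=0) to end (pct=1)."""
    return end + (start - end) / 2.0 * (1.0 + math.cos(math.pi * pct))


def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 div_factor=25.0, final_div_factor=1e4):
    """Learning rate at `step` (0-based) of a OneCycle schedule with cosine phases."""
    if total_steps < 2:
        raise ValueError("total_steps must be at least 2")
    if not 0.0 < pct_start < 1.0:
        raise ValueError("pct_start must be in (0, 1)")
    if not 0 <= step < total_steps:
        raise ValueError("step out of range")
    initial_lr = max_lr / div_factor            # learning rate at step 0
    min_lr = initial_lr / final_div_factor      # learning rate at the last step
    peak_step = pct_start * (total_steps - 1)   # step at which max_lr is reached
    if step <= peak_step:
        return _cos_anneal(initial_lr, max_lr, step / peak_step)
    pct = (step - peak_step) / (total_steps - 1 - peak_step)
    return _cos_anneal(max_lr, min_lr, pct)
```

Call it once per optimizer step and assign the result to the optimizer's learning rate; the long, low-learning-rate tail at the end of the cycle is what helps limit overfitting late in training.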

Cross-dataset testing on UBFC-rPPG

Model	MAE	RMSE	Pearson Coef.
DeepPhys	9.51	18.2	0.608
NoobHeart	1.05	1.49	0.997

Cross-dataset testing on PURE

Model	MAE	RMSE	Pearson Coef.
DeepPhys	5.41	13.3	0.852
NoobHeart	0.53	0.88	0.999

Visualization

Run visualization.py to open the visualization webpage. Before visualizing, make sure all result files are saved in the results folder. Result files generated by the framework link to the corresponding dataset files, so the webpage can display face images in sync with the waveforms. If a link becomes invalid, for example because the dataset files were moved, faces can no longer be displayed on the webpage.

Limitation

The test data used by PhysBench may not reflect accuracy in real-world scenarios, which involve more diverse lighting conditions, head movements, skin tones, and age groups. The heart rate that the algorithms derive with the Welch method may not fully comply with medical standards and requires further rigorous evaluation before clinical use. Through the visualization webpage, we aim to make users aware of the algorithms' weaknesses and limitations as much as possible.

Full Benchmark Table

All the results of the experiments we conducted can be found in FullBench.pdf.

Request RLAP dataset

If you wish to obtain the RLAP dataset, please send an email to kegangwang@mails.ccnu.edu.cn and cc yantaowei@ccnu.edu.cn, with the Data Usage Agreement attached.
See https://github.com/KegangWangCCNU/RLAP-dataset

Citation

If you use the PhysBench framework, the PhysRecorder data collection tool, or the models included in this framework, please cite the following paper:

@misc{wang2023physbench,
      title={PhysBench: A Benchmark Framework for Remote Physiological Sensing with New Dataset and Baseline}, 
      author={Kegang Wang and Yantao Wei and Mingwen Tong and Jie Gao and Yi Tian and YuJian Ma and ZhongJin Zhao},
      year={2023},
      eprint={2305.04161},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

I am looking for a CS Ph.D. position. My research field is computer vision and remote physiological sensing, and I will graduate with a master's degree in June 2024. If you are interested, please send an email to kegangwang@mails.ccnu.edu.cn.