Deep Imbalanced Regression to Estimate Vascular Age from PPG Data: A Novel Digital Biomarker for Cardiovascular Health
Welcome to our repository where we provide the PyTorch implementation of the Dist loss. This loss function, described in detail here, leverages data distribution priors to effectively address deep imbalanced regression tasks, such as estimating vascular age from PPG data.
The figure below demonstrates the effectiveness of the Dist loss in addressing deep imbalanced regression tasks. Subfigures (a) and (b) show the results generated by the model using only L1 loss (with similar performance observed when using MSELoss), while subfigures (c) and (d) display the results produced by the model using the Dist loss during the training phase. Additionally, the following table highlights the effectiveness of this loss function in few-shot regions.
To illustrate the usage of the Dist loss, we have included a straightforward example in the example.ipynb notebook, which demonstrates its application in a synthetic regression task. When utilizing this loss function, several key parameters must be carefully considered:
- batch_size: Setting a very small batch size can degrade performance because such sizes are inadequate for reliably estimating the distribution of model outputs. A practical solution is to calculate the loss values over multiple batches or at the end of each epoch if your data or hardware does not support larger batch sizes.
- step: This parameter defines the interval between discrete labels in the estimated distribution. For instance, in age estimation, a step value of 1 is appropriate, whereas a value like 10,000 might be suitable for predicting house prices. The step size directly impacts the granularity of your task.
- min_label and max_label: These parameters define the theoretical range of possible labels. Any label values falling below min_label or above max_label will be assigned a probability of zero in the output distribution. Importantly, these are not the minimum and maximum values present in your dataset. For example, with a dataset ranging from 40 to 80 years in age, min_label and max_label might be set to 20 and 100 years, respectively.
- drop_last: This parameter should be set to True in your training set to avoid mismatched tensor shapes, which can lead to errors.
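The interaction between step, min_label, max_label, and drop_last can be sketched in plain Python. This is only an illustration under stated assumptions, not the repository's Dist loss implementation: the helper names label_grid, empirical_prior, and batch_sizes are hypothetical, and a normalized histogram stands in for whatever distribution estimate the loss actually uses.

```python
def label_grid(min_label, max_label, step):
    """Return the discrete label values from min_label to max_label, inclusive."""
    n_bins = int(round((max_label - min_label) / step)) + 1
    return [min_label + i * step for i in range(n_bins)]


def empirical_prior(labels, min_label, max_label, step):
    """Normalized histogram of labels over the grid (an assumed stand-in prior).

    Labels outside [min_label, max_label] are skipped, so they carry no
    probability mass in the result.
    """
    grid = label_grid(min_label, max_label, step)
    counts = [0] * len(grid)
    for y in labels:
        idx = int(round((y - min_label) / step))  # nearest grid point
        if 0 <= idx < len(grid):
            counts[idx] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no labels fall inside [min_label, max_label]")
    return grid, [c / total for c in counts]


def batch_sizes(n_samples, batch_size, drop_last):
    """Sizes of the batches produced for n_samples, with or without drop_last."""
    full, remainder = divmod(n_samples, batch_size)
    sizes = [batch_size] * full
    if remainder and not drop_last:
        sizes.append(remainder)  # the smaller final batch that drop_last removes
    return sizes


if __name__ == "__main__":
    # Dataset ages span 40 to 80, but the theoretical range is set to 20 to 100.
    ages = [40, 40, 60, 80]
    grid, prior = empirical_prior(ages, min_label=20, max_label=100, step=1)
    print(len(grid))                            # 81
    print(prior[grid.index(40)])                # 0.5
    print(batch_sizes(10, 4, drop_last=True))   # [4, 4]
    print(batch_sizes(10, 4, drop_last=False))  # [4, 4, 2]
```

With drop_last disabled, the final batch holds 2 samples instead of 4, which is the kind of shape mismatch the drop_last setting avoids. The grid also shows why min_label and max_label should cover the theoretical range: the 81 bins from 20 to 100 exist even though the sample data only occupy three of them.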
Many thanks to these repositories for their great contributions!
https://github.com/hsd1503/resnet1d
https://github.com/google-research/fast-soft-sort
https://github.com/YyzHarry/imbalanced-regression