Better handling of missing signal
Closed this issue · 3 comments
Signal interruptions can happen. Currently, we pad the missing signal early on (during cleaning) and then proceed normally.
import numpy as np
import neurokit2 as nk

# Two 10 s ECG segments separated by a 10 s gap of missing samples (NaN)
signal = np.concatenate(
    [
        nk.ecg_simulate(duration=10, sampling_rate=100),
        [np.nan] * 1000,
        nk.ecg_simulate(duration=10, sampling_rate=100),
    ]
)
nk.signal_plot(signal)

# Process the full signal, including the gap
df, _ = nk.ecg_process(signal, sampling_rate=100)
df.plot(subplots=True, figsize=(10, 10))
We should probably fill with NaNs at the indices where the raw signal is NaN. The only problem is that we need to make sure that our feature computation functions (e.g., HRV) are NaN-proof and return NaN if NaNs are present, rather than raising an error.
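To make the idea concrete, here is a minimal sketch of putting NaNs back at the indices where the raw signal is NaN. The helper name `mask_missing` and the toy arrays are hypothetical and not part of NeuroKit2; the sketch only assumes that the raw and processed signals have the same length.

```python
import numpy as np


def mask_missing(raw, processed):
    """Return a copy of `processed` with NaN wherever `raw` is NaN.

    Hypothetical helper for illustration; assumes both inputs have the same length.
    """
    raw = np.asarray(raw, dtype=float)
    out = np.array(processed, dtype=float)  # np.array copies, so the input is left untouched
    out[np.isnan(raw)] = np.nan
    return out


# Toy example: the gap at indices 2-3 was padded during cleaning
raw = np.array([0.1, 0.2, np.nan, np.nan, 0.3])
cleaned = np.array([0.1, 0.2, 0.25, 0.28, 0.3])
masked = mask_missing(raw, cleaned)
```

With something like this, downstream columns would carry the gap explicitly instead of hiding it behind padded values.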
Would you say returning NaN whenever there are any NaNs present would be the desired behavior for all features, or just some?
In #720 I tried adapting the HRV features for missing R-R intervals but some features required more adaptations than others (e.g. for the MeanNN we can just ignore the missing intervals whereas for RMSSD I removed differences detected as non-consecutive).
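As a rough illustration of the difference between the two features, here is a sketch that assumes missing R-R intervals are marked as NaN in the interval array. This is an assumption made for the example and not necessarily how #720 detects non-consecutive intervals; the function names are hypothetical too.

```python
import numpy as np


def mean_nn(rri):
    """Mean of the available intervals; missing (NaN) intervals are ignored."""
    rri = np.asarray(rri, dtype=float)
    if np.all(np.isnan(rri)):
        return np.nan  # nothing to average
    return float(np.nanmean(rri))


def rmssd(rri):
    """RMSSD computed only from differences between two consecutive, present intervals."""
    rri = np.asarray(rri, dtype=float)
    diffs = np.diff(rri)  # NaN wherever either neighbour is missing
    valid = diffs[~np.isnan(diffs)]
    if valid.size == 0:
        return np.nan  # no pair of consecutive intervals available
    return float(np.sqrt(np.mean(valid ** 2)))


# Intervals in ms, with one missing interval in the middle
rri = np.array([800.0, 810.0, np.nan, 790.0, 805.0])
print(mean_nn(rri), rmssd(rri))
```

Here the MeanNN uses all four available intervals, while the RMSSD only uses the two differences (810 - 800 and 805 - 790) that do not span the gap.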
I'm still not confident that I adapted all the features in the most "ideal" way (in terms of best approximating the value that would be obtained without missing data), and I imagine this is something we would have to determine empirically if we wanted to be sure.
But I guess a start could be to return NaN for all functions that would currently raise an error, and then slowly adapt these functions for the missing data if we think we can get a reasonable approximation?
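One possible way to get that starting point is a small decorator that turns selected exceptions into NaN. This is only a sketch: `nan_on_error` and `mean_interval` are hypothetical names, and which exceptions should be caught is an assumption that would need to be decided per feature.

```python
import functools
import math


def nan_on_error(feature_func):
    """Wrap a feature function so that it returns NaN instead of raising.

    Hypothetical helper; the list of caught exceptions is an assumption.
    """
    @functools.wraps(feature_func)
    def wrapper(*args, **kwargs):
        try:
            return feature_func(*args, **kwargs)
        except (ValueError, ZeroDivisionError, FloatingPointError):
            return math.nan
    return wrapper


@nan_on_error
def mean_interval(intervals):
    # Toy feature that refuses to run on missing data
    if any(math.isnan(x) for x in intervals):
        raise ValueError("missing intervals")
    return sum(intervals) / len(intervals)


print(mean_interval([800.0, 810.0]))    # regular case
print(mean_interval([800.0, math.nan])) # NaN instead of an error
```

Features that get a proper missing-data adaptation later could simply drop the decorator.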
I think that's good. And indeed, I expect many cases / indices where dealing with missing data is not straightforward. In those cases, it's safer to just return NaNs, and it will be up to the user to explicitly do something about it (analyze the various chunks separately, impute the signal, etc.).
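For the "analyze the chunks separately" option, a user-side sketch could look like the following. `split_valid_chunks` is a hypothetical helper, not a NeuroKit2 function; it just finds the contiguous runs of non-NaN samples.

```python
import numpy as np


def split_valid_chunks(signal):
    """Return (start, stop, chunk) for each contiguous run of non-NaN samples."""
    signal = np.asarray(signal, dtype=float)
    valid = ~np.isnan(signal)
    # Pad with False so that runs touching either end of the signal are detected
    padded = np.concatenate(([False], valid, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    starts, stops = edges[::2], edges[1::2]  # rising and falling edges alternate
    return [(int(a), int(b), signal[a:b]) for a, b in zip(starts, stops)]


signal = np.array([1.0, 2.0, np.nan, np.nan, 3.0])
for start, stop, chunk in split_valid_chunks(signal):
    print(start, stop, chunk)
```

Each chunk could then be processed on its own, provided it is long enough for the analysis in question.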