switch from _linregress to da.polyfit?
Opened this issue · 3 comments
- Firstly, We noticed that the current implementation of preprocess does not support dask arrays.
- Secondly, I'm now noticing that preprocess._linregress returns NaNs while da.polyfit does work. That might be related to a floating point precision issue.
This is the input data before it enters preprocess._detrend()
:
This is what I tried:
slope, intercept = xr.apply_ufunc(
preprocess._linregress,
data["time"].astype(float),
data.astype(float),
input_core_dims=[["time"], ["time"]],
output_core_dims=[[], []],
vectorize=True,
)
The slope & intercept is all NaN.
While:
coeffs = data.polyfit(dim="time", deg=1, skipna=True)["polyfit_coefficients"]
I don't know why this goes wrong..
It was able to fit at this floating precision before.
These are some trend values for daily soil moisture in EU:
Oeh, I found a bug, some years were all NaN, this means there no timeseries at all with no NaNs and then the linregress with return all NaNs as well.
Maybe nice to add a check for this type of error.
Hi Sem, not sure if I get it correctly. I just tested the old implementation of preprocess (still using scipy.stats.linregress(x, y)
) with a sst dataset, where I manually turn all values to np.nan
for one time step. The function still gives me meaningful values for both slope and intercept. 😅I will put my minimum example here.
Here is an example:
import numpy as np
import urllib
import xarray as xr
from s2spy import preprocess
import matplotlib.pyplot as plt
# URL of the dataset from zenodo
sst_url = "https://zenodo.org/record/8186914/files/sst_daily_1959-2021_5deg_Pacific_175_240E_25_50N.nc"
sst_field = "sst_daily_1959-2021_5deg_Pacific_175_240E_25_50N.nc"
urllib.request.urlretrieve(sst_url, sst_field)
data = xr.open_dataset(sst_field)
# manually turn all values to np.nan for one time step.
data_subset['sst'].loc[dict(time="2020-12-20")] = np.ones(data_subset['sst'].isel(time=100).values.shape) * np.nan
preprocessor = preprocess.Preprocessor(
rolling_window_size=25,
timescale="daily",
detrend="linear",
subtract_climatology=True,
)
preprocessor.fit(data_subset)
preprocessor.trend["slope"].sst.plot()
The values I got are not all nan:
Slope:
Intercept: