AI4S2S/s2spy

switch from _linregress to da.polyfit?

Opened this issue · 3 comments

  • Firstly, We noticed that the current implementation of preprocess does not support dask arrays.
  • Secondly, I'm now noticing that preprocess._linregress returns NaNs while da.polyfit does work. That might be related to a floating point precision issue.

This is the input data before it enters preprocess._detrend():
image

This is what I tried:

slope, intercept = xr.apply_ufunc(
        preprocess._linregress,
        data["time"].astype(float),
        data.astype(float),
        input_core_dims=[["time"], ["time"]],
        output_core_dims=[[], []],
        vectorize=True,
    )

The slope & intercept is all NaN.

While:

coeffs = data.polyfit(dim="time", deg=1, skipna=True)["polyfit_coefficients"]

coeffs.sel(degree=1).plot():
image

I don't know why this goes wrong..
It was able to fit at this floating precision before.
These are some trend values for daily soil moisture in EU:
image

Oeh, I found a bug, some years were all NaN, this means there no timeseries at all with no NaNs and then the linregress with return all NaNs as well.

Maybe nice to add a check for this type of error.

Hi Sem, not sure if I get it correctly. I just tested the old implementation of preprocess (still using scipy.stats.linregress(x, y)) with a sst dataset, where I manually turn all values to np.nan for one time step. The function still gives me meaningful values for both slope and intercept. 😅I will put my minimum example here.

Here is an example:

import numpy as np
import urllib
import xarray as xr
from s2spy import preprocess
import matplotlib.pyplot as plt

# URL of the dataset from zenodo
sst_url = "https://zenodo.org/record/8186914/files/sst_daily_1959-2021_5deg_Pacific_175_240E_25_50N.nc"
sst_field = "sst_daily_1959-2021_5deg_Pacific_175_240E_25_50N.nc"
urllib.request.urlretrieve(sst_url, sst_field)

data = xr.open_dataset(sst_field)

# manually turn all values to np.nan for one time step.
data_subset['sst'].loc[dict(time="2020-12-20")] = np.ones(data_subset['sst'].isel(time=100).values.shape) * np.nan

preprocessor = preprocess.Preprocessor(
    rolling_window_size=25,
    timescale="daily",
    detrend="linear",
    subtract_climatology=True,
)

preprocessor.fit(data_subset)

preprocessor.trend["slope"].sst.plot()

The values I got are not all nan:

Slope:

image

Intercept:

image