"calibration_year_final" not correctly taken into account in pdsi calculation

Question

"calibration_year_final" not correctly taken into account in pdsi calculation

Opened this issue 4 months ago · 1 comments

Code

import numpy as np
import pandas as pd
import xarray as xr

data_start_year=1990
data_end_year=2010

dates = pd.date_range(start=f'{data_start_year}-01-01', end=f'{data_end_year}-12-31', freq='MS')
lat = np.arange(10, 20, 1)
lon = np.arange(180, 190, 1) 


# Generating random data for precipitation and potential evapotranspiration (PET)
precips_data = np.random.rand(len(dates), len(lat), len(lon)) # Precipitation data
pet_data = np.random.rand(len(dates), len(lat), len(lon)) # Potential evapotranspiration data

# Creating the first dataset with precipitation and PET data
first_dataset = xr.Dataset(
    {
        "precips": (["time", "latitude", "longitude"], precips_data),
        "pet": (["time", "latitude", "longitude"], pet_data)
    },
    coords={
        "time": dates,
        "latitude": lat,
        "longitude": lon},
)

# Selecting the calibration year
calibration_year_final=2005

# Modifying data after the calibration year: values are doubled
chosen_date=pd.Timestamp(f"{calibration_year_final+1}-01-01")
second_dataset = first_dataset.where(first_dataset.time < chosen_date, first_dataset * 2)


def palmer_pdsi_wrapper(precips, pet, awc, data_start_year, calibration_year_initial, calibration_year_final):
    pdsi, phdi, wplm, z, _ = palmer.pdsi(precips, pet, awc, data_start_year, calibration_year_initial, calibration_year_final)
    return pdsi, phdi, wplm, z


import os
os.chdir("/home/jeremiesicard/Documents/main-project")
from thirdpart.submodules.climate_indices.src.climate_indices import palmer



# Calculating the Palmer Drought Severity Index (PDSI) for the first dataset
first_pdsi_data =  xr.apply_ufunc(
                palmer_pdsi_wrapper,
                first_dataset.precips,
                first_dataset.pet,
                50,
                data_start_year,
                data_start_year,
                calibration_year_final,
                input_core_dims=[["time"], ["time"], [], [], [], []],
                output_core_dims=[["time"]] * 4,
                vectorize=True,
                output_dtypes=[first_dataset.precips.dtype] * 4,
            )

first_pdsi_dataset=first_pdsi_data[0].compute().to_dataset(name='pdsi').sortby(['latitude','longitude'])


# Calculating the Palmer Drought Severity Index (PDSI) for the second dataset
second_pdsi_data =  xr.apply_ufunc(
                palmer_pdsi_wrapper,
                second_dataset.precips,
                second_dataset.pet,
                50,
                data_start_year,
                data_start_year,
                calibration_year_final,
                input_core_dims=[["time"], ["time"], [], [], [], []],
                output_core_dims=[["time"]] * 4,
                vectorize=True,
                output_dtypes=[second_dataset.precips.dtype] * 4,
            )

second_pdsi_dataset=second_pdsi_data[0].compute().to_dataset(name='pdsi').sortby(['latitude','longitude'])


chosen_date_2 = pd.Timestamp(f"{calibration_year_final}-12-31")

print(f"Is the first pdsi equal to the second pdsi for the period {data_start_year}-01-01 to {calibration_year_final}-12-31?\n",
       np.array_equal(
                first_pdsi_dataset.pdsi.sel(time=first_pdsi_dataset.time <= chosen_date_2).values,
                second_pdsi_dataset.pdsi.sel(time=second_pdsi_dataset.time <= chosen_date_2).values
                ))

print(f"The values that differ over the period {data_start_year}-01-01 to {calibration_year_final}-12-31 are as follows:\n",
      np.unique(first_pdsi_dataset.pdsi.sel(time=first_pdsi_dataset.time <= chosen_date_2)-second_pdsi_dataset.pdsi.sel(time=second_pdsi_dataset.time <= chosen_date_2), return_counts=True))

Describe the bug

The precipitation and PET values that follow the "calibration year final" have an influence on the PDSI values before the "calibration year final" in the calculation of the PDSI.

Expected behavior

In this example, considering:

a first xarray.Dataset ranging from 1990 to 2010 with a monthly frequency of precipitation and PET
a second xarray.Dataset where the precipitation and PET values are identical to the first xarray.Dataset from 1990 to 2005 (inclusive), then different from 2006 to 2010
a "calibration year final" in 2005

I expected to have identical PDSI values between the two xarray.Datasets from 1990 to 2005 (inclusive), then different from 2006 onwards. However, as you can see from the printouts, the values are also different for the period from 1990 to 2005. Is the "calibration year final" correctly accounted for?

Screenshots

Is the first pdsi equal to the second pdsi for the period 1990-01-01 to 2005-12-31?
False
The values that differ over the period 1990-01-01 to 2005-12-31 are as follows:
(array([-5.46034514, -2.34233531, -2.23779737, -1.39403927, -1.07894443,
-0.96372188, -0.75961582, -0.68137539, 0. , 2.02436987,
2.2568226 , 2.51596723, 2.8048687 , 3.12694393, 3.48600215]), array([ 1, 1, 1, 1, 1, 1, 1, 1, 19186,
1, 1, 1, 1, 1, 1]))

Desktop

OS: WSL
Commit version: d108eee

Answer 1 · 2024-03-14T19:39:06.000Z

Thanks for this detailed bug report, @jeremietellus