lbelzile/mev

`gdp.ll` calculates log(0), causing `gpd.nll` to silently pass error to `tsab.gpd`.

Closed this issue · 3 comments

Hi, many thanks for the package and continued development.

For the example data below, tstab.gpd returns the error Error in seq.default(1, lux, len = nknots) : 'length.out' must be a non-negative number. Forking this package and tracing the error, it appears to occur in this line of gpd.ll, where the third element of 1 + xi/sigma * dat is 0, the log of which is -Inf, giving said error.

Could there (i) be a sensible solution to this possible bug, or (ii) a more sensible error message when it occurs? There are also errors in the other threshold selection methods shown here, presumably for a similar reason.

Reprex for this issue:

library(mev)

# data
diffs <- c(2.81951297074556, 1.08243174850941, 9.10901987552643, 3.40175648033619, 
           1.64474362134933, 1.6418571472168, 7.74637460708618, 42.545136988163, 
           4.42913325130939, 33.4073157981038, 0.791920393705368, 2.95044100284576, 
           10.998235322535, 20.4508430361748, 2.69683688879013, 3.33915970474482, 
           37.7822144702077, 10.698377430439, 9.73197823762894, 0.390749335289001, 
           0.218639880418777, 11.9363431185484, 0.366057872772217, 2.119623452425, 
           41.309825360775, 30.4940503239632, 3.6598953306675, 2.0952250957489, 
           5.28838634490967, 20.3412247598171, 0.577117383480072, 5.04886768013239, 
           5.99352183938026, 0.845351293683052, 3.17710891366005, 1.17714792490005, 
           1.11236909031868, 0.418272614479065, 0.561639964580536, 5.2942131832242, 
           41.5214887857437, 2.95898319780826, 0.183028936386108, 0.626213699579239, 
           0.42288264632225, 1.17204022407532, 1.44027414917946, 5.32458829879761, 
           16.4689251184464, 1.69967293739319, 6.59124094247818, 1.27000004053116, 
           3.48916530609131, 2.80104877054691, 0.683028936386108, 1.500114813447, 
           0.186223056167364, 1.50851613283157, 0.0966406241059303, 0.127125412225723, 
           0.282202571630478, 0.886843204498291, 37.8948299884796, 1.78267043828964, 
           0.401593267917633, 4.40598256886005, 0.0817966535687447, 3.25152472406626, 
           8.40313673019409, 0.254000008106232, 18.2489268779755, 1.08870995044708, 
           1.74532070755959, 0.331796653568745, 0.0915144681930542, 0.604640662670135, 
           6.03872404247522, 1.6589712202549, 8.79172092676163, 10.5326889157295, 
           0.980914600193501, 3.05491060763597, 4.65537348389626, 11.4831943511963, 
           0.818241983652115, 0.117766812443733, 17.0479534268379, 1.39983341097832, 
           0.254000008106232, 0.53671869635582, 0.0588834062218666, 0.15039786696434, 
           0.983999967575073, 0.423241868615151, 1.34176522493362)

plot(diffs)

thcan <- quantile(
  diffs, 
  seq(0.5, 0.99, length.out = 25)
)

tstab.gpd(xdat = diffs, 
          thresh = thcan,  
          method = "profile"
)

Outputs from sessionInfo():

R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 12 (bookworm)

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.21.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8       
 [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/London
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mev_1.16

loaded via a namespace (and not attached):
 [1] numDeriv_2016.8-1.1 alabama_2023.1.0    Matrix_1.6-1.1      lattice_0.22-5      truncnorm_1.0-9    
 [6] splines_4.3.2       parallel_4.3.2      grid_4.3.2          nleqslv_3.3.5       compiler_4.3.2     
[11] quantreg_5.97       rstudioapi_0.15.0   tools_4.3.2         SparseM_1.81        cobs_1.3-7         
[16] survival_3.5-7      Rcpp_1.0.12         MASS_7.3-60         Rsolnp_1.16         MatrixModels_0.5-3 

Thanks for the detailed report. I went down the rabbit hole to track the errors, of which there were far too many to my taste...

  • The function gpd.ll and gpd.ll.optimcomputed a different formula for $\xi \ge -1$, despite the fact there was a special case afterwards for xi = -1 in gpd.ll which was never executed.
  • fit.gpd now returns a value based on gpd.ll for slot nllh, fixing a case with Grimshaw's algorithm (due to a typo when assigning to list tmp rather than temp that lead to NaN values and failure of the profile likelihood routine).
  • The profile likelihood function has been modified to handle cases where the shape is on the boundary, so the returned grid of psi values at which to compute the profile is admissible and contains no missing values. Since in this case the confidence interval is one-sided, the calculation is now done manually to prevent spurious error messages in smoothing spline routines. Confidence intervals are also truncated to the finite range of value (since the confidence intervals are obtained by extrapolation if the grid of values of psi is not wide enough).
  • The function now strips (without warning) any threshold that is too far into the tail. You had an instance in your example where the sample size was zero, so this would have thrown yet another error message.
  • If all estimates for the modified scale/shape in the parameter stability plot are constant, this returns a warning.
  • The plot S3 method for objects of class mev_tstab.gpd now strips NA values from calculation of the limits, preventing error messages when there are missing values for the confidence intervals (which happens, e.g., for Wald method whenever the estimated shape $\xi &lt; -0.5$).

Nice, thanks for the quick fix, and glad I could highlight some issues!