`gdp.ll` calculates log(0), causing `gpd.nll` to silently pass error to `tsab.gpd`.
Closed this issue · 3 comments
Hi, many thanks for the package and continued development.
For the example data below, tstab.gpd
returns the error Error in seq.default(1, lux, len = nknots) : 'length.out' must be a non-negative number
. Forking this package and tracing the error, it appears to occur in this line of gpd.ll
, where the third element of 1 + xi/sigma * dat
is 0, the log of which is -Inf
, giving said error.
Could there (i) be a sensible solution to this possible bug, or (ii) a more sensible error message when it occurs? There are also errors in the other threshold selection methods shown here, presumably for a similar reason.
Reprex for this issue:
library(mev)
# data
diffs <- c(2.81951297074556, 1.08243174850941, 9.10901987552643, 3.40175648033619,
1.64474362134933, 1.6418571472168, 7.74637460708618, 42.545136988163,
4.42913325130939, 33.4073157981038, 0.791920393705368, 2.95044100284576,
10.998235322535, 20.4508430361748, 2.69683688879013, 3.33915970474482,
37.7822144702077, 10.698377430439, 9.73197823762894, 0.390749335289001,
0.218639880418777, 11.9363431185484, 0.366057872772217, 2.119623452425,
41.309825360775, 30.4940503239632, 3.6598953306675, 2.0952250957489,
5.28838634490967, 20.3412247598171, 0.577117383480072, 5.04886768013239,
5.99352183938026, 0.845351293683052, 3.17710891366005, 1.17714792490005,
1.11236909031868, 0.418272614479065, 0.561639964580536, 5.2942131832242,
41.5214887857437, 2.95898319780826, 0.183028936386108, 0.626213699579239,
0.42288264632225, 1.17204022407532, 1.44027414917946, 5.32458829879761,
16.4689251184464, 1.69967293739319, 6.59124094247818, 1.27000004053116,
3.48916530609131, 2.80104877054691, 0.683028936386108, 1.500114813447,
0.186223056167364, 1.50851613283157, 0.0966406241059303, 0.127125412225723,
0.282202571630478, 0.886843204498291, 37.8948299884796, 1.78267043828964,
0.401593267917633, 4.40598256886005, 0.0817966535687447, 3.25152472406626,
8.40313673019409, 0.254000008106232, 18.2489268779755, 1.08870995044708,
1.74532070755959, 0.331796653568745, 0.0915144681930542, 0.604640662670135,
6.03872404247522, 1.6589712202549, 8.79172092676163, 10.5326889157295,
0.980914600193501, 3.05491060763597, 4.65537348389626, 11.4831943511963,
0.818241983652115, 0.117766812443733, 17.0479534268379, 1.39983341097832,
0.254000008106232, 0.53671869635582, 0.0588834062218666, 0.15039786696434,
0.983999967575073, 0.423241868615151, 1.34176522493362)
plot(diffs)
thcan <- quantile(
diffs,
seq(0.5, 0.99, length.out = 25)
)
tstab.gpd(xdat = diffs,
thresh = thcan,
method = "profile"
)
Outputs from sessionInfo()
:
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 12 (bookworm)
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.21.so; LAPACK version 3.11.0
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C LC_TIME=en_GB.UTF-8
[4] LC_COLLATE=en_GB.UTF-8 LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
time zone: Europe/London
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] mev_1.16
loaded via a namespace (and not attached):
[1] numDeriv_2016.8-1.1 alabama_2023.1.0 Matrix_1.6-1.1 lattice_0.22-5 truncnorm_1.0-9
[6] splines_4.3.2 parallel_4.3.2 grid_4.3.2 nleqslv_3.3.5 compiler_4.3.2
[11] quantreg_5.97 rstudioapi_0.15.0 tools_4.3.2 SparseM_1.81 cobs_1.3-7
[16] survival_3.5-7 Rcpp_1.0.12 MASS_7.3-60 Rsolnp_1.16 MatrixModels_0.5-3
Thanks for the detailed report. I went down the rabbit hole to track the errors, of which there were far too many to my taste...
- The function
gpd.ll
andgpd.ll.optim
computed a different formula for$\xi \ge -1$ , despite the fact there was a special case afterwards forxi = -1
ingpd.ll
which was never executed. -
fit.gpd
now returns a value based ongpd.ll
for slotnllh
, fixing a case with Grimshaw's algorithm (due to a typo when assigning to listtmp
rather thantemp
that lead toNaN
values and failure of the profile likelihood routine). - The profile likelihood function has been modified to handle cases where the shape is on the boundary, so the returned grid of
psi
values at which to compute the profile is admissible and contains no missing values. Since in this case the confidence interval is one-sided, the calculation is now done manually to prevent spurious error messages in smoothing spline routines. Confidence intervals are also truncated to the finite range of value (since the confidence intervals are obtained by extrapolation if the grid of values ofpsi
is not wide enough). - The function now strips (without warning) any threshold that is too far into the tail. You had an instance in your example where the sample size was zero, so this would have thrown yet another error message.
- If all estimates for the modified scale/shape in the parameter stability plot are constant, this returns a warning.
- The
plot
S3 method for objects of classmev_tstab.gpd
now stripsNA
values from calculation of the limits, preventing error messages when there are missing values for the confidence intervals (which happens, e.g., for Wald method whenever the estimated shape$\xi < -0.5$ ).
Nice, thanks for the quick fix, and glad I could highlight some issues!