BUG: Results are not matched with ismev for MLE

Question

BUG: Results are not matched with ismev for MLE

toshiakiasakura opened this issue 2 years ago · 4 comments

Describe the bug
I read a book about extreme value statistics, and found this package. Then, I want to validate the same results are obtained or not.
The resutlts were finally not matched...

import pandas as pd
from pyextremes import EVA
KobeM = pd.Series([129.7,  51.4, 148.8,  76.3,  68.3,  91.8, 116.5,  97.4,  81.3,  65.8,  50.9, 100.2,
  123.2, 199.4, 137.7, 104.7,  72.6,  63.2, 116.5,  71.9, 121.6,  87.8,  89.2,  76.2,
  103.7, 112.0,  60.7,  63.2, 119.7,  75.5,  68.3,  67.6,  62.3,  51.1,  62.5,  99.4,
   81.4,  86.3, 143.8,  44.5,  58.4, 270.4, 103.5,  94.0,  68.9,  80.0, 112.5,  73.4,
  262.8,  58.1,  43.3,  59.0, 181.8, 117.1, 109.4, 115.9, 116.2, 175.8,  73.4,  83.7,
  165.6,  56.8, 123.4, 142.2, 195.2, 130.1,  64.5,  97.1, 219.4,  95.7, 319.4,  72.5,
  115.5,  53.0,  81.0, 152.0,  68.5,  83.5,  77.0,  77.5,  67.5,  91.0,  86.5,  77.0,
   57.0, 125.5, 197.0, 105.5,  75.5, 128.0,  70.0, 150.0, 121.5,  72.0,  57.0,  81.0,
   86.0,  40.5, 118.5,  76.5,  82.5, 122.0, 179.5, 123.0,  71.5,  38.5, 108.5, 138.0,
   57.0,  76.5,  83.0,  47.0,  69.0, 105.5, 124.5,  64.0, 100.5])

KobeM.index = [pd.Timestamp(i,1,1) for i in range(1897, 2014)]

model = EVA(KobeM)

model.get_extremes(method="BM", block_size='365.2425D')

model.fit_model()


                           Univariate Extreme Value Analysis                            
========================================================================================
                                      Source Data                                       
----------------------------------------------------------------------------------------
Data label:                          None      Size:                                 117
Start:                       January 1897      End:                         January 2013
========================================================================================
                                     Extreme Values                                     
----------------------------------------------------------------------------------------
Count:                                116      Extraction method:                     BM
Type:                                high      Block size:             365 days 05:49:12
========================================================================================
                                         Model                                          
----------------------------------------------------------------------------------------
Model:                                MLE      Distribution:                  genextreme
Log-likelihood:                  -583.328      AIC:                             1172.869
----------------------------------------------------------------------------------------
Free parameters:                 c=-0.192      Fixed parameters: All parameters are free
                               loc=78.353                                               
                             scale=28.215                                               
========================================================================================

R's ismev's code is this

library(ismev)
KobeM <- c(129.7,  51.4, 148.8,  76.3,  68.3,  91.8, 116.5,  97.4,  81.3,  65.8,  50.9, 100.2,
  123.2, 199.4, 137.7, 104.7,  72.6,  63.2, 116.5,  71.9, 121.6,  87.8,  89.2,  76.2,
  103.7, 112.0,  60.7,  63.2, 119.7,  75.5,  68.3,  67.6,  62.3,  51.1,  62.5,  99.4,
   81.4,  86.3, 143.8,  44.5,  58.4, 270.4, 103.5,  94.0,  68.9,  80.0, 112.5,  73.4,
  262.8,  58.1,  43.3,  59.0, 181.8, 117.1, 109.4, 115.9, 116.2, 175.8,  73.4,  83.7,
  165.6,  56.8, 123.4, 142.2, 195.2, 130.1,  64.5,  97.1, 219.4,  95.7, 319.4,  72.5,
  115.5,  53.0,  81.0, 152.0,  68.5,  83.5,  77.0,  77.5,  67.5,  91.0,  86.5,  77.0,
   57.0, 125.5, 197.0, 105.5,  75.5, 128.0,  70.0, 150.0, 121.5,  72.0,  57.0,  81.0,
   86.0,  40.5, 118.5,  76.5,  82.5, 122.0, 179.5, 123.0,  71.5,  38.5, 108.5, 138.0,
   57.0,  76.5,  83.0,  47.0,  69.0, 105.5, 124.5,  64.0, 100.5)
k000 <- gev.fit(KobeM)

results are

$conv
[1] 0
$nllh
[1] 588.2654
$mle
[1] 77.8499914 28.1150960  0.1970488
$se
[1] 2.96183720 2.35426533 0.07809392

Expected behavior
I think number of iterations are not enough for this fitting.
Also, I wonder when estimating parameters, what kind of pacakges is used for parameter optimization.,,

Desktop (please complete the following information):

OS : Linux
Python version: 3.10.4

Answer 1 · 2022-06-25T15:29:18.000Z

@toshiakiasakura these are the packages used in pyextremes: https://georgebv.github.io/pyextremes/#dependencies

Regarding your question, I see that you have some sample data for which you know expected results. Can you please point me to the source of that? Before telling you more I need to better understand the data and to see what model was used so that we can compare apples to apples.

Answer 2 · 2022-06-26T13:45:35.000Z

Thank you for your reply! Since I am a totally newbie in this field, if I misunderstand or misinterpret something, please tell me.
Note: Sorry, I just wrongly pasted the results for python results. I updated the results in the first comment...

Since I fetched this example from the following Japanese textbook, I suppose you can not understand contents due to language problem... The author is Rinya Takahashi.
https://www.amazon.co.jp/%E6%A5%B5%E5%80%A4%E7%B5%B1%E8%A8%88%E5%AD%A6-ISM%E3%82%B7%E3%83%AA%E3%83%BC%E3%82%BA-%E9%80%B2%E5%8C%96%E3%81%99%E3%82%8B%E7%B5%B1%E8%A8%88%E6%95%B0%E7%90%86-%E9%AB%98%E6%A9%8B-%E5%80%AB%E4%B9%9F/dp/4764905159

In this book, the R example is shown, and I just reanalyzed the data as the book told me. So I just reshowed the results in the textbook.

The dataset is made up of maximum precipitation amount for each year from 1897 to 2013 in Kobe City (one of the cities in Japan), and we are interested in when and how large extreme rainfall will occur in the future. I was only given the maximum value of precipitation amount each year, so all data points were used for estimating parameters of GEV model.

If you have any further question, I try to answer...
The R code below were followed in the textbook to check model fitness, confidence interval for xi and return level by profile likelihood method, respectively.

gev.diag(k000)
gev.profxi(k000, xlow=0.05, xup=0.38, conf=0.95, nint=500)
gev.prof(k000, m=200, xlow=245, xup=565, conf=0.95, nint=500)

For optimization question, I just first imagined that parameter optimization was performed with scipy's minimize function, but I could not find minimize function call, and I just felt curious about that. However, from your code, maybe optimization is customly implemented? Am I right?

Answer 3 · 2022-06-26T17:55:36.000Z

I have created a gist here: https://gist.github.com/georgebv/0329912d7a46628226f7558d124ae321

It looks like you didn't have correct number of extremes - if you look at your previous message, you have 117 points and 116 extremes. The reason for that is because you extract extremes from the data rather than setting it from extremes directly - can't blame you, it's not documented. In the gist above you can see that with all extreme values I am getting the same coefficients as in your R code.

Regarding your question about optimization, there are two models available in pyextremes: MLE and MCMC. The default one is MLE and it uses scipy:

fitting is done using the fit public method of the scipy.stats.rv_continous class: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.fit.html
the fit method uses the fmin function of the scipy.stats.optimize package: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin.html

Answer 4 · 2022-06-27T04:10:14.000Z

Sorry for my careless mistakes. I am very happy to see the same results.
Also, thanks for helpful comments for optimization procedures.