BUG: Results are not matched with ismev for MLE
toshiakiasakura opened this issue · 4 comments
Describe the bug
I read a book about extreme value statistics, and found this package. Then, I want to validate the same results are obtained or not.
The resutlts were finally not matched...
import pandas as pd
from pyextremes import EVA
KobeM = pd.Series([129.7, 51.4, 148.8, 76.3, 68.3, 91.8, 116.5, 97.4, 81.3, 65.8, 50.9, 100.2,
123.2, 199.4, 137.7, 104.7, 72.6, 63.2, 116.5, 71.9, 121.6, 87.8, 89.2, 76.2,
103.7, 112.0, 60.7, 63.2, 119.7, 75.5, 68.3, 67.6, 62.3, 51.1, 62.5, 99.4,
81.4, 86.3, 143.8, 44.5, 58.4, 270.4, 103.5, 94.0, 68.9, 80.0, 112.5, 73.4,
262.8, 58.1, 43.3, 59.0, 181.8, 117.1, 109.4, 115.9, 116.2, 175.8, 73.4, 83.7,
165.6, 56.8, 123.4, 142.2, 195.2, 130.1, 64.5, 97.1, 219.4, 95.7, 319.4, 72.5,
115.5, 53.0, 81.0, 152.0, 68.5, 83.5, 77.0, 77.5, 67.5, 91.0, 86.5, 77.0,
57.0, 125.5, 197.0, 105.5, 75.5, 128.0, 70.0, 150.0, 121.5, 72.0, 57.0, 81.0,
86.0, 40.5, 118.5, 76.5, 82.5, 122.0, 179.5, 123.0, 71.5, 38.5, 108.5, 138.0,
57.0, 76.5, 83.0, 47.0, 69.0, 105.5, 124.5, 64.0, 100.5])
KobeM.index = [pd.Timestamp(i,1,1) for i in range(1897, 2014)]
model = EVA(KobeM)
model.get_extremes(method="BM", block_size='365.2425D')
model.fit_model()
Univariate Extreme Value Analysis
========================================================================================
Source Data
----------------------------------------------------------------------------------------
Data label: None Size: 117
Start: January 1897 End: January 2013
========================================================================================
Extreme Values
----------------------------------------------------------------------------------------
Count: 116 Extraction method: BM
Type: high Block size: 365 days 05:49:12
========================================================================================
Model
----------------------------------------------------------------------------------------
Model: MLE Distribution: genextreme
Log-likelihood: -583.328 AIC: 1172.869
----------------------------------------------------------------------------------------
Free parameters: c=-0.192 Fixed parameters: All parameters are free
loc=78.353
scale=28.215
========================================================================================
R's ismev's code is this
library(ismev)
KobeM <- c(129.7, 51.4, 148.8, 76.3, 68.3, 91.8, 116.5, 97.4, 81.3, 65.8, 50.9, 100.2,
123.2, 199.4, 137.7, 104.7, 72.6, 63.2, 116.5, 71.9, 121.6, 87.8, 89.2, 76.2,
103.7, 112.0, 60.7, 63.2, 119.7, 75.5, 68.3, 67.6, 62.3, 51.1, 62.5, 99.4,
81.4, 86.3, 143.8, 44.5, 58.4, 270.4, 103.5, 94.0, 68.9, 80.0, 112.5, 73.4,
262.8, 58.1, 43.3, 59.0, 181.8, 117.1, 109.4, 115.9, 116.2, 175.8, 73.4, 83.7,
165.6, 56.8, 123.4, 142.2, 195.2, 130.1, 64.5, 97.1, 219.4, 95.7, 319.4, 72.5,
115.5, 53.0, 81.0, 152.0, 68.5, 83.5, 77.0, 77.5, 67.5, 91.0, 86.5, 77.0,
57.0, 125.5, 197.0, 105.5, 75.5, 128.0, 70.0, 150.0, 121.5, 72.0, 57.0, 81.0,
86.0, 40.5, 118.5, 76.5, 82.5, 122.0, 179.5, 123.0, 71.5, 38.5, 108.5, 138.0,
57.0, 76.5, 83.0, 47.0, 69.0, 105.5, 124.5, 64.0, 100.5)
k000 <- gev.fit(KobeM)
results are
$conv
[1] 0
$nllh
[1] 588.2654
$mle
[1] 77.8499914 28.1150960 0.1970488
$se
[1] 2.96183720 2.35426533 0.07809392
Expected behavior
I think number of iterations are not enough for this fitting.
Also, I wonder when estimating parameters, what kind of pacakges is used for parameter optimization.,,
Desktop (please complete the following information):
- OS : Linux
- Python version: 3.10.4
@toshiakiasakura these are the packages used in pyextremes: https://georgebv.github.io/pyextremes/#dependencies
Regarding your question, I see that you have some sample data for which you know expected results. Can you please point me to the source of that? Before telling you more I need to better understand the data and to see what model was used so that we can compare apples to apples.
Thank you for your reply! Since I am a totally newbie in this field, if I misunderstand or misinterpret something, please tell me.
Note: Sorry, I just wrongly pasted the results for python results. I updated the results in the first comment...
Since I fetched this example from the following Japanese textbook, I suppose you can not understand contents due to language problem... The author is Rinya Takahashi.
https://www.amazon.co.jp/%E6%A5%B5%E5%80%A4%E7%B5%B1%E8%A8%88%E5%AD%A6-ISM%E3%82%B7%E3%83%AA%E3%83%BC%E3%82%BA-%E9%80%B2%E5%8C%96%E3%81%99%E3%82%8B%E7%B5%B1%E8%A8%88%E6%95%B0%E7%90%86-%E9%AB%98%E6%A9%8B-%E5%80%AB%E4%B9%9F/dp/4764905159
In this book, the R example is shown, and I just reanalyzed the data as the book told me. So I just reshowed the results in the textbook.
The dataset is made up of maximum precipitation amount for each year from 1897 to 2013 in Kobe City (one of the cities in Japan), and we are interested in when and how large extreme rainfall will occur in the future. I was only given the maximum value of precipitation amount each year, so all data points were used for estimating parameters of GEV model.
If you have any further question, I try to answer...
The R code below were followed in the textbook to check model fitness, confidence interval for xi and return level by profile likelihood method, respectively.
gev.diag(k000)
gev.profxi(k000, xlow=0.05, xup=0.38, conf=0.95, nint=500)
gev.prof(k000, m=200, xlow=245, xup=565, conf=0.95, nint=500)
For optimization question, I just first imagined that parameter optimization was performed with scipy's minimize function, but I could not find minimize function call, and I just felt curious about that. However, from your code, maybe optimization is customly implemented? Am I right?
I have created a gist here: https://gist.github.com/georgebv/0329912d7a46628226f7558d124ae321
It looks like you didn't have correct number of extremes - if you look at your previous message, you have 117 points and 116 extremes. The reason for that is because you extract extremes from the data rather than setting it from extremes directly - can't blame you, it's not documented. In the gist above you can see that with all extreme values I am getting the same coefficients as in your R code.
Regarding your question about optimization, there are two models available in pyextremes: MLE and MCMC. The default one is MLE and it uses scipy:
- fitting is done using the
fit
public method of thescipy.stats.rv_continous
class: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.rv_continuous.fit.html - the
fit
method uses thefmin
function of thescipy.stats.optimize
package: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin.html
Sorry for my careless mistakes. I am very happy to see the same results.
Also, thanks for helpful comments for optimization procedures.