georgebv/pyextremes

alternative to block_size

yngwaz opened this issue · 2 comments

First: Thanks for this package, I like it a lot (and it is much faster than my own implementation)!

My problem is "block_size":
If I am interested in annual maxima, the selected blocks can easily traverse "hard boundaries" such as 31.12/01.01 (or 31.08/01.09 if I am interested in school years, for instance). It could thus happen (rarely, of course) that an annual maximum is attributed to the wrong year. This is of course related to the problem of leap years.

A worst-case scenario would probably be:
A maximum some time in e.g. 2020-12 and another "almost-maximum" at 2021-01-01 03:00, with no comparably high value for the rest of 2021. The value at 2021-01-01 03:00 could then be counted towards the annual block of 2020 (and thus not appear in the extremes at all). However, I would really prefer to have the value at 2021-01-01 03:00 counted towards 2021, where it would provide the extreme value for 2021.

The date_time_intervals are constructed from the first element of my time series (ts) and the block_size. Since I am aware of this, I can pad my ts with zeros back to the desired start of the year (not with NaNs, because pyextremes removes those before building the date_time_intervals, which would give me even stranger year periods). A rough sketch of this workaround follows below.
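Just to make the workaround concrete, this is roughly what I do (assuming ts is a pandas Series with an hourly DatetimeIndex; the variable names are mine, not from pyextremes):

import pandas as pd

# pad with zeros from the start of the first year up to (but excluding)
# the first real sample, so that the first block boundary falls on 01.01
year_start = ts.index[0].replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
padding_index = pd.date_range(start=year_start, end=ts.index[0], freq="H")[:-1]
padding = pd.Series(0.0, index=padding_index, name=ts.name)
ts_padded = pd.concat([padding, ts]).sort_index()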

My desired solution:
For my purpose, it would be nice to pass date_time_intervals to pyextremes (get_extremes_block_maxima) directly. This would allow me to have hard boundaries at year transitions (see the sketch below for the kind of intervals I mean).
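For illustration, the hard-boundary intervals I have in mind could be built with plain pandas like this (the years are arbitrary, and passing such an object to pyextremes is of course only my suggestion, not an existing argument):

import pandas as pd

# calendar-year intervals with hard boundaries at 01.01,
# closed on the left so that 2021-01-01 03:00 belongs to the 2021 block
date_time_intervals = pd.interval_range(
    start=pd.Timestamp("2000-01-01"),
    end=pd.Timestamp("2022-01-01"),
    freq="YS",  # year-start frequency (use "AS" on older pandas versions)
    closed="left",
)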

I could imagine this "problem" is even more severe for monthly blocks: an average block_size would constantly traverse the hard month boundaries.

Anyway, thanks again, and I would be interested to know whether my block_size problem is worth considering.

I may be late, but if you have a custom way of extracting extreme values from your data, you can use the EVA.set_extremes method:

def set_extremes(self, extremes: pd.Series, **kwargs) -> None:
    """
    Set extreme values.

    This method is used to set extreme values onto the model instead
    of deriving them from data directly using the 'get_extremes' method.
    This way user can set extremes calculated using a custom methodology.

    Parameters
    ----------
    extremes : pd.Series
        Time series of extreme values to be set onto the model.
        Must be numeric, have date-time index, and have the same name
        as self.data.
    kwargs:
        method : str, optional
            Extreme value extraction method.
            Supported values:
                BM (default) - Block Maxima
                POT - Peaks Over Threshold
        extremes_type : str, optional
            high (default) - extreme high values
            low - extreme low values
        if method is BM:
            block_size : str or pandas.Timedelta, optional
                Block size.
                If None (default), then is calculated as median distance
                between extreme events.
            errors : str, optional
                raise - raise an exception
                    when encountering a block with no data
                ignore (default) - ignore blocks with no data
                coerce - get extreme values for blocks with no data
                    as mean of all other extreme events in the series
                    with index being the middle point of corresponding interval
            min_last_block : float, optional
                Minimum data availability ratio (0 to 1) in the last block
                for it to be used to extract extreme value from.
                This is used to discard last block when it is too short.
                If None (default), last block is always used.
        if method is POT:
            threshold : float, optional
                Threshold used to find exceedances.
                By default is taken as smallest value.
            r : pandas.Timedelta or value convertible to timedelta, optional
                Duration of window used to decluster the exceedances.
                By default r='24H' (24 hours).
                See pandas.to_timedelta for more information.
    """

This way you can extract extreme values yourself and then use them with pyextremes.
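For example, for calendar-year maxima something along these lines should work (a minimal sketch, assuming ts is your raw pandas Series with a DatetimeIndex and no NaNs; the block_size value is only illustrative):

from pyextremes import EVA

model = EVA(data=ts)

# hard year boundaries: take the maximum of each calendar year,
# keeping the timestamp at which it occurred
yearly_max_times = ts.groupby(ts.index.year).idxmax()
extremes = ts.loc[yearly_max_times.values]

model.set_extremes(extremes, method="BM", block_size="365.2425D")
model.fit_model()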

Many thanks for that hint!
Yes, that would work perfectly fine for me. Sorry, I didn't see this option!