geomagpy/magpy

GUI can't take into account baselines with jumps (multiple fitting parts)

stephanbracke opened this issue · 1 comments

If the baseline fit is split into several parts (more than one fit), the graphical user interface does not handle these parts correctly. One of the main problems is that the fit is not always assigned the same way internally: it differs depending on whether the baseline is loaded from a file or held directly in memory.
Let's take the scenario where we load a file (MagPy format only). Here the different parts are stored in the header of the absstream under the key 'DataFunctionObject'. The storage structure is:

[[{'fdx': <scipy.interpolate._interpolate.interp1d object at 0x7fb57437a220>,
 'fdy': <scipy.interpolate._interpolate.interp1d object at 0x7fb57437ab30>,
 'fdz': <scipy.interpolate._interpolate.interp1d object at 0x7fb57437acc0>}, 
18940.346550925926, 18999.379305555554], 
[{'fdx': <scipy.interpolate._interpolate.interp1d object at 0x7fb57437aef0>,
 'fdy': <scipy.interpolate._interpolate.interp1d object at 0x7fb57438e0e0>, 
'fdz': <scipy.interpolate._interpolate.interp1d object at 0x7fb57438e2c0>}, 
19003.53148148148, 19121.302916666667]]

This DataFunctionObject stores the dx, dy and dz fits together with the start and end time of each part, expressed in days. It is the only place where the different parts are stored, yet it is never used when baseline fitting is invoked. Instead, this is what is actually coded:
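As a minimal sketch of how such a list could be walked, the snippet below converts the day-based start and end values of each part to datetimes using only the standard library. It assumes the values are days since 1970-01-01 (the default epoch of Matplotlib's `num2date` from version 3.3 on). `days_to_datetime` and `segment_ranges` are hypothetical helpers, not part of MagPy, and plain strings stand in for the interp1d objects.

```python
from datetime import datetime, timedelta

# Assumption: day numbers count from this date (Matplotlib >= 3.3 default epoch).
EPOCH = datetime(1970, 1, 1)

def days_to_datetime(days):
    """Convert a float number of days since EPOCH to a naive datetime."""
    return EPOCH + timedelta(days=days)

def segment_ranges(function_object):
    """Return (start, end, function keys) for every part of the fit."""
    ranges = []
    for funcs, start, end in function_object:
        ranges.append((days_to_datetime(start), days_to_datetime(end), sorted(funcs)))
    return ranges

# Placeholder strings replace the scipy interp1d objects of the real header.
example = [
    [{'fdx': 'f1x', 'fdy': 'f1y', 'fdz': 'f1z'}, 18940.346550925926, 18999.379305555554],
    [{'fdx': 'f2x', 'fdy': 'f2y', 'fdz': 'f2z'}, 19003.53148148148, 19121.302916666667],
]

ranges = segment_ranges(example)
for start, end, keys in ranges:
    print(start, end, keys)
```

Under that epoch assumption the first part starts on 2021-11-09, and the gap between the end of the first part and the start of the second is where the baseline jump lies.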

magpy/magpy/gui/magpy_gui.py

Lines 4928 to 4936 in 38eb6e9

self.options = dlg.options
starttime = dlg.starttime
endtime = dlg.endtime
basedict = dlg.selecteddict # tmpbasedict[0]
#idx = basedict.get('streamidx') #int(dlg.absstreamComboBox.GetValue().split(':')[0])
absstream = self.streamlist[int(basedict.get('streamidx'))]
fitfunc = basedict.get('function','spline')
knotstep = basedict.get('knotstep',0.3)
degree = basedict.get('degree',5)

Basically, the GUI takes these values from previously set options. When you have just loaded a baseline CDF file, it falls back to the defaults (always spline, even if you did a mean or polynomial fit). Start and end are aligned with the minimum and maximum of the absstream, and a full spline fit is redone without taking the previously made fit into account.
While trying to take the different parts into account, I changed the code to:

for func in absstream.header['DataFunctionObject']:
    start = num2date(func[1]).replace(tzinfo=None)
    end = num2date(func[2]).replace(tzinfo=None)
    self.plotstream.baseline(absstream, fitfunc='spline', knotstep=float(knotstep), fitdegree=int(degree),
                             startabs=start, endabs=end, extradays=0)

This takes the different parts into account (and handles baseline jumps), but information is missing for each fit: I don't have the fit function name, the fit degree or the knot step, because the functions are saved as scipy objects and this high-level information is no longer available.
I need them because the method in stream.py requires this information.

magpy/magpy/stream.py

Lines 2227 to 2252 in 38eb6e9

def baseline(self, absolutedata, **kwargs):
    """
    DESCRIPTION:
        calculates baseline correction for input stream (datastream)
        Uses available baseline values from the provided absolute file
        Special cases:
        1) Absolute data covers the full time range of the stream:
            -> Absolute data is extrapolated by duplicating the last and first entry at "extradays" offset
            -> desired function is calculated
        2) No Absolute data for the end of the stream:
            -> like 1: Absolute data is extrapolated by duplicating the last entry at "extradays" offset or end of stream
            -> and info message is created, if timedifference exceeds the "extraday" arg then a warning will be send
        2) No Absolute data for the beginning of the stream:
            -> like 2: Absolute data is extrapolated by duplicating the first entry at "extradays" offset or beginning o stream
            -> and info message is created, if timedifference exceeds the "extraday" arg then a warning will be send
    VARIABLES:
        required:
        didata (DataStream) containing DI data- usually obtained by absolutes.absoluteAnalysis()
        keywords:
        plotbaseline (bool/string) will plot a baselineplot (if a valid path is provided
                                   to file otherwise to to screen- requires mpplot
        extradays (int) days to which the absolutedata is exteded prior and after start and endtime
        ##plotfilename (string) if plotbaseline is selected, the outputplot is send to this file
        fitfunc (string) see fit
        fitdegree (int) see fit

It would probably be better to extend or change the object list saved in the DataFunctionObject header:

[[{'fdx': {'funct': 'spline', 'fitdegree': 5, 'knotstep': 0.3},
 'fdy': {'funct': 'spline', 'fitdegree': 5, 'knotstep': 0.3},
 'fdz': {'funct': 'spline', 'fitdegree': 5, 'knotstep': 0.3}},
18940.346550925926, 18999.379305555554], ...]

or, with less impact but less future-proof:

[[{'fdx': <scipy.interpolate._interpolate.interp1d object at 0x7fb57437a220>,
 'fdy': <scipy.interpolate._interpolate.interp1d object at 0x7fb57437ab30>,
 'fdz': <scipy.interpolate._interpolate.interp1d object at 0x7fb57437acc0>}, 
18940.346550925926, 18999.379305555554,'spline',5,0.3], ....
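To illustrate the second variant, here is a minimal sketch of a reader that accepts both the existing three-element entries and the proposed six-element entries. `normalize_segment` and the dictionary keys are hypothetical and not part of MagPy. The fallback values mirror the defaults in the GUI code quoted above ('spline', degree 5, knotstep 0.3).

```python
# Fallbacks taken from the basedict.get(...) defaults in the quoted GUI code.
DEFAULTS = {'fitfunc': 'spline', 'fitdegree': 5, 'knotstep': 0.3}
PARAM_ORDER = ('fitfunc', 'fitdegree', 'knotstep')

def normalize_segment(entry):
    """Return one fit part as a dict, whatever layout the entry uses.

    entry is [functions, start, end] (existing layout) or
    [functions, start, end, fitfunc, fitdegree, knotstep] (proposed layout).
    """
    funcs, start, end = entry[:3]
    params = dict(DEFAULTS)
    # Overwrite the defaults with whatever extra fields are present.
    for key, value in zip(PARAM_ORDER, entry[3:]):
        params[key] = value
    return {'functions': funcs, 'start': start, 'end': end, **params}

# None stands in for the scipy interp1d objects.
legacy = normalize_segment([{'fdx': None, 'fdy': None, 'fdz': None},
                            18940.346550925926, 18999.379305555554])
extended = normalize_segment([{'fdx': None, 'fdy': None, 'fdz': None},
                              19003.53148148148, 19121.302916666667, 'poly', 3, 0.5])
print(legacy['fitfunc'], extended['fitfunc'])
```

With a reader like this, each part could pass its own fitfunc, fitdegree and knotstep to baseline() instead of a hard-coded 'spline', while files written in the old layout keep working.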

This DataFunctionObject header entry should be set whenever the fit button is pressed, so that a stream held in memory shows the same behaviour and uses the same code path as one loaded from a file.

Furthermore, I have always had the impression that the baseline and baselinecorr buttons should be combined into a single click.

The recent updates for version 1.1 include solutions for all requested changes:

  • multiple functions are now supported and also shown in the information window
  • fit parameters can be saved and loaded (json file)
  • It is now possible to activate an option so that the adopted baseline is applied directly when using the baseline button
    All changes are finished with this commit (0c92d65)
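For readers wondering what saving and loading fit parameters as JSON might look like, here is a rough sketch. The layout, key names and helper functions are assumptions made for illustration and do not reflect the actual file format written by MagPy 1.1. The second part's parameters are made-up example values.

```python
import json
import os
import tempfile

def save_fit_parameters(path, segments):
    """Write a list of per-part fit parameters to a JSON file."""
    with open(path, 'w', encoding='utf-8') as fh:
        json.dump(segments, fh, indent=2)

def load_fit_parameters(path):
    """Read the list of per-part fit parameters back from a JSON file."""
    with open(path, encoding='utf-8') as fh:
        return json.load(fh)

# Hypothetical layout: one dict per fit part, times as day numbers.
segments = [
    {'fitfunc': 'spline', 'fitdegree': 5, 'knotstep': 0.3,
     'start': 18940.346550925926, 'end': 18999.379305555554},
    {'fitfunc': 'poly', 'fitdegree': 3, 'knotstep': 0.3,
     'start': 19003.53148148148, 'end': 19121.302916666667},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'fitparameters.json')
    save_fit_parameters(path, segments)
    loaded = load_fit_parameters(path)

print(loaded == segments)
```

Only the high-level parameters are stored, not the scipy objects, so the fits can be recomputed per part after loading.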