Current performance evaluation objects, recently added to TunedModel histories, are too big
ablaom opened this issue
There's evidence that the recent addition of full `PerformanceEvaluation` objects to `TunedModel` histories is blowing up memory requirements in real use cases.
I propose that we create two performance evaluation objects: a detailed one (the `PerformanceEvaluation` we have now) and a new `CompactPerformanceEvaluation` object. The `evaluate` method gets a new keyword argument `compact=false`, and `TunedModel` gets a new hyperparameter `compact_history=true`. (This default would technically break MLJTuning, but I doubt it would affect more than one or two users, and the recent change is not actually documented anywhere yet.) A sketch of the proposed API is given below.
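A minimal sketch of how this might look from the user's side. To be clear, `compact` and `compact_history` are the proposed names, not existing API, and `model`, `X`, `y` and the range `r` are assumed to be defined elsewhere:

```julia
using MLJ

# Proposed: request a compact evaluation object directly.
# The `compact` keyword does not exist yet; `compact=true` would
# return a CompactPerformanceEvaluation instead of the full object:
e = evaluate(model, X, y;
             resampling=CV(nfolds=6),
             measure=rms,
             compact=true)

# Proposed: compact history entries by default in tuning.
# `compact_history` is the suggested new hyperparameter:
tuned_model = TunedModel(model=model,
                         tuning=Grid(),
                         range=r,
                         measure=rms,
                         compact_history=true)
```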
This would also allow us to ultimately address #575, which was shelved for fear of making evaluation objects too big.
Further thoughts, anyone?
cc @CameronBieganek, @OkonSamuel
Below are the fields of the current struct. I've ticked off suggested fields for the compact case. I suppose the only one that might be controversial is `observations_per_fold`. This was always included in `TunedModel` histories previously, so it seems less disruptive to include it.
Fields

These fields are part of the public API of the `PerformanceEvaluation` struct.
- `model`: model used to create the performance evaluation. In the case of a tuning model, this is the best model found.

- `measure`: vector of measures (metrics) used to evaluate performance.

- `measurement`: vector of measurements, one for each element of `measure`, aggregating the performance measurements over all train/test pairs (folds). The aggregation method applied for a given measure `m` is `StatisticalMeasuresBase.external_aggregation_mode(m)` (commonly `Mean()` or `Sum()`).

- `operation` (e.g., `predict_mode`): the operations applied for each measure to generate predictions to be evaluated. Possibilities are: $PREDICT_OPERATIONS_STRING.

- `per_fold`: a vector of vectors of individual test fold evaluations (one vector per measure). Useful for obtaining a rough estimate of the variance of the performance estimate.
- `per_observation`: a vector of vectors of vectors containing individual per-observation measurements: for an evaluation `e`, `e.per_observation[m][f][i]` is the measurement for the `i`th observation in the `f`th test fold, evaluated using the `m`th measure. Useful for some forms of hyper-parameter optimization. Note that an aggregated measurement for some measure `measure` is repeated across all observations in a fold if `StatisticalMeasures.can_report_unaggregated(measure) == false`. If `e` has been computed with the `per_observation=false` option, then `e.per_observation` is a vector of `missing`s.
- `fitted_params_per_fold`: a vector containing `fitted_params(mach)` for each machine `mach` trained during resampling, one machine per train/test pair. Use this to extract the learned parameters for each individual training event.

- `report_per_fold`: a vector containing `report(mach)` for each machine `mach` trained during resampling, one machine per train/test pair.

- `train_test_rows`: a vector of tuples, each of the form `(train, test)`, where `train` and `test` are vectors of row (observation) indices for training and evaluation, respectively.

- `resampling`: the resampling strategy used to generate the train/test pairs.

- `repeats`: the number of times the resampling strategy was repeated.
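For orientation, a quick sketch of how these fields are accessed in practice (assumes `model`, `X`, `y` are defined; `e` is the object returned by the existing `evaluate`):

```julia
using MLJ
using Statistics: std

e = evaluate(model, X, y;
             resampling=CV(nfolds=6),
             measure=[rms, mae])

e.measurement[1]             # aggregated rms over all folds
e.per_fold[1]                # vector of six per-fold rms values
std(e.per_fold[1])           # rough spread of the rms estimate
e.per_observation[2][3][1]   # mae for 1st observation in 3rd test fold
e.fitted_params_per_fold[2]  # learned parameters from the 2nd training event
e.train_test_rows[1]         # (train, test) row indices for the 1st fold
```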