JuliaAI/MLJ.jl

Current performance evaluation objects, recently added to TunedModel histories, are too big

ablaom opened this issue · 2 comments

There's evidence that the recent addition of full PerformanceEvaluation objects to TunedModel histories is blowing up memory requirements in real use cases.

I propose that we create two performance evaluation objects - a detailed one (as we have now) and a new CompactPerformanceEvaluation object. The evaluate method gets a new keyword argument compact=false, and TunedModel gets a new hyperparameter compact_history=true. (This default would technically be breaking for MLJTuning, but I doubt it would affect more than one or two users - and the recent change is not actually documented anywhere yet.)

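For concreteness, here's roughly what I have in mind. The compact and compact_history keywords are just the suggestion above and don't exist yet; everything else is the current API, with ConstantRegressor standing in for an arbitrary model:

```julia
using MLJ

X, y = make_regression(200, 3)   # synthetic data
model = ConstantRegressor()      # placeholder model

# detailed evaluation, as now (returns a full PerformanceEvaluation):
e = evaluate(model, X, y; resampling=CV(nfolds=5), measure=rms)

# proposed lightweight variant (would return a CompactPerformanceEvaluation):
e_small = evaluate(model, X, y; resampling=CV(nfolds=5), measure=rms, compact=true)

# proposed TunedModel hyperparameter controlling what gets stored in the history:
tuned = TunedModel(
    models=[ConstantRegressor(), ConstantRegressor()],  # toy explicit search space
    resampling=CV(nfolds=5),
    measure=rms,
    compact_history=true,  # the proposed (breaking) default
)
```
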
This would also allow us to ultimately address #575, which was shelved for fear of making evaluation objects too big.

Further thoughts anyone?

cc @CameronBieganek, @OkonSamuel

Below are the fields of the current struct. I've ticked off suggested fields for the compact case. I suppose the only one that might be controversial is observations_per_fold. This was always included in TunedModel histories previously, so it seems less disruptive to include it.

Fields

These fields are part of the public API of the PerformanceEvaluation struct.

  • model: model used to create the performance evaluation. In the case of a
    tuning model, this is the best model found.

  • measure: vector of measures (metrics) used to evaluate performance

  • measurement: vector of measurements - one for each element of measure - aggregating
    the performance measurements over all train/test pairs (folds). The aggregation method
    applied for a given measure m is
    StatisticalMeasuresBase.external_aggregation_mode(m) (commonly Mean() or Sum())

  • operation (e.g., predict_mode): the operations applied for each measure to generate
    predictions to be evaluated. Possibilities are: predict, predict_mean, predict_mode,
    predict_median, and predict_joint.

  • per_fold: a vector of vectors of individual test fold evaluations (one vector per
    measure). Useful for obtaining a rough estimate of the variance of the performance
    estimate.

  • per_observation: a vector of vectors of vectors containing individual per-observation
    measurements: for an evaluation e, e.per_observation[m][f][i] is the measurement for
    the ith observation in the fth test fold, evaluated using the mth measure. Useful
    for some forms of hyper-parameter optimization. Note that, for a measure with
    StatisticalMeasures.can_report_unaggregated(measure) == false, the aggregated
    measurement is repeated across all observations in a fold. If e has been computed
    with the per_observation=false option, then e.per_observation is a vector of
    missings.

  • fitted_params_per_fold: a vector containing fitted_params(mach) for each machine
    mach trained during resampling - one machine per train/test pair. Use this to extract
    the learned parameters for each individual training event.

  • report_per_fold: a vector containing report(mach) for each machine mach trained
    during resampling - one machine per train/test pair.

  • train_test_rows: a vector of tuples, each of the form (train, test), where train
    and test are vectors of row (observation) indices for training and evaluation
    respectively.

  • resampling: the resampling strategy used to generate the train/test pairs.

  • repeats: the number of times the resampling strategy was repeated.

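To see which of these fields actually dominate memory, here's a minimal sketch using only the current API (model, data and measures are arbitrary placeholders; Base.summarysize is just a rough yardstick):

```julia
using MLJ

X, y = make_regression(500, 5)
model = ConstantRegressor()
e = evaluate(model, X, y; resampling=CV(nfolds=6), measure=[rms, mae])

e.measurement             # one aggregated value per measure
e.per_fold                # per-measure vectors of 6 fold-level values
e.per_observation         # per-measure, per-fold vectors of per-observation values
e.fitted_params_per_fold  # 6 sets of learned parameters (potentially large)
e.report_per_fold         # 6 reports (potentially large)
e.train_test_rows         # 6 (train, test) row-index pairs

# rough comparison of a heavyweight field with a lightweight one:
Base.summarysize(e.fitted_params_per_fold), Base.summarysize(e.per_fold)
```
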
Also relevant: #1025