[Feature]: Including units when evaluating model performance, and controlling iterative models example (MLJ)
NK-Aero opened this issue · 10 comments
using MLJ
using DynamicQuantities
using SymbolicRegression
SRRegressor = @load SRRegressor pkg=SymbolicRegression
X = (; x1=rand(32) .* us"km/h", x2=rand(32) .* us"km")
y = @. X.x2 / X.x1 + 0.5us"h"
model = SRRegressor(binary_operators=[+, -, *, /])
mach = machine(model, X, y)
fit!(mach)
y_hat = predict(mach, X)
# View the equation used:
r = report(mach)
println("Equation used:", r.equation_strings[r.best_idx])
e = evaluate!(mach, resampling=CV(), measure=rms)
println("RMS error on holdout set: ", e.measurements[1])
The error is: "ERROR: DimensionError: 0.8940974674375995 h and -0.8940974674375931 have incompatible dimensions"
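This failure can be reproduced in isolation. A minimal sketch (my own illustration, not code from the issue) of why the measure fails: a loss like rms ends up subtracting a unitless prediction from a unitful target, and DynamicQuantities rejects that:
using DynamicQuantities
# Subtracting a plain Float64 from a quantity with units throws the same error class:
0.894us"h" - 0.894
# ERROR: DimensionError: ... have incompatible dimensions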
Sorry, I'm not sure I understand the question. ustrip is indeed how you remove units.
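For reference, a minimal sketch of stripping units with DynamicQuantities (my own example, not from the thread):
using DynamicQuantities
q = 1.5us"km"              # a quantity with units
ustrip(q)                  # 1.5 — the bare value, units removed
ustrip.(rand(3) .* us"h")  # broadcast over a unitful vector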
Oh, I think I understand now; sorry for being slow. The issue is that predict(mach, X) does not use the same units as the original y you passed, right? I guess the simplest thing is to store the units of y and then return them in predict?
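A minimal sketch of that idea using only plain DynamicQuantities arithmetic (an illustration of the approach, not the package's actual implementation):
unit_of_y  = first(y) / ustrip(first(y))  # a 1.0-valued quantity carrying y's units, e.g. 1.0 h
y_stripped = ustrip.(y)                   # unitless target used for fitting
# ... fit on (X, y_stripped) ...
# y_hat = predict(mach, X) .* unit_of_y   # reattach the stored units to the predictions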
Thank you for your response. We need to address two things: 1. Predicted values should have units. 2. If y has units, the loss functions in MLJ may not support them; I think this needs checking. For example, with evaluate(model, X, y, resampling=cv, measure=l2, verbosity=0) we need cross-validation, and if y has units the measure function may not work.
For the moment I have simply removed the units from y after training and calculated the losses manually, since the MLJ loss functions do not support units. @MilesCranmer @ablaom
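For completeness, a sketch of that manual workaround (my own reconstruction of what is described above, not code from the issue):
# Strip units and compute RMS by hand, since the MLJ measure rejects the unitful target here.
y_raw      = ustrip.(y)
y_hat_raw  = ustrip.(predict(mach, X))
rms_manual = sqrt(sum(abs2, y_hat_raw .- y_raw) / length(y_raw))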
#244 should already fix things, no? For example:
julia> using MLJ, DynamicQuantities, SymbolicRegression
julia> X = (; x1=rand(32) .* us"km/h", x2=rand(32) .* us"km");
julia> y = @. X.x2 / X.x1 + 0.5us"h";
julia> model = SRRegressor(
binary_operators=[+, -, *, /],
dimensional_constraint_penalty=1000.0,
);
julia> mach = machine(model, X, y);
julia> fit!(mach; verbosity=0);
julia> y_hat = predict(mach; rows=1:3)
3-element Vector{Quantity{Float64, SymbolicDimensions{DynamicQuantities.FixedRational{Int32, 25200}}}}:
1.424939088669221 h
1.5803991404563305 h
1.0940808057538784 h
julia> r = report(mach);
julia> r.equation_strings[r.best_idx]
"((x2 / x1) + 0.5000000000000001)"
julia> e = evaluate!(mach, resampling=CV(), measure=rms)
PerformanceEvaluation object with these fields:
measure, operation, measurement, per_fold,
per_observation, fitted_params_per_fold,
report_per_fold, train_test_rows
Extract:
┌──────────────────────────┬───────────┬───────────────┬─────────┬────
│ measure                  │ operation │ measurement   │ 1.96*SE │ p ⋯
├──────────────────────────┼───────────┼───────────────┼─────────┼────
│ RootMeanSquaredError()   │ predict   │ 3.65232e-16 h │ N/A     │ Q ⋯
└──────────────────────────┴───────────┴───────────────┴─────────┴────
1 column omitted
julia> e.measurement
1-element Vector{Quantity{Float64, SymbolicDimensions{DynamicQuantities.FixedRational{Int32, 25200}}}}:
3.6523169056447007e-16 h
Also, I have one more suggestion. MLJ has a powerful feature, [Controlling model tuning](https://alan-turing-institute.github.io/MLJ.jl/dev/controlling_iterative_models/#Controlling-model-tuning).
SRRegressor has the advantage of finding symbolic equations with units, but to find better models we need to use controlled model tuning.
For example:
using MLJ
X, y = @load_boston;
RidgeRegressor = @load RidgeRegressor pkg=MLJLinearModels verbosity=0
model = RidgeRegressor()
r = range(model, :lambda, lower=-1, upper=2, scale=x->10^x)
self_tuning_model = TunedModel(model=model,
tuning=RandomSearch(rng=123),
resampling=CV(nfolds=6),
range=r,
measure=mae);
iterated_model = IteratedModel(model=self_tuning_model,
resampling=nothing,
control=[Step(1), NumberSinceBest(20), NumberLimit(1000)])
mach = machine(iterated_model, X, y)
Here, instead of tuning=RandomSearch(rng=123), we need to tune "unary_operators=[cos, exp]" and "nested_constraints=[sin => [cos => 0], cos => [cos => 2]]". Because this is a time-consuming task, we can start with simple unary_operators and nested_constraints, and gradually make them more complete if the desired loss is not achieved.
Thank you.
@NK-Aero I think SRRegressor is already compatible with model tuning. For your example you could use
r = range(model, :binary_operators; values=[[+], [+, -], [+, -, *]])
Or is there a specific thing you have tried that did not work?
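To sketch how that range plugs into a TunedModel (my own illustrative example built on the range call above, reusing X and y from earlier; the operator sets and tuning settings are assumptions, not tested output):
using MLJ, SymbolicRegression
SRRegressor = @load SRRegressor pkg=SymbolicRegression

model = SRRegressor(binary_operators=[+, -, *, /])

# Nominal range over progressively richer unary operator sets.
r = range(model, :unary_operators; values=[[cos], [cos, exp], [cos, exp, sin]])

self_tuning_model = TunedModel(
    model=model,
    tuning=Grid(),            # try each candidate operator set
    resampling=CV(nfolds=3),
    range=r,
    measure=rms,
)

mach = machine(self_tuning_model, X, y)
# fit!(mach)  # selects the operator set with the best cross-validated rms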
I have updated, but nothing new was added to the package. I'm not sure if I need to do anything extra.
I'm getting the same error as before, even though "#244 should already fix things". For example:
e = evaluate!(mach, resampling=CV(), measure=rms)
ERROR: DimensionError: 2.484653406487582 h and -2.4846534064876034 have incompatible dimensions.
You can check out that PR's version with
]add https://github.com/MilesCranmer/SymbolicRegression.jl#MilesCranmer/issue241
and then restarting Julia
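If you prefer not to use Pkg's REPL mode, the equivalent call (assuming the same URL and branch as in the command above) would be roughly:
using Pkg
Pkg.add(url="https://github.com/MilesCranmer/SymbolicRegression.jl", rev="MilesCranmer/issue241")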