MilesCranmer/SymbolicRegression.jl

[Feature]: Including units when evaluating model performance, and controlling iterative models with MLJ

NK-Aero opened this issue · 10 comments

using MLJ
using DynamicQuantities
using SymbolicRegression

SRegressor = @load SRRegressor pkg=SymbolicRegression

X = (; x1=rand(32) .* us"km/h", x2=rand(32) .* us"km")
y = @. X.x2 / X.x1 + 0.5us"h"
model = SRRegressor(binary_operators=[+, -, *, /])
mach = machine(model, X, y)
fit!(mach)
y_hat = predict(mach, X)
# View the equation used:
r = report(mach)
println("Equation used:", r.equation_strings[r.best_idx])

e = evaluate!(mach, resampling=CV(), measure=rms)
println("RMS error on holdout set: ", e.measurements[1])

The error is: "ERROR: DimensionError: 0.8940974674375995 h and -0.8940974674375931 have incompatible dimensions"

Sorry, I'm not sure I understand the question. `ustrip` is indeed how you remove units.

Oh, I think I understand now, sorry for being slow. The issue is that predict(mach, X) does not use the same units as the original y you passed, right? I guess the simplest thing is to store the units of y and then return them in predict?
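The idea could be sketched like this (a minimal sketch with made-up values; `y_unit` and the offset `0.1` are illustrative, not part of the actual fix):

```julia
using DynamicQuantities

y = [1.5, 2.0, 0.75] .* us"h"

# During fit, record the unit of y as a value-1 quantity carrying its dimensions:
y_unit = y[1] / ustrip(y[1])

# ... the model trains and predicts on the unitless values ...
y_hat_unitless = ustrip.(y) .+ 0.1

# During predict, reattach the stored unit:
y_hat = y_hat_unitless .* y_unit
```

This keeps the internal search unit-free while ensuring `predict` returns quantities with the same dimensions as the training target.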

@NK-Aero Could you please verify that #244 fixes this for you?


Thank you for your response. We need to address two things:

1. Predicted values should have units.
2. If y has units, MLJ loss functions may not support them; this needs checking. For example, in evaluate(model, X, y, resampling=cv, measure=l2, verbosity=0) we need cross-validation, but if y has units the measure function may not work.

For the moment, I have just removed the units of y after training and calculated the losses manually, since the MLJ loss functions do not support units. @MilesCranmer @ablaom
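The manual workaround looks something like this (a sketch with made-up values; in practice `y_hat` would come from `predict(mach, X)`):

```julia
using DynamicQuantities, Statistics

# Illustrative target and predictions, both carrying units of hours:
y     = [1.0, 2.0, 3.0] .* us"h"
y_hat = [1.1, 1.9, 3.2] .* us"h"

# Strip units, compute RMS on the raw values, then reattach the unit:
rms_val = sqrt(mean(abs2, ustrip.(y_hat) .- ustrip.(y)))
rms_with_unit = rms_val * us"h"
```

This only makes sense when y and y_hat share the same units, which is exactly what the fix in #244 guarantees.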

#244 should already fix things, no? For example:

julia> using MLJ, DynamicQuantities, SymbolicRegression

julia> X = (; x1=rand(32) .* us"km/h", x2=rand(32) .* us"km");

julia> y = @. X.x2 / X.x1 + 0.5us"h";

julia> model = SRRegressor(
           binary_operators=[+, -, *, /],
           dimensional_constraint_penalty=1000.0,
       );

julia> mach = machine(model, X, y);

julia> fit!(mach; verbosity=0);

julia> y_hat = predict(mach; rows=1:3)
3-element Vector{Quantity{Float64, SymbolicDimensions{DynamicQuantities.FixedRational{Int32, 25200}}}}:
 1.424939088669221 h
 1.5803991404563305 h
 1.0940808057538784 h

julia> r = report(mach);

julia> r.equation_strings[r.best_idx]
"((x2 / x1) + 0.5000000000000001)"

julia> e = evaluate!(mach, resampling=CV(), measure=rms)
PerformanceEvaluation object with these fields:
  measure, operation, measurement, per_fold,
  per_observation, fitted_params_per_fold,
  report_per_fold, train_test_rows
Extract:
┌────────────────────────┬───────────┬───────────────┬─────────┬────
│ measure                │ operation │ measurement   │ 1.96*SE │ p ⋯
├────────────────────────┼───────────┼───────────────┼─────────┼────
│ RootMeanSquaredError() │ predict   │ 3.65232e-16 h │ N/A     │ Q ⋯
└────────────────────────┴───────────┴───────────────┴─────────┴────
                                                    1 column omitted
julia> e.measurement
1-element Vector{Quantity{Float64, SymbolicDimensions{DynamicQuantities.FixedRational{Int32, 25200}}}}:
 3.6523169056447007e-16 h

Also, I have one more suggestion: MLJ has a powerful feature, "[Controlling model tuning](https://alan-turing-institute.github.io/MLJ.jl/dev/controlling_iterative_models/#Controlling-model-tuning)".

SRRegressor has the advantage of finding symbolic equations with units, but to find better models we need to use controlled model tuning.
For example:

using MLJ

X, y = @load_boston;
RidgeRegressor = @load RidgeRegressor pkg=MLJLinearModels verbosity=0
model = RidgeRegressor()
r = range(model, :lambda, lower=-1, upper=2, scale=x->10^x)
self_tuning_model = TunedModel(model=model,
                               tuning=RandomSearch(rng=123),
                               resampling=CV(nfolds=6),
                               range=r,
                               measure=mae);
iterated_model = IteratedModel(model=self_tuning_model,
                               resampling=nothing,
                               control=[Step(1), NumberSinceBest(20), NumberLimit(1000)])
mach = machine(iterated_model, X, y)

Here, instead of "tuning=RandomSearch(rng=123)", we need to tune "unary_operators=[cos, exp]" and "nested_constraints=[sin => [cos => 0], cos => [cos => 2]]". Because this is a time-consuming task, we can start with simple unary_operators and nested_constraints, and gradually make them more complete if the desired loss is not achieved.

Thank you.

@NK-Aero I think SRRegressor is already compatible with model tuning. For your example you could use

r = range(model, :binary_operators; values=[[+], [+, -], [+, -, *]])

Or is there a specific thing you have tried that did not work?
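Building on that, a coarse-to-fine search over unary-operator sets could be sketched as follows (a sketch only; the candidate operator sets and fold count are illustrative, and I have not benchmarked this):

```julia
using MLJ, SymbolicRegression

SRRegressor = @load SRRegressor pkg=SymbolicRegression verbosity=0

model = SRRegressor(binary_operators=[+, -, *, /])

# Nominal range over candidate unary-operator sets, from simplest to richer:
r = range(model, :unary_operators; values=[Function[], [cos], [cos, exp]])

# Grid search over the three candidate sets, scored by cross-validated RMS:
tuned = TunedModel(model=model,
                   tuning=Grid(),
                   resampling=CV(nfolds=3),
                   range=r,
                   measure=rms)
```

Fitting `machine(tuned, X, y)` would then train one SRRegressor per operator set and keep the best, which matches the "start simple, grow if needed" workflow described above.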

I have updated, but nothing new was added to the package; I'm not sure if I need to do anything extra. I'm still getting the same error:

e = evaluate!(mach, resampling=CV(), measure=rms)
ERROR: DimensionError: 2.484653406487582 h and -2.4846534064876034 have incompatible dimensions.

You can check out that PR's version with

]add https://github.com/MilesCranmer/SymbolicRegression.jl#MilesCranmer/issue241

and then restart Julia.

Thank you very much. #244 fixed it; it is working now.