[Feature]: Including units when evaluating model performance, and controlling iterative models example (MLJ)
NK-Aero opened this issue · 10 comments
using MLJ
using DynamicQuantities
using SymbolicRegression
SRRegressor = @load SRRegressor pkg=SymbolicRegression
X = (; x1=rand(32) .* us"km/h", x2=rand(32) .* us"km")
y = @. X.x2 / X.x1 + 0.5us"h"
model = SRRegressor(binary_operators=[+, -, *, /])
mach = machine(model, X, y)
fit!(mach)
y_hat = predict(mach, X)
# View the equation used:
r = report(mach)
println("Equation used:", r.equation_strings[r.best_idx])
e = evaluate!(mach, resampling=CV(), measure=rms)
println("RMS error on holdout set: ", e.measurements[1])
The error is: "ERROR: DimensionError: 0.8940974674375995 h and -0.8940974674375931 have incompatible dimensions"
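This failure can be reproduced in isolation. A minimal sketch (my own illustration, not code from the issue) of why the measure fails: a loss like rms ends up subtracting a unitless prediction from a unitful target, and DynamicQuantities rejects that:
using DynamicQuantities
# Subtracting a plain Float64 from a quantity with units throws the same error class:
0.894us"h" - 0.894
# ERROR: DimensionError: ... have incompatible dimensions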
Sorry, I'm not sure I understand the question. ustrip is indeed how you remove units.
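For reference, a minimal sketch of stripping units with DynamicQuantities (my own example, not from the thread):
using DynamicQuantities
q = 1.5us"km"              # a quantity with units
ustrip(q)                  # 1.5 — the bare value, units removed
ustrip.(rand(3) .* us"h")  # broadcast over a unitful vector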
Oh, I think I understand now; sorry for being slow. The issue is that predict(mach, X) does not use the same units as the original y you passed, right? I guess the simplest thing is to store the units of y and then return them in predict?
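A minimal sketch of that idea using only plain DynamicQuantities arithmetic (an illustration of the approach, not the package's actual implementation):
unit_of_y  = first(y) / ustrip(first(y))  # a 1.0-valued quantity carrying y's units, e.g. 1.0 h
y_stripped = ustrip.(y)                   # unitless target used for fitting
# ... fit on (X, y_stripped) ...
# y_hat = predict(mach, X) .* unit_of_y   # reattach the stored units to the predictions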
Thank you for your response. We need to address two things: 1. Predicted values should have units. 2. If y has units, the loss functions in MLJ may not support them; I think this needs checking. For example, with evaluate(model, X, y, resampling=cv, measure=l2, verbosity=0) we need cross-validation, and if y has units the measure function may not work.
For the moment I have simply removed the units from y after training and calculated the losses manually, since the MLJ loss functions do not support units. @MilesCranmer @ablaom
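For completeness, a sketch of that manual workaround (my own reconstruction of what is described above, not code from the issue):
# Strip units and compute RMS by hand, since the MLJ measure rejects the unitful target here.
y_raw      = ustrip.(y)
y_hat_raw  = ustrip.(predict(mach, X))
rms_manual = sqrt(sum(abs2, y_hat_raw .- y_raw) / length(y_raw))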
#244 should already fix things, no? For example:
julia> using MLJ, DynamicQuantities, SymbolicRegression
julia> X = (; x1=rand(32) .* us"km/h", x2=rand(32) .* us"km");
julia> y = @. X.x2 / X.x1 + 0.5us"h";
julia> model = SRRegressor(
binary_operators=[+, -, *, /],
dimensional_constraint_penalty=1000.0,
);
julia> mach = machine(model, X, y);
julia> fit!(mach; verbosity=0);
julia> y_hat = predict(mach; rows=1:3)
3-element Vector{Quantity{Float64, SymbolicDimensions{DynamicQuantities.FixedRational{Int32, 25200}}}}:
1.424939088669221 h
1.5803991404563305 h
1.0940808057538784 h
julia> r = report(mach);
julia> r.equation_strings[r.best_idx]
"((x2 / x1) + 0.5000000000000001)"
julia> e = evaluate!(mach, resampling=CV(), measure=rms)
PerformanceEvaluation object with these fields:
measure, operation, measurement, per_fold,
per_observation, fitted_params_per_fold,
report_per_fold, train_test_rows
Extract:
┌──────────────────────────┬───────────┬───────────────┬─────────┬────
│ measure                  │ operation │ measurement   │ 1.96*SE │ p ⋯
├──────────────────────────┼───────────┼───────────────┼─────────┼────
│ RootMeanSquaredError()   │ predict   │ 3.65232e-16 h │ N/A     │ Q ⋯
└──────────────────────────┴───────────┴───────────────┴─────────┴────
1 column omitted
julia> e.measurement
1-element Vector{Quantity{Float64, SymbolicDimensions{DynamicQuantities.FixedRational{Int32, 25200}}}}:
3.6523169056447007e-16 h
Also, I have one more suggestion. MLJ has a powerful feature, [Controlling model tuning](https://alan-turing-institute.github.io/MLJ.jl/dev/controlling_iterative_models/#Controlling-model-tuning).
SRRegressor has the advantage of finding symbolic equations with units, but to find better models we need to use controlled model tuning.
For example:
using MLJ
X, y = @load_boston;
RidgeRegressor = @load RidgeRegressor pkg=MLJLinearModels verbosity=0
model = RidgeRegressor()
r = range(model, :lambda, lower=-1, upper=2, scale=x->10^x)
self_tuning_model = TunedModel(model=model,
tuning=RandomSearch(rng=123),
resampling=CV(nfolds=6),
range=r,
measure=mae);
iterated_model = IteratedModel(model=self_tuning_model,
resampling=nothing,
control=[Step(1), NumberSinceBest(20), NumberLimit(1000)])
mach = machine(iterated_model, X, y)
Here, instead of tuning=RandomSearch(rng=123), we need to tune "unary_operators=[cos, exp]" and "nested_constraints=[sin => [cos => 0], cos => [cos => 2]]". Because this is a time-consuming task, we can start with simple unary_operators and nested_constraints, and gradually make them more complete if the desired loss is not achieved.
Thank you.
@NK-Aero I think SRRegressor is already compatible with model tuning. For your example you could use
r = range(model, :binary_operators; values=[[+], [+, -], [+, -, *]])
Or is there a specific thing you have tried that did not work?
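To sketch how that range plugs into a TunedModel (my own illustrative example built on the range call above, reusing X and y from earlier; the operator sets and tuning settings are assumptions, not tested output):
using MLJ, SymbolicRegression
SRRegressor = @load SRRegressor pkg=SymbolicRegression

model = SRRegressor(binary_operators=[+, -, *, /])

# Nominal range over progressively richer unary operator sets.
r = range(model, :unary_operators; values=[[cos], [cos, exp], [cos, exp, sin]])

self_tuning_model = TunedModel(
    model=model,
    tuning=Grid(),            # try each candidate operator set
    resampling=CV(nfolds=3),
    range=r,
    measure=rms,
)

mach = machine(self_tuning_model, X, y)
# fit!(mach)  # selects the operator set with the best cross-validated rms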
I have updated, but nothing new was added to the package. I'm not sure if I need to do anything extra.
I'm getting the same error as before, even though "#244 should already fix things". For example:
e = evaluate!(mach, resampling=CV(), measure=rms)
ERROR: DimensionError: 2.484653406487582 h and -2.4846534064876034 have incompatible dimensions.
You can check out that PR's version with
]add https://github.com/MilesCranmer/SymbolicRegression.jl#MilesCranmer/issue241
and then restarting Julia
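If you prefer not to use Pkg's REPL mode, the equivalent call (assuming the same URL and branch as in the command above) would be roughly:
using Pkg
Pkg.add(url="https://github.com/MilesCranmer/SymbolicRegression.jl", rev="MilesCranmer/issue241")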