[Feature]: frozen expressions with `saved_state`

Question

Opened this issue 2 months ago · 0 comments

Feature Request

For our use case (frozen expressions via the `saved_state` parameter of `equation_search`), I built a `saved_state` by taking the state returned with `return_state=true` and replacing every expression in every population and in the Hall of Fame with the same expression, `(3.0 / x1) + (5.0 * x2)`. I then passed this customized `saved_state` to the final equation search. The code below shows the steps.

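using SymbolicRegression
using DynamicExpressions: get_metadata, with_metadata

# First run: return the full search state (populations, hall of fame).
saved_s = equation_search(X, y, niterations=40, options=options, parallelism=:multithreading, return_state=true)
hall_of_fame = saved_s[2]
pop = saved_s[1]
Meta = get_metadata(hall_of_fame.members[1].tree)

# `dataset` and `final_term` are defined earlier (not shown);
# `final_term` holds the tree for (3.0 / x1) + (5.0 * x2).
member = PopMember(dataset, final_term, options; deterministic=false)

# Overwrite all 22 Hall of Fame slots with the fixed expression.
for i in 1:22
    hall_of_fame.members[i].tree = with_metadata(member.tree, Meta)
    hall_of_fame.members[i].loss = member.loss
    hall_of_fame.members[i].score = member.score
    hall_of_fame.exists[i] = true
end

# Overwrite all 33 members of the first population with the fixed expression.
for i in 1:33
    pop[1][1].members[i].tree = with_metadata(member.tree, Meta)
    pop[1][1].members[i].loss = member.loss
    pop[1][1].members[i].score = member.score
end

# Reuse the first population for all 15 populations.
for i in 1:15
    pop[1][i] = pop[1][1]
end

# Second run: start from the customized state.
hall_of_fame1 = equation_search(X, y, niterations=40, options=options, parallelism=:multithreading, saved_state=(pop, hall_of_fame))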
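
One detail of the last loop may be worth checking: in Julia, `pop[1][i] = pop[1][1]` stores a reference to the same object in every slot, so all 15 entries are one population rather than 15 copies. The sketch below illustrates the difference between plain assignment and `deepcopy` with a hypothetical stand-in type (`ToyPopulation` is not a SymbolicRegression.jl type); whether this affects the search itself is an open question, not something this example shows.

```julia
# Hypothetical stand-in for a population; not part of SymbolicRegression.jl.
mutable struct ToyPopulation
    losses::Vector{Float64}
end

make_template() = ToyPopulation([1.0, 2.0, 3.0])

# Plain assignment: every slot refers to the same object.
shared = make_template()
aliased = [shared for _ in 1:3]
aliased[1].losses[1] = 99.0
same_object = aliased[1] === aliased[3]   # identical object in both slots
leaked = aliased[3].losses[1]             # the change made via slot 1 is visible here

# deepcopy: every slot holds an independent copy.
base = make_template()
independent = [deepcopy(base) for _ in 1:3]
independent[1].losses[1] = 99.0
untouched = independent[3].losses[1]      # slot 3 keeps its original value
```

If independent populations are intended, the analogous change in the snippet above would be `pop[1][i] = deepcopy(pop[1][1])`.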

We are not yet modifying the metadata to mark which parts of the expression are frozen. For now, we only want to test whether the search can start from a predetermined expression and make only certain kinds of changes. The problem appears in the test run that reads in this `saved_state`. In that test, we want to change only node values, without increasing or decreasing the number of nodes, so we set the mutation weights as follows:

    mutate_constant::Float64 = 0.4
    mutate_operator::Float64 = 0.4
    swap_operands::Float64 = 0.4
    add_node::Float64 = 0.0
    insert_node::Float64 = 0.0
    delete_node::Float64 = 0.0
    simplify::Float64 = 0.0
    randomize::Float64 = 0.0
    do_nothing::Float64 = 0.0
    optimize::Float64 = 0.0
    form_connection::Float64 = 0.0
    break_connection::Float64 = 0.0
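
For reference, a minimal sketch of how weights like these can be passed to the search without editing the struct defaults, assuming the `MutationWeights` keyword constructor and the `mutation_weights` keyword of `Options`. The field names are the ones listed above and depend on the installed version; the operator set is a placeholder, not the one from our run.

```julia
using SymbolicRegression

# Only constant, operator, and operand-swap mutations are enabled;
# every size-changing mutation is given zero weight.
weights = MutationWeights(;
    mutate_constant=0.4,
    mutate_operator=0.4,
    swap_operands=0.4,
    add_node=0.0,
    insert_node=0.0,
    delete_node=0.0,
    simplify=0.0,
    randomize=0.0,
    do_nothing=0.0,
    optimize=0.0,
    form_connection=0.0,
    break_connection=0.0,
)

# Placeholder operators (assumption); replace with the ones actually used.
options = Options(;
    binary_operators=[+, -, *, /],
    mutation_weights=weights,
)
```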

The search accepts the `saved_state`, but the results are not what we expected. With these mutation weights the number of nodes should not change, yet the expressions still grow, and the Pareto front looks the same as the one we get with the default weights, even though most weights are set to zero. Could you help us understand this behavior of `equation_search`? Also, the `saved_state` contains the same expression 33 times in each population and 22 times in the Hall of Fame; could this be confusing PySR? Any help in resolving this would be greatly appreciated. Thank you.

Output:

I would also like to bring our modifications in line with the updated versions of SymbolicRegression.jl and DynamicExpressions.jl, so that they reflect the corresponding changes in those packages.