MilesCranmer/PySR

[Feature]: Random Seed for Reproducibility

cmougan opened this issue · 5 comments

Feature Request

For a paper, I am trying to make my code reproducible. I have set the Python random seed, but every run still gives different results.

Is it possible to set a random seed? I can't find it in the docs.

# %%
import numpy as np
import pandas as pd
from pysr import PySRRegressor

# Random seed
np.random.seed(42)

data_dict = {
    100: {
        "x_data": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
        "means": [0.0, 0.2625, 0.7275, 0.86, 0.9125, 0.94, 0.9325, 0.95, 0.9624999999999999, 0.9775] 
    },
    1000: {
        "x_data": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180],
        "means": [0.0, 0.0, 0.115, 0.6116666666666667, 0.855, 0.9550000000000001, 0.97, 0.9525, 0.9624999999999999, 0.94, 0.9724999999999999, 0.975, 0.985, 0.9875] 
    },
    3162: {
        "x_data": [10, 20, 30, 40, 50, 60, 70, 80, 90],
        "means": [0.0, 0.0, 0.0025, 0.6375, 0.8925000000000001, 0.8925000000000001, 0.935, 0.9425, 0.98] 
    },
    10000: {
        "x_data": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180],
        "means": [0.0, 0.0, 0.0075, 0.28, 0.755, 0.9199999999999999, 0.8825000000000001, 0.9299999999999999, 0.955, 0.9624999999999999, 0.9775, 0.95, 0.9775, 0.985]
    },
    100000: {
        "x_data": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180],
        "means": [0.0, 0.0, 0.0, 0.2525, 0.5575, 0.7050000000000001, 0.7525, 0.8175, 0.9325, 0.83, 0.93, 0.85, 0.94, 0.97] 
    }
}

# Creating and concatenating DataFrames
data_frames = []
for n, data in data_dict.items():
    df = pd.DataFrame(data)
    df["n"] = n
    data_frames.append(df)

# Concatenate all DataFrames
df = pd.concat(data_frames)

# Keep only rows with 19 < x_data < 101
df = df[(df["x_data"] > 19) & (df["x_data"] < 101)]
# %%
X = df[["x_data", "n"]]
y = df["means"]

# %%
# Initialize and fit the symbolic regressor
model = PySRRegressor(
    binary_operators=["*", "^", "+", "/"],
    unary_operators=["exp"],
    # dimensional_constraint_penalty=10**5,
)
# %%
# Fit the model
model.fit(X, y)
model

Also, the algorithm often gets stuck in an iteration. Sometimes it takes 2 secs

You can pass deterministic=True to PySRRegressor. See the docs here: https://astroautomata.com/PySR/api/
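For anyone landing here later: as far as I can tell from the API page, `deterministic=True` needs two companion settings, a fixed `random_state` and parallelism turned off. Below is a minimal sketch of that combination. It assumes a PySR release that still accepts `procs` and `multithreading`; newer releases appear to use `parallelism="serial"` instead, so check the API page for your version. `DETERMINISTIC_KWARGS` and `fit_reproducibly` are hypothetical names used only for illustration, not part of PySR.

```python
# Settings the PySR API reference describes as needed for reproducible runs.
# Assumption: this PySR version accepts `procs` and `multithreading`;
# newer releases may expect parallelism="serial" in their place.
DETERMINISTIC_KWARGS = dict(
    deterministic=True,    # ask PySR for identical results on every run
    random_state=42,       # fixed seed handed to the search
    procs=0,               # no extra worker processes
    multithreading=False,  # no threads, so the search runs serially
)


def fit_reproducibly(X, y, **extra_kwargs):
    """Hypothetical helper: fit a PySRRegressor with the settings above.

    `extra_kwargs` are forwarded unchanged, e.g. binary_operators=[...].
    """
    # Imported lazily so this module loads even where PySR is not installed.
    from pysr import PySRRegressor

    model = PySRRegressor(**DETERMINISTIC_KWARGS, **extra_kwargs)
    model.fit(X, y)
    return model
```

With the data from the report above, one way to check reproducibility would be to call `fit_reproducibly(X, y, binary_operators=["*", "^", "+", "/"], unary_operators=["exp"])` twice and compare the two `model.equations_` tables. Expect a serial search to run noticeably slower than the default parallel one.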

Thanks! I see it's in the API, but it's not in the detailed example: https://astroautomata.com/PySR/

Yeah, there are too many features to fit them all in the detailed example and still have it be useful. But maybe it would be a good fit for the Toy examples page? https://astroautomata.com/PySR/examples/

If you are interested in becoming a PySR contributor, I'd welcome a PR adding a toy example to that page demonstrating deterministic settings :)

Thanks! We are writing a paper about finding scaling laws of LLMs, and I am trying to use your work there. If everything goes well, I will be happy to write an example if you find it meaningful.