[Feature]: Random Seed for Reproducibility
cmougan opened this issue · 5 comments
Feature Request
For a paper, I am trying to make my code reproducible. I have set the random seed in Python, but every run still gives different results.
Is it possible to set a random seed? I can't find it in the docs.
# %%
import numpy as np
import pandas as pd
from pysr import PySRRegressor
# Random seed
np.random.seed(42)
data_dict = {
    100: {
        "x_data": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
        "means": [0.0, 0.2625, 0.7275, 0.86, 0.9125, 0.94, 0.9325, 0.95, 0.9624999999999999, 0.9775]
    },
    1000: {
        "x_data": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180],
        "means": [0.0, 0.0, 0.115, 0.6116666666666667, 0.855, 0.9550000000000001, 0.97, 0.9525, 0.9624999999999999, 0.94, 0.9724999999999999, 0.975, 0.985, 0.9875]
    },
    3162: {
        "x_data": [10, 20, 30, 40, 50, 60, 70, 80, 90],
        "means": [0.0, 0.0, 0.0025, 0.6375, 0.8925000000000001, 0.8925000000000001, 0.935, 0.9425, 0.98]
    },
    10000: {
        "x_data": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180],
        "means": [0.0, 0.0, 0.0075, 0.28, 0.755, 0.9199999999999999, 0.8825000000000001, 0.9299999999999999, 0.955, 0.9624999999999999, 0.9775, 0.95, 0.9775, 0.985]
    },
    100000: {
        "x_data": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180],
        "means": [0.0, 0.0, 0.0, 0.2525, 0.5575, 0.7050000000000001, 0.7525, 0.8175, 0.9325, 0.83, 0.93, 0.85, 0.94, 0.97]
    }
}
# Creating and concatenating DataFrames
data_frames = []
for n, data in data_dict.items():
    df = pd.DataFrame(data)
    df["n"] = n
    data_frames.append(df)
# Concatenate all DataFrames
df = pd.concat(data_frames)
df = df[df['x_data']>19]
df = df[df['x_data']<101]
# %%
X = df[["x_data", "n"]]
y = df["means"]
# %%
# Initialize and fit the symbolic regressor
model = PySRRegressor(
    binary_operators=["*", "^", "+", "/"],
    unary_operators=['exp'],
    # dimensional_constraint_penalty=10**5,
)
# %%
# Fit the model
model.fit(X, y)
model
Also, the algorithm often gets stuck on an iteration. Sometimes it takes 2 secs
You can use `deterministic=True` in the `PySRRegressor`. See the docs here: https://astroautomata.com/PySR/api/
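A minimal sketch of a reproducible configuration. It assumes a PySR release whose API docs describe `deterministic=True` as requiring parallelism to be turned off and `random_state` to be set to a fixed seed; in the 0.x API that means `procs=0` and `multithreading=False`, while newer releases may document `parallelism="serial"` instead, so check the API page for your installed version. The names `DETERMINISTIC_SETTINGS` and `make_deterministic_regressor` are hypothetical, and the import is kept inside the function so the settings can be inspected without a Julia installation.

```python
from typing import Any, Dict

# ASSUMPTION: PySR 0.x-style flags for turning off parallelism. Newer
# releases may expect parallelism="serial" instead of procs/multithreading.
DETERMINISTIC_SETTINGS: Dict[str, Any] = {
    "deterministic": True,    # request a reproducible search
    "procs": 0,               # no extra worker processes
    "multithreading": False,  # no multithreaded search
    "random_state": 42,       # fixed seed for the search
}


def make_deterministic_regressor(**overrides: Any) -> Any:
    """Hypothetical helper: build a PySRRegressor with DETERMINISTIC_SETTINGS.

    Keyword arguments (e.g. the operators used in the issue) are merged in
    and take precedence over the defaults above.
    """
    # Imported here so the settings can be inspected without PySR/Julia installed.
    from pysr import PySRRegressor

    return PySRRegressor(**{**DETERMINISTIC_SETTINGS, **overrides})


# Usage with X and y from the snippet in the issue:
# model = make_deterministic_regressor(
#     binary_operators=["*", "^", "+", "/"],
#     unary_operators=["exp"],
# )
# model.fit(X, y)
```

Running the search serially trades speed for reproducibility, so expect a fit to typically take longer than with the default parallel settings.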
Thanks! I see it's in the API reference, but it's not in the detailed example: https://astroautomata.com/PySR/
Yeah, there are too many features to include them all in the detailed example and still keep it useful. But maybe the Toy Examples page would be a good place for it? https://astroautomata.com/PySR/examples/
If you are interested in becoming a PySR contributor, I'd welcome a PR adding a toy example to that page that demonstrates the deterministic settings :)
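A toy example along these lines could fit the same data twice and check that both runs discover the same equations. The sketch below assumes a fitted model exposes `equations_` as a pandas DataFrame with `complexity` and `equation` columns, as `PySRRegressor` does; `check_reproducible` is a hypothetical name, and `make_model` is any zero-argument callable that returns a fresh, deterministically configured regressor.

```python
from typing import Any, Callable


def check_reproducible(make_model: Callable[[], Any], X: Any, y: Any) -> bool:
    """Fit two fresh models on the same data and compare their equation tables.

    ASSUMPTION: a fitted model exposes `equations_` as a pandas DataFrame with
    "complexity" and "equation" columns, as PySRRegressor does.
    """
    tables = []
    for _ in range(2):
        model = make_model()  # brand-new model, so no state carries over between runs
        model.fit(X, y)
        # Keep only the discovered expressions and their complexities;
        # reset the index so the comparison ignores row labels.
        tables.append(model.equations_[["complexity", "equation"]].reset_index(drop=True))
    # DataFrame.equals is True only if shape, column dtypes, and values all match.
    return bool(tables[0].equals(tables[1]))
```

For example, pass a `lambda` that builds a `PySRRegressor` with the deterministic settings from the API docs; the check should return `True` when those settings take effect, and will typically return `False` with the default parallel search.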
Thanks! We are writing a paper about finding scaling laws of LLMs, and I am trying to use your work for it. If everything goes well, I will be happy to write an example if you find it useful.