Physical symbolic optimization
Source code: WassimTenachi/PhySO
Documentation: physo.readthedocs.io
Performance on the standard Feynman benchmark from SRBench, comprising 120 expressions from the Feynman Lectures on Physics, against popular SR packages.
The package has been tested on:
- Linux
- OSX (ARM & Intel)
- Windows
To install the package it is recommended to first create a conda virtual environment:
conda create -n PhySO python=3.8
And activate it:
conda activate PhySO
physo can be downloaded using git:
git clone https://github.com/WassimTenachi/PhySO
Or by directly downloading a zip archive of the repository from GitHub.
From the repository root:
Installing essential dependencies:
conda install --file requirements.txt
Installing optional dependencies (for advanced debugging in tree representation):
conda install --file requirements_display1.txt
pip install -r requirements_display2.txt
In addition, LaTeX should be installed on the system.
NOTE: physo supports CUDA, but since the bottleneck of the code is free constant optimization, using CUDA (even on a very high-end GPU) does not improve performance over a CPU and can actually hinder it.
Installing physo (from the repository root):
pip install -e .
Import test:
python3
>>> import physo
This should result in physo being successfully imported.
Unit tests:
Running all unit tests except parallel mode ones (from the repository root):
python -m unittest discover -p "*UnitTest.py"
This should result in all tests passing (except for the program_display_UnitTest tests if the optional dependencies were not installed).
Running all unit tests (from the repository root):
python -m unittest discover -p "*Test.py"
This should take 5-15 min depending on your system (the more CPU cores you have, the longer it takes to make the efficiency curves).
Uninstalling the package:
conda deactivate
conda env remove -n PhySO
Symbolic regression (SR) consists in the inference of a free-form symbolic analytical function f that fits y = f(x_0, ..., x_n) given (x_0, ..., x_n, y) data.
Given a dataset (X, y):
import numpy as np
z = np.random.uniform(-10, 10, 50)
v = np.random.uniform(-10, 10, 50)
X = np.stack((z, v), axis=0)
y = 1.234*9.807*z + 1.234*v**2
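Where X = (z, v), z being a length of dimension L^1 T^0 M^0, v a velocity of dimension L^1 T^-1 M^0, and y = E an energy of dimension L^2 T^-2 M^1.
Given the units of the input variables (here z and v), of the root variable y (here E), and of the free and fixed constants, you can run an SR task to recover f via: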
import physo
expression, logs = physo.SR(X, y,
X_units = [ [1, 0, 0] , [1, -1, 0] ],
y_units = [2, -2, 1],
fixed_consts = [ 1. ],
fixed_consts_units = [ [0,0,0] ],
free_consts_units = [ [0, 0, 1] , [1, -2, 0] ],
)
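(Allowing the use of a fixed constant 1 of dimension L^0 T^0 M^0 (i.e. dimensionless) and of two free constants: m of dimension L^0 T^0 M^1 and g of dimension L^1 T^-2 M^0.)
It should be noted that here the units vectors are of size 3 (e.g. [1, 0, 0]) as in this example the variables have units dependent on length, time and mass only.
However, units vectors can be of any size, as long as the size is consistent across X, y and the constants.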
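The bookkeeping behind units vectors can be sketched with plain numpy. This illustrates dimensional analysis in general, not physo's internals, and the helper names (mul_units, pow_units, add_units) are hypothetical. Each vector stores the exponents of (length, time, mass) as in the example above: multiplying quantities adds the vectors, raising to a power scales them, and addition is only allowed between identical vectors.

```python
import numpy as np

# Units vectors: exponents of (length, time, mass), as in the example above.
z_units = np.array([1, 0, 0])    # length
v_units = np.array([1, -1, 0])   # velocity
m_units = np.array([0, 0, 1])    # mass (free constant)
g_units = np.array([1, -2, 0])   # acceleration (free constant)
E_units = np.array([2, -2, 1])   # energy (target)

def mul_units(a, b):
    """Units of a product: exponents add."""
    return a + b

def pow_units(a, n):
    """Units of a quantity raised to the power n: exponents are scaled by n."""
    return n * a

def add_units(a, b):
    """Units of a sum: both terms must share the same units."""
    if not np.array_equal(a, b):
        raise ValueError("cannot add quantities with different units")
    return a

# Units of m * (g*z + v**2)
gz_units = mul_units(g_units, z_units)
v2_units = pow_units(v_units, 2)
result_units = mul_units(m_units, add_units(gz_units, v2_units))

units_match = np.array_equal(result_units, E_units)
print("m*(g*z + v**2) has the units of E:", units_match)
```

An expression such as z + v would be rejected by add_units, which is the kind of constraint that lets units prune physically meaningless candidates.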
NOTE: If you are not working on a physics problem and all your variables/constants are dimensionless, do not specify any of the xx_units arguments (or specify them as [0,0] for all variables/constants) and physo will perform a dimensionless symbolic regression task.
It should also be noted that free constants search starts around 1. by default. Therefore when using default hyperparameters, normalizing the data around an order of magnitude of 1 is strongly recommended.
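One simple way to bring data to an order of magnitude of 1 is to divide each variable by a power of ten close to its typical magnitude. The sketch below uses numpy only; the helper rescale_to_order_one and the sample data are hypothetical and not part of physo.

```python
import numpy as np

def rescale_to_order_one(a):
    """Divide `a` by the power of 10 closest to its typical magnitude.

    Returns the rescaled array and the scale, so results can be mapped back.
    Assumes `a` contains at least one non-zero value.
    """
    typical = np.median(np.abs(a[a != 0]))
    scale = 10.0 ** np.round(np.log10(typical))
    return a / scale, scale

rng = np.random.default_rng(0)
x_raw = rng.uniform(1e5, 9e5, 50)          # hypothetical data of order 1e5 to 1e6
x_scaled, x_scale = rescale_to_order_one(x_raw)

print("scale used:", x_scale)
print("typical magnitude after rescaling:", np.median(np.abs(x_scaled)))
```

Dividing by a power of ten amounts to a change of unit (for instance metres to kilometres), so the units vectors stay the same, but any fitted constants are then expressed in the rescaled units and should be converted back using the returned scale.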
Finally, please note that the SR capabilities of physo are heavily dependent on hyperparameters. It is therefore recommended to tune hyperparameters to your own specific problem for doing science.
Summary of currently available hyperparameters configurations:
Configuration | Notes |
---|---|
config0 | Light config for demo purposes. |
config1 | Tuned on a few physical cases. |
config2 | [work in progress] Good starting point for doing science. |
By default, config0 is used; however, it is recommended to use the latest configuration currently available (config1) as a starting point for doing science by specifying it:
expression, logs = physo.SR(X, y,
X_units = [ [1, 0, 0] , [1, -1, 0] ],
y_units = [2, -2, 1],
fixed_consts = [ 1. ],
fixed_consts_units = [ [0,0,0] ],
free_consts_units = [ [0, 0, 1] , [1, -2, 0] ],
run_config = physo.config.config1.config1
)
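You can also specify the choosable symbolic operations for the construction of f and give names to the variables and constants for display purposes by filling in optional arguments: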
expression, logs = physo.SR(X, y,
X_names = [ "z" , "v" ],
X_units = [ [1, 0, 0] , [1, -1, 0] ],
y_name = "E",
y_units = [2, -2, 1],
fixed_consts = [ 1. ],
fixed_consts_units = [ [0,0,0] ],
free_consts_names = [ "m" , "g" ],
free_consts_units = [ [0, 0, 1] , [1, -2, 0] ],
op_names = ["mul", "add", "sub", "div", "inv", "n2", "sqrt", "neg", "exp", "log", "sin", "cos"]
)
physo.SR saves monitoring curves, the Pareto front (complexity vs accuracy optimums) and the logs.
It also returns the best fitting expression found during the search which can be inspected in regular infix notation (eg. in ascii or latex) via:
>>> print(expression.get_infix_pretty(do_simplify=True))
  ⎛       2⎞
m⋅⎝g⋅z + v ⎠
>>> print(expression.get_infix_latex(do_simplify=True))
'm \\left(g z + v^{2}\\right)'
Free constants can be inspected via:
>>> print(expression.free_const_values.cpu().detach().numpy())
array([9.80699996, 1.234 ])
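As a sanity check independent of physo, the recovered expression and constants can be evaluated against the data with plain numpy. The sketch below regenerates a dataset of the same form as the one above (with a fixed seed, which is an assumption of this example) and compares the full expression to an incomplete candidate; the rmse helper is hypothetical.

```python
import numpy as np

# Dataset of the same form as in the example above (seeded for reproducibility).
rng = np.random.default_rng(0)
z = rng.uniform(-10, 10, 50)
v = rng.uniform(-10, 10, 50)
y = 1.234*9.807*z + 1.234*v**2

def rmse(y_pred, y_true):
    """Root mean squared error between predictions and targets."""
    return float(np.sqrt(np.mean((y_pred - y_true)**2)))

# Constants as recovered by the search in the example above.
m, g = 1.234, 9.807

rmse_full = rmse(m*(g*z + v**2), y)    # recovered expression
rmse_partial = rmse(m*v**2, y)         # candidate missing the g*z term

print("RMSE of m*(g*z + v**2):", rmse_full)
print("RMSE of m*v**2        :", rmse_partial)
```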
physo.SR also returns the log of the run from which one can inspect Pareto front expressions:
# Pareto front data from the run logs (accessor as used in the PhySO demo; check it against your installed version)
pareto_front_complexities, pareto_front_expressions, pareto_front_r, pareto_front_rmse = logs.get_pareto_front()
for i, prog in enumerate(pareto_front_expressions):
    # Showing expression
    print(prog.get_infix_pretty(do_simplify=True))
    # Showing free constants
    free_consts = prog.free_const_values.detach().cpu().numpy()
    for j in range(len(free_consts)):
        print("%s = %f"%(prog.library.free_const_names[j], free_consts[j]))
    # Showing RMSE
    print("RMSE = {:e}".format(pareto_front_rmse[i]))
    print("-------------")
Returning:
   2
m⋅v
g = 1.000000
m = 1.486251
RMSE = 6.510109e+01
-------------
g⋅m⋅z
g = 3.741130
m = 3.741130
RMSE = 5.696636e+01
-------------
  ⎛       2⎞
m⋅⎝g⋅z + v ⎠
g = 9.807000
m = 1.234000
RMSE = 1.675142e-07
-------------
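To make the notion of a Pareto front (complexity vs accuracy optimums) concrete, here is a minimal sketch in plain Python, not physo's own implementation. A candidate is kept only if no simpler-or-equal candidate is at least as accurate. The RMSE values of the three expressions are taken from the output above; the complexity values and the two extra candidates are made up for illustration.

```python
def pareto_front(candidates):
    """Return the non-dominated candidates among (complexity, rmse, label) tuples.

    Candidates are scanned by increasing complexity; one is kept only if it
    strictly improves on the best RMSE seen among simpler candidates.
    """
    front = []
    best_rmse = float("inf")
    for cand in sorted(candidates, key=lambda c: (c[0], c[1])):
        if cand[1] < best_rmse:
            front.append(cand)
            best_rmse = cand[1]
    return front

# (complexity, RMSE, expression). Complexities are hypothetical token counts.
candidates = [
    (4, 6.510109e+01, "m*v**2"),
    (5, 5.696636e+01, "g*m*z"),
    (5, 8.0e+01,      "hypothetical: same complexity, less accurate"),
    (8, 1.675142e-07, "m*(g*z + v**2)"),
    (9, 3.0e+00,      "hypothetical: more complex, less accurate"),
]

front = pareto_front(candidates)
for complexity, err, label in front:
    print("complexity = %d, RMSE = %e : %s" % (complexity, err, label))
```

Both hypothetical candidates are dropped because a simpler or equally simple expression already achieves a lower RMSE, leaving the three expressions listed above.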
In addition, physo.SR can create log files which can be inspected later on.
Pareto front expressions and their constants can be loaded into a list of sympy expressions via:
sympy_expressions = physo.read_pareto_csv("SR_curves_pareto.csv")
This demo can be found in the PhySO repository.
@ARTICLE{2023arXiv230303192T,
author = {{Tenachi}, Wassim and {Ibata}, Rodrigo and {Diakogiannis}, Foivos I.},
title = "{Deep symbolic regression for physics guided by units constraints: toward the automated discovery of physical laws}",
journal = {arXiv e-prints},
keywords = {Astrophysics - Instrumentation and Methods for Astrophysics, Computer Science - Machine Learning, Physics - Computational Physics},
year = 2023,
month = mar,
eid = {arXiv:2303.03192},
pages = {arXiv:2303.03192},
doi = {10.48550/arXiv.2303.03192},
archivePrefix = {arXiv},
eprint = {2303.03192},
primaryClass = {astro-ph.IM},
adsurl = {https://ui.adsabs.harvard.edu/abs/2023arXiv230303192T},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}