Framework for "scientific" experiments (Result organization; Experiment and Trial setup; Baseline Comparisons)
This tool strips out the repetitive parts of setting up and running experiments, and lets you focus on writing the trial-running logic and the result types. This reduces careless errors when running experiments, keeps results organized, and makes gathering statistics convenient.
pip install sciex
In this framework, an Experiment consists of a number of Trials. One trial is independent from another.
class Experiment:
    """One experiment simply groups a set of trials together.
    Runs them together, manages results etc."""
    def __init__(self, name, trials, outdir,
                 logging=True, verbose=False):
A Trial is initialized by a name and a config. There is no requirement on the format of config; it just needs to be serializable by pickle. Note that we recommend the following convention for trial names:
class Trial:
    def __init__(self, name, config, verbose=False):
        """
        Trial name convention: "{trial-global-name}_{seed}_{specific-setting-name}"
        Example: gridworld4x4_153_value-iteration-200.
        The ``seed'' is optional. If not provided, then there should be only one underscore.
        """
Your job is to define a child class of Trial and implement its run() function, so that it caters to your experiment design.
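For illustration, here is a minimal sketch of such a subclass. The names here (GridworldTrial, the config keys, the import paths) are hypothetical, and the run() signature with a logging keyword is an assumption; match whatever your version of the Trial base class defines. RewardsResult refers to the example Result subclass shown later in this document.

import random
from sciex import Trial   # import path assumed; consult the sciex package
# Hypothetical import path: RewardsResult is the example Result subclass
# defined later in this document; adjust the import to wherever you place it.
from my_experiment.results import RewardsResult

class GridworldTrial(Trial):
    """A hypothetical trial that runs some algorithm in a toy gridworld."""
    def __init__(self, name, config, verbose=False):
        super().__init__(name, config, verbose=verbose)
        self._my_config = config  # keep our own reference to the config

    def run(self, logging=False):  # signature assumed; match the base class
        rng = random.Random(self._my_config.get("seed", 0))
        rewards = []
        for _ in range(self._my_config.get("num_steps", 100)):
            # ... your environment / algorithm logic goes here ...
            rewards.append(rng.uniform(-1.0, 1.0))  # placeholder reward
        # Assumption: run() returns a list of Result objects, which
        # trial_runner.py then saves under the trial's directory.
        return [RewardsResult(rewards)]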
We want to save time by running trials in parallel when possible. Thus, instead of executing the trials directly, Experiment first saves the trials as pickle files in an organized manner, then generates shell scripts, each of which bundles a subset of all the trials. The shell scripts contain commands that use trial_runner.py to conduct the trials. More specifically, "organized manner" means the pickle files will be saved under the layout below, along with trial_runner.py, the shell scripts, and gather_results.py.
./{Experiment:outdir}
    /{Experiment:name}_{timestamp}
        /{Trial:name}
            /trial.pkl
        gather_results.py
        run_{i}.sh
        trial_runner.py
Inside the shell script run_{i}.sh, where i is the index of the bundle, you will only find commands of the sort:
python trial_runner.py {path/to/trial/trial.pkl} {path/to/trial} --logging
Thus, you just need to run
$ ./run_{i}.sh
in a terminal to execute the trials covered by that shell script. You can open multiple terminals and run all the shell scripts together in parallel.
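To tie these pieces together, here is a hedged sketch of how the experiment itself might be set up. GridworldTrial is the hypothetical subclass sketched above; the call that writes the trial.pkl files and run scripts is assumed to be Experiment.generate_trial_scripts, and the split keyword is likewise an assumption; check the sciex source for the exact method name and arguments.

from sciex import Experiment
from my_experiment.trials import GridworldTrial  # the hypothetical subclass above

# Build trials following the naming convention {global-name}_{seed}_{specific-name}.
trials = []
for seed in [111, 222, 333]:
    for setting in ["value-iteration-200", "value-iteration-500"]:
        name = "gridworld4x4_%d_%s" % (seed, setting)
        trials.append(GridworldTrial(name, {"seed": seed, "num_steps": 100}))

exp = Experiment("GridworldExperiment", trials, "./results", verbose=True)
# Assumed method name and keyword: this call should save the trial.pkl files and
# write run_{i}.sh, trial_runner.py and gather_results.py under
# ./results/GridworldExperiment_{timestamp}/.
exp.generate_trial_scripts(split=3)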
Divide run scripts by computer. This is useful when you have generated many run scripts (each running multiple trials) and you want to add another level of grouping.
python -m sciex.divide ./ -c sunny windy rainy cloudy -r 0.3 0.2 0.3 0.2 -n 4 3 4 3
where ./ is the path to the experiment root directory (here I happen to be in it already), sunny windy rainy cloudy are four computers, and I want to run 30% of the run scripts on sunny, 20% on windy, and so on. On each computer I will then start a number of terminals, e.g. 4 for sunny and 3 for windy, etc.
Run multiple trials with a shared resource. The trials are specified either by a run script or by a file with a list of paths to trial pickle files. The trial class is expected to implement provide_shared_resource and could_provide_resource, and the resource should only be read, not written to, by the trials. (NOTE: this does not currently work with CUDA models because of the difficulty of sharing CUDA memory.)
python -m sciex.batch_runner run_script_or_file_with_trial_paths {Experiment:outdir}
We know that different experiments may produce results of different types. For example, sometimes we have a list of values, sometimes a list of objects, sometimes a particular kind of object, and sometimes a combination of multiple result types. For instance, each trial in an image classification task may produce labels for test images, while each trial in an image segmentation task may produce arrays of pixel locations. We want you to decide what those result types are and how to process them. Hence the Result interface (see components.py).
To be more concrete, in sciex, each Trial can produce multiple Results. Trials with the same specific_name (see the trial naming convention above) will be gathered (to compute a statistic). See the interface below:
class Result:
    def save(self, path):
        """Save result to given path to file"""
        raise NotImplementedError

    @classmethod
    def collect(cls, path):
        """path can be a str or a list of paths"""
        raise NotImplementedError

    @classmethod
    def FILENAME(cls):
        """If this result depends on only one file,
        put the filename here"""
        raise NotImplementedError

    @classmethod
    def gather(cls, results):
        """`results` is a mapping from specific_name to a dictionary {seed: actual_result}.
        Returns a more understandable interpretation of these results"""
        return None

    @classmethod
    def save_gathered_results(cls, results, path):
        """results is a mapping from global_name to the object returned by `gather()`.
        Post-processing of results should happen here.
        Return "None" if nothing to be saved."""
        return None
    ...
Basically, you define how to save this kind of result (save()), how to collect it, i.e. read it from a file (collect()), and how to gather a set of results of this kind (gather()), for example by computing the mean and standard deviation. As an example, sciex provides a YamlResult type (see result_types.py):
import yaml
import pickle
from sciex.components import Result

class YamlResult(Result):
    def __init__(self, things):
        self._things = things

    def save(self, path):
        with open(path, "w") as f:
            yaml.dump(self._things, f)

    @classmethod
    def collect(cls, path):
        with open(path) as f:
            return yaml.load(f, Loader=yaml.Loader)  # Loader is required by recent PyYAML
We didn't define the gather() and save_gathered_results() functions because these are experiment-specific. For example, in a reinforcement learning experiment, I may want to gather rewards as a type of result. Here's how I might implement that. Notice that since I know I will put these results in a paper, my implementation of save_gathered_results saves a LaTeX table in a .tex file.
import os
import numpy as np
from pprint import pprint

class RewardsResult(YamlResult):
    def __init__(self, rewards):
        """rewards: a list of reward floats"""
        super().__init__(rewards)

    @classmethod
    def FILENAME(cls):
        return "rewards.yaml"

    @classmethod
    def gather(cls, results):
        """`results` is a mapping from specific_name to a dictionary {seed: actual_result}.
        Returns a more understandable interpretation of these results"""
        # compute cumulative rewards
        myresult = {}
        for specific_name in results:
            all_rewards = []
            for seed in results[specific_name]:
                cum_reward = sum(list(results[specific_name][seed]))
                all_rewards.append(cum_reward)
            myresult[specific_name] = {'mean': np.mean(all_rewards),
                                       'std': np.std(all_rewards),
                                       '_size': len(results[specific_name])}
        return myresult

    @classmethod
    def save_gathered_results(cls, gathered_results, path):

        def _tex_tab_val(entry, bold=False):
            pm = "$\\pm$" if not bold else "$\\bm{\\pm}$"
            return "%.2f %s %.2f" % (entry["mean"], pm, entry["std"])

        # Save plain text
        with open(os.path.join(path, "rewards.txt"), "w") as f:
            pprint(gathered_results, stream=f)

        # Save the latex table
        with open(os.path.join(path, "rewards_latex.tex"), "w") as f:
            tex =\
                "\\begin{tabular}{ccccccc}\n%automatically generated table\n"\
                " ... "
            for global_name in gathered_results:
                row = " %s & %s & %s & %s & %s & %s\\\\\n" % (...)
                tex += row
            tex += "\\end{tabular}"
            f.write(tex)
        return True
Then, after all the results are produced, when you want to gather them and produce a report of the statistics (or plots), just run
$ ./{Experiment:outdir}/{Experiment:name}_{timestamp}/gather_results.py