Oríon Tutorial & Hackaton
Introduction
Why should you care...
(As Brady would say 😊)
... about Hyper-parameter optimization
Hyper-parameters are important. Researchers cannot make sound benchmarks without proper hyper-parameter optimization. We are all human, we love our research, we cannot be expected to spend as much time optimizing hyper-parameters of our benchmarks as much as we do on our beloved models. We need an automated tool which takes care of it impartially.
Hyper-parameters can be difficult to optimize. If your idea works, you don't want to fail just because you can't find the right hyper-parameter values. We need an automated tool which takes care of it more efficiently and more accurately.
... about Oríon
It is focusing on usability and provides an interface which is intuitive as well as flexible enough to support many different workflows.
It provides a platform for black-box optimization algorithms with the goal of building a community connecting together users of hyper-parameter optimization tools to developers of such algorithms.
It is open-source.
It's not a journaling system
We prefer to build tools with a lean design and specific goals. Oríon is not about saving data about your experiments. It saves the minimal amount necessary to do hyper-parameter optimization, that is, a single objective value.
If you are looking for a tool to do automatic journaling of your experiments then you may want to take a look at the one we are developing, kleio. Note that it is a prototype and do not follow a stable development with backward compatibility like Oríon.
Where to find us
Do not hesitate to ping us on slack, open issues, make feature requests or pull requests on github or even start discussions on our Discourse.
The team and emails can be found on our ICML 2018 RML workshop paper.
Setup
Installation
Oríon supports python 3.5, 3.6 and 3.7. Please don't run away if you still use python2.7, we can help you find redemption during this hackaton. 😊
On Mila computers, you can create a conda virtual environment with python 3.6.
$ conda create -n orion.tutorial python=3.6
...
Proceed ([y]/n)? y
...
$ source activate orion.tutorial
Make sure you have the right pip
. It should be under $HOME/.conda
by
default.
(orion.tutorial) [SLURM] bouthilx@leto05:~$ which pip
/u/bouthilx/.conda/envs/orion.tutorial/bin/pip
Core of Oríon
Installing Oríon is as simple as
$ pip install orion.core
Plugins
You can install plugins to add other algorithms or database backends to Oríon. For now the only plugin supported is a Bayesian Optimizer algorithm wrapped from scikit-optimize.
$ pip install orion.algo.skopt
Database configuration
Each system you use (Mila, Cedar, Graham) should have their own configuration file.
Unix:
$HOME/.config/orion.core/orion_config.yaml
OSX:
$HOME/Library/Application Support/orion.core/orion_config.yaml
If not sure, you can run orion in verbose mode to see where it is looking for configuration files. Here is an example of the output on Ubuntu
$ orion -vv --debug hunt -n dummy
DEBUG:orion.core.io.resolve_config:[Errno 2] No such file or directory: '/home/bouthilx/.virtualenvs3.6/orion.list/share/orion.core/orion_config.yaml.example'
DEBUG:orion.core.io.resolve_config:[Errno 2] No such file or directory: '/etc/xdg/xdg-ubuntu/orion.core/orion_config.yaml'
DEBUG:orion.core.io.resolve_config:[Errno 2] No such file or directory: '/home/bouthilx/.virtualenvs3.6/orion.list/share/orion.core/orion_config.yaml.example'
DEBUG:orion.core.io.resolve_config:[Errno 2] No such file or directory: '/etc/xdg/xdg-ubuntu/orion.core/orion_config.yaml'
DEBUG:orion.core.io.resolve_config:[Errno 2] No such file or directory: '/home/bouthilx/.config/orion.core/orion_config.yaml'
Once you selected your file path for the database configuration, you simply need to copy the following snippet and adapt the corresponding fields:
<USERNAME> | Same as your username on Mila computers |
<PASSWORD> | Your password for the database, not the one on Mila computers |
<DATABASENAME> | By default we assign database names which are your usernames. |
<CERTIFICATE-PATH> | On Mila computers it is ~lisa/mongo.pem .
If you use Oríon on other computers, you need to copy the certificate and adapt the configuration accordingly. |
database:
type: 'mongodb'
name: <DATABASENAME>
host: 'mongodb://<USERNAME>:<PASSWORD>@mongomila.iro.umontreal.ca:27017/<DATABASENAME>?ssl=true&ssl_ca_certs=<CERTIFICATE-PATH>&authSource=<DATABASENAME>'
Test connection
You can first check that everything works as expected by testing with the
debug
mode. This mode bypass the database in the configuration. If you run
the following command, you should get the following error.
$ orion --debug hunt -n dummy
...
AttributeError: 'str' object has no attribute 'configuration'
That's a terrible error message. -_- Note to ourselves; Improve this error message. What this should tell is that the connection to database was successful but Oríon could not find any script to optimize.
Now remove the option --debug
to test the database. If it fails to connect,
you will get the following error. Otherwise, you'll get the (terrible) error above again
if it succeeded. Note that a connection failure will hang for approximately 60
seconds before giving up. ☕ time.
$ orion hunt -n dummy
...
orion.core.io.database.DatabaseError: Connection Failure: database not found on specified uri
If it fails, try running with -vv
and make sure your configuration file is
properly found. Suppose your file path is /u/user/.config/orion.config/orion_config.yaml
, then you should NOT
see the following line in the output otherwise it means it is not found.
DEBUG:orion.core.io.resolve_config:[Errno 2] No such file or directory: '/u/user/.config/orion.config/orion_config.yaml'
When you are sure the configuration file is found, look for the configuration used by Oríon to initiate the DB connection.
DEBUG:orion.core.io.experiment_builder:Creating mongodb database client with args: {'name': 'bouthilx', 'host': 'mongodb://bouthilx:<PASSWORD>@mongomila.iro.umontreal.ca:27017/bouthilx?ssl=true&ssl_ca_certs=/u/lisa/mongo.pem&authSource=bouthilx'}
Make sure you have the proper database name, database type and host URI.
Usage
To help understand how to use Oríon, this tutorial will walk you through each
steps to adapt a script from pytorch/examples.git
. First to install
$ source activate orion.tutorial
$ pip install torch torchvision
$ git clone git@github.com:pytorch/examples.git
Adapting the script
There is a few modifications needed to be able to use the script with Oríon. First we need to make it executable.
$ cd examples/mnist
$ chmod +x main.py
Then we need to add a shebang at the top of main.py
. Note: We could drop those requirements in the
future.
#!/usr/bin/env python
At the top of the file, below the imports, import the helper function
orion.client.report_results()
from orion.client import report_results
We are almost done now. We need to add a line to the function test()
so
that it returns the error rate.
return 1 - (correct / len(test_loader.dataset))
And finally, we get back this test error rate and call report_results
to
return the objective value to Oríon. Note that report_results
is meant to
be called only once, this is because Oríon only optimizes looking at 1
objective value.
test_error_rate = test(args, model, device, test_loader)
report_results([dict(
name='test_error_rate',
type='objective',
value=test_error_rate)])
You can also return result types of 'gradient'
and 'constraint'
for
algorithms which supports those results as well.
Important note here, we use test error rate for sake of simplicity, because the script does not contain validation dataset loader as-is, but we should never optimize our hyper-parameters on the test set. We should always use a validation set.
Another important note, Oríon will always minimize the objective so make sure you never try to optimize something like the accuracy of the model unless you are looking for very very bad models.
Execution
Once the script is adapted, optimizing the hyper-parameters with Oríon is rather simple. Normally you would call the script the following way.
$ ./main.py --lr 0.01
To use it with Oríon, you simply need to prepend the call with
orion hunt -n <some name>
and specify the hyper-parameter prior
distributions.
$ orion hunt -n orion-tutorial ./main.py --lr~'loguniform(1e-5, 1.0)'
This commandline call will sequentially execute ./main.py --lr=<value>
with random
values sampled from the distribution loguniform(1e-5, 1.0)
. We support all
distributions from scipy.stats, plus choices()
for categorical
hyper-parameters (similar to numpy's choice function).
Experiments are interruptible, meaning that you can stop them either with
<ctrl-c>
or with kill signals. If your script is not resumable automatically then resuming an
experiment will restart your script from scratch.
You can resume experiments using the same commandline or simply by specifying the name of the experiment.
$ orion hunt -n orion-tutorial
Note that experiment names are unique, you cannot create two different experiment with the same name.
You can also register experiments without executing them.
$ orion init_only -n orion-tutorial ./main.py --lr~'loguniform(1e-5, 1.0)'
Debugging
When preparing a script for hyper-parameter optimization, we recommend first testing with debug
mode. This will use an in-memory database which will be flushed at the end of execution. If you
don't use --debug
you will likely quickly fill your database with broken experiments.
$ orion --debug hunt -n orion-tutorial ./main.py --lr~'loguniform(1e-5, 1.0)'
Hunting Options
$ orion hunt --help
Oríon arguments (optional):
These arguments determine orion's behaviour
-n stringID, --name stringID
experiment's unique name; (default: None - specified
either here or in a config)
-c path-to-config, --config path-to-config
user provided orion configuration file
--max-trials # number of jobs/trials to be completed (default:
inf/until preempted)
--pool-size # number of concurrent workers to evaluate candidate
samples (default: 10)
name
The unique name of the experiment.
config
Configuration file for Oríon which may define the database, the algorithm and all options of the
command hunt, including name
, pool-size
and max-trials
.
max-trials
The maximum number of trials tried during an experiment.
pool-size
The number of trials which are generated by the algorithm each time it is interrogated.
Results
When an experiment reaches its termination criterion, basically max-trials
, it will print the
following statistics if Oríon is called with -v
or -vv
.
RESULTS
=======
{'best_evaluation': 0.05289999999999995,
'best_trials_id': 'b7a741e70b75f074208942c1c2c7cd36',
'duration': datetime.timedelta(0, 49, 751548),
'finish_time': datetime.datetime(2018, 8, 30, 1, 8, 2, 562000),
'start_time': datetime.datetime(2018, 8, 30, 1, 7, 12, 810452),
'trials_completed': 5}
BEST PARAMETERS
===============
[{'name': '/lr', 'type': 'real', 'value': 0.012027705702344259}]
You can also fetch the results using python code. You do not need to understand MongoDB since
you can fetch results using the Experiment
object. The class ExperimentBuilder provides
simple methods to fetch experiments using their unique names. You do not need to explicitly
open a connection to the database when using the ExperimentBuilder since it will automatically
infer its configuration from the global configuration file as when calling Oríon in commandline.
Otherwise you can pass other arguments to ExperimentBuilder().build_view_from()
using the same
dictionary structure as in the configuration file.
# Database automatically inferred
ExperimentBuilder().build_view_from(
{"name": "orion-tutorial"})
# Database manually set
ExperimentBuilder().build_view_from(
{"name": "orion-tutorial",
"dataset": {
"type": "mongodb",
"name": "myother",
"host": "localhost"}})
For a complete example, here's how you can fetch trials from a given experiment.
import datetime
import pprint
from orion.core.io.experiment_builder import ExperimentBuilder
some_datetime = datetime.datetime.now() - datetime.timedelta(minutes=5)
experiment = ExperimentBuilder().build_view_from({"name": "orion-tutorial"})
pprint.pprint(experiment.stats)
for trial in experiment.fetch_trials({}):
print(trial.id)
print(trial.status)
print(trial.params)
print(trial.results)
print()
pprint.pprint(trial.to_dict())
# Fetches only the completed trials
for trial in experiment.fetch_trials({'status': 'completed'}):
print(trial.objective)
# Fetches only the most recent trials using mongodb-like syntax
for trial in experiment.fetch_trials({'end_time': {'$gte': some_datetime}}):
print(trial.id)
print(trial.end_time)
You can pass queries to fetch_trials()
, where queries can be simple dictionary of values to
match like {'status': 'completed'}
, in which case it would return all trials where
trial.status == 'completed'
, or they can be more complex using mongodb-like syntax.
Research Iteration and Version Control
While doing your research your are very likely to iterate on ideas and change many things about your experiments. You could obviously change the code, but also the hyper-parameters you want to optimize or the algorithm you use to optimize them. In any other hyper-parameter optimization framework you would start optimization from scratch each time. With Oríon comes an Experiment Version Control (EVC) system which makes it possible to reuse results from your first experiment in a given project to the current one. This means a new experiment could pre-train on all prior data resulting in a much more efficient optimization algorithm. Another advantage of the EVC system is that it provides you a systematic way to organize your research and the possibility to go back in time and compare the evolution of performance throughout your research.
The EVC system is certainly the part of Oríon with the steepest learning curve. We tried to make it such that you get help at any step and are provided with many hints so that you can learn while using it, without looking at the documentation. If you have any recommendation on how to improve those hints and how to make the EVC system more intuitive, please do not hesitate to contact us. You are very welcome to start a new thread on our Discourse.
To continue with our examples from pytorch-mnist, suppose we decide at some point we would like to
also optimize the momentum
.
$ orion hunt -n orion-tutorial ./main.py --lr~'loguniform(1e-5, 1.0)' --momentum~'uniform(0, 1)'
This cannot be the same as the experiment orion-tutorial
since the space of optimization is now
different. Such a call will trigger an experiment branching, meaning that a new experiment will
be created which points to the previous one, the one without momentum in this case.
Welcome to Orion's experiment branching interactive conflicts resolver
-----------------------------------------------------------------------
If you are unfamiliar with this process, you can type `help` to print the help message.
You can also type `abort` or `(q)uit` at any moment to quit without saving.
Remaining conflicts:
Experiment name 'orion-tutorial' already exist for user 'bouthilx'
New momentum
(orion)
You should think of it like a git status
. It tells you want changed and what you did not commit
yet. If you hit tab twice, you will see all possible commands. You can enter h or help for more
information about each command. In this case we will first add momentum
. You can enter add
and then hit tab twice. Oríon will detect any possible hyper-parameter that you could add and
autocomplete it. Since we only have momentum
in this case, it will be fully autocompleted. If
you hit tab twice again, the option --default-value
will be added to the line, with which you
can set a default-value for the momentum. If you only enter add momentum
, the new experiment
won't be able to fetch trials from the parent experiment, because it cannot know what was the
implicit value of momentum
on those trials. If you know there was a default value
for momentum
, you should tell so with --default-value
.
(orion) add momentum --default-value 0
TIP: You can use the '~+' marker in place of the usual ~ with the command-line to solve this
conflict automatically.
Ex: -x~+uniform(0,1)
Resolutions:
momentum~+uniform(0, 1, default_value=0.0)
Remaining conflicts:
Experiment name 'orion-tutorial' already exist for user 'bouthilx'
(orion)
As you can see, when resolving the conflicts with the prompt, Oríon will always tell you how you could have resolved the conflict directly in commandline. If we follow the advice, we would change our commandline like this.
$ orion hunt -n orion-tutorial ./main.py --lr~'loguniform(1e-5, 1.0)' --momentum~+'uniform(0, 1)'
Let's look back at the prompt above. Following the resolution of momentum
conflict we see that it is now
marked as resolved in the Resolutions list, while the experiment name is still marked as a
conflict. Notice that the prior distribution is slightly different than the one specified in
commandline. This is because we added a default value inside the prompt. Notice also that the
resolution is marked as how you would resolve this conflict in commandline. There is hints
everywhere to help you learn without looking at the documentation.
Now for the experiment name conflict. Remember that experiment names must be unique, that means that
when an experiment branching occur we need to give a new name to the child experiment. You can do so
with the command name
. If you hit tab twice with name
, Oríon will auto-complete with all
experiment names in the current project. This makes it easy to autocomplete an experiment name and
simply append some version number like 1.2
at the end. Let's add -with-momentum
in our case.
(orion) add orion-tutorial-with-momentum
TIP: You can use the '-b' or '--branch' command-line argument to automate the naming process.
Resolutions:
--branch orion-tutorial-with-momentum
momentum~+uniform(0, 1, default_value=0.0)
Hooray, there is no more conflicts!
You can enter 'commit' to leave this prompt and register the new branch
(orion)
Again Oríon will tell you how you can resolve an experiment name conflict in command-line to avoid the prompt, and the resolution will be marked accordingly.
$ orion hunt -n orion-tutorial -b orion-tutorial-with-momentum ./main.py --lr~'loguniform(1e-5, 1.0)' --momentum~+'uniform(0, 1)'
You can execute again this branched experiment by reusing the same commandline but replacing the new
experiment name orion-tutorial-with-momentum
.
$ orion hunt -n orion-tutorial-with-momentum ./main.py --lr~'loguniform(1e-5, 1.0)' --momentum~'uniform(0, 1)'
Or as always by only specifying the experiment name.
$ orion hunt -n orion-tutorial-with-momentum
If you are unhappy with some resolutions, you can type reset
and hit tab twice. Oríon will
offer autocompletions of the possible resolutions to reset.
(orion) reset '
'--branch orion-tutorial-with-momentum'
'momentum~+uniform(0, 1, default_value=0.0)'
(orion) reset '--branch orion-tutorial-with-momentum'
Resolutions:
momentum~+uniform(0, 1, default_value=0.0)
Remaining conflicts:
Experiment name 'orion-tutorial' already exist for user 'bouthilx'
(orion)
Once you are done, you can enter commit
and the branched experiment will be register and will
begin execution.
Note that all of this can be partially avoided using the option --auto-resolution
in commandline
or auto
in the interactive prompt. This will automatically resolve any conflict related to
hyper-parameters and algorithms. For now, Oríon cannot solve automatically experiment name
conflicts, code conflicts, command-line conflicts and configuration file conflicts.
$ orion hunt --auto-resolution -n orion-tutorial ./main.py --lr~'loguniform(1e-5, 1.0)' --momentum~'uniform(0, 1)'
Source of conflicts
- Code modification (not yet release, candidate for v0.1.1)
- Commandline modification
- Script configuration file modification
- Optimization space modification (new hyper-parameters or change of prior distribution)
- Algorithm configuration modification
Iterative Results
You can retrieve results from different experiments in the same project using
the Experiment Version Control (EVC) system. The only difference
with ExperimentBuilder
is that EVCBuilder
will connect the experiment
to the EVC system, accessible through the node
attribute.
import pprint
from orion.core.io.evc_builder import EVCBuilder
experiment = EVCBuilder().build_view_from(
{"name": "orion-tutorial-with-momentum"})
print(experiment.name)
pprint.pprint(experiment.stats)
parent_experiment = experiment.node.parent.item
print(parent_experiment.name)
pprint.pprint(parent_experiment.stats)
for child in experiment.node.children:
child_experiment = child.item
print(child_experiment.name)
pprint.pprint(child_experiment.stats)
Deployment
To run on Mila, you can simply call your script with Oríon as described in the sections above. Here's an example for a bash script to run with sbatch.
#!/usr/bin/env bash
export HOME=`getent passwd $USER | cut -d':' -f6`
source ~/.bashrc
export PYTHONUNBUFFERED=1
echo Running pwdon $HOSTNAME
workon orion.tutorial
orion hunt -n orion-tutorial ./main.py --lr~'loguniform(1e-5, 1.0)'
# Or simply `orion hunt -n orion-tutorial` if already registered
You can launch as many times as you want the same experiment. Each time you execute an experiment, it creates internally a worker which synchronizes with the others through the database. You don't need to setup any master node for this.
Setup on Cedar
TODO
Plugins
Oríon is built to be highly modular, so that algorithms and database backends can be extended by installing plugins. So far there is only one plugin supported, which is a Bayesian optimizer algorithm. We won't go into the details of how to develop new algorithm or database backends during this tutorial+hackaton but if you are interested please do not hesitate to contact us and we will be glad to help.
Algorithms and database cannot be defined in commandline so you need to set
them in the configuration file. We recommend to configure the database in global
configuration files since the database are not likely to change from one experiment to
another. For algorithms however you should probably favor local configuration
files which are specified to Oríon with option --config
.
Bayesian Optimizer
Suppose we define the file bayes.yaml
as this
name: orion-tutorial-with-bayes
algorithms: BayesianOptimizer
Then with call orion hunt
with the configuration file.
$ orion hunt --config bayes.yaml ./script.sh --lr~'loguniform(1e-5, 1.0)'
And voilà, we have a Bayesian optimizer sampling learning-rate values to optimize the error rate.
Wait, what if?
There is a plethora of possible workflows and we obviously only presented one such example in this tutorial. Because of that, many of you would be probably tempted to ask Wait, what if?
Wait, what if I use a configuration file for my hyper-parameters?
You can use configuration files using the keywords 'orion~dist(*args, **kwargs)'
. Oríon supports any
text based configuration file. Note that for now Oríon is a bit picky and
requires you to define the configuration file by passing it with
--config=file_path
where the =
matters.
Here is an example with yaml
lr: 'orion~loguniform(1e-5, 1.0)'
model:
activation: 'orion~choices(['sigmoid', 'relu'])'
hiddens: 'orion~randint(100, 1000)'
Here is another example with json
{
"lr": "orion~loguniform(1e-5, 1.0)"
"model": {
"activation": "orion~choices(['sigmoid', 'relu'])"
"hiddens": "orion~randint(100, 1000)"
}
}
And here is an example with python! Note that for other files than for json and yaml, the keywords
are defined as hp_name~dist(*args, **kwargs)
. Also, note that the code does not execute as is,
but once Oríon makes the substitution it will.
def my_config():
lr = lr~loguniform(1e-5, 1.0)
activations = model/activations~choices(['sigmoid', 'relu'])
nhiddens = model/hiddens~randint(100, 1000)
layers = []
for layer in range(model/nlayers~randint(3, 10)):
nhiddens /= 2
layers.append(nhiddens)
return lr, layers
Oríon could generate a script like this one for instance.
def my_config():
lr = 0.001
activations = 'relu'
nhiddens = 100
layers = []
for layer in range(4):
nhiddens /= 2
layers.append(nhiddens)
return lr, layers
Wait, what if I execute from bash script, not python?
If you pass your arguments to the script, something like
python somescript.py $@
Then Oríon can work flawlessly, because it only interacts with the arguments of the script.
$ orion hunt -n orion-tutorial ./script.sh --lr~'loguniform(1e-5, 1.0)' --momentum~+'uniform(0, 1)'
If you define the arguments within the script, then you would need to call orion from within the script like we did for sbatch in section Deployment.
Wait, what if I'm not using python?
Wait, really?
Suppose you do have a configuration file for the arguments or use command line arguments. Then
the only thing you need to do is to get the value of the environment variable ORION_RESULTS_PATH
and write to it a json file following this architecture.
[
{
"name": "my_objective_function",
"type": "objective",
"value": 0.0
}
]
This is basically what is doing the helper function orion.client.report_results
.
Wait, what if I want to add points to the search?
If you are asking because you intend to do grid search, then you should take a pause and think about why you would want to do grid search. Grid search is highly inefficient, even random search is better. Best solution if you want broad analysis is to rather use an optimization algorithm to focus on regions of hyper-parameter space where it affects performance and afterwards use an estimator on top of the results to interpolate performance. This is current practice in literature such as hyper-parameter importance analysis.
Otherwise, adding points based on our prior knowledge of good hyper-parameter values is a good
way to warm-start an optimization algorithm. To do so, you can use the command insert
following the same template as for hunt
but using specific values rather than prior
distribution. Note that for now insert
only supports argument specifications using =
sign.
It will fail otherwise.
$ orion insert -n orion-tutorial ./script.sh --lr=0.01 --momentum=0.9
Suppose you have many hyper-parameters and you would like to specify just a few when adding points manually. You can do so if the hyper-parameters you ignore have default values defined in the experiment.
$ orion hunt -n orion-tutorial ./script.sh --lr~'loguniform(1e-5, 1.0)' --momentum~'uniform(0, 1, default_value=0)'
$ orion insert -n orion-tutorial ./script.sh --lr=0.01
When points are added manually, they are saved in the database and may be picked up for execution in
a latter call to orion hunt -n orion-tutorial
. The command insert
does not execute trials.
Wait, what if I use python2.7?
C'mon, python2.7 is 8 years old. Install python3.6+ 😉
Still hesitating? Do you know numpy is planning to drop full support for python2 on January 1, 2019?
Wait, what if I use windows?
Well, maybe you should consider clicking here. 😁