Gradient-based hyperparameter optimization package based on TensorFlow

This is the new package that implements the algorithms presented in the paper Forward and Reverse Gradient-Based Hyperparameter Optimization. For the older package see RFHO. FAR-HO features simplified interfaces, additional capabilities and a tighter integration with TensorFlow.
- Reverse-HG, a generalization of the algorithms presented in Domke [2012] and Maclaurin et al. [2015] (without reversible dynamics and "reversible dtype")
- Forward-HG
- Online versions of the two previous algorithms: Real-Time HO (RTHO) and Truncated-Reverse HO (TRHO)
The first two algorithms compute, with different procedures, the gradient of a validation error with respect to the hyperparameters - i.e. the hypergradient - while the last, based on Forward-HG, performs "real-time" (i.e. at training time) hyperparameter updates.
These algorithms are also useful in a learning-to-learn context, where the parameters of various meta-learners effectively play the role of hyperparameters, as explained in the work A Bridge Between Hyperparameter Optimization and Learning-to-learn.
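As a toy illustration of the hypergradient (plain Python, independent of FAR-HO; the one-dimensional quadratic objectives and all constants below are made up for the example), forward mode propagates the derivative of the weights alongside the training dynamics, while reverse mode backpropagates through the unrolled dynamics - and the two agree:

```python
# Inner (training) error: L(w, lam) = 0.5 * (w - lam)^2, minimized by
# T steps of gradient descent with step size eta.
# Outer (validation) error: E(w_T) = 0.5 * (w_T - c)^2.
# Hypergradient: dE/dlam.

def hypergrad_forward(lam, w0, eta, T, c):
    w, z = w0, 0.0                 # z_t = dw_t / dlam, propagated forward
    for _ in range(T):
        # dynamics: w_{t+1} = w_t - eta * dL/dw = w_t - eta * (w_t - lam)
        w, z = w - eta * (w - lam), (1.0 - eta) * z + eta
    return (w - c) * z             # chain rule: dE/dlam = E'(w_T) * dw_T/dlam

def hypergrad_reverse(lam, w0, eta, T, c):
    ws = [w0]                      # store the trajectory (reverse mode's memory cost)
    for _ in range(T):
        ws.append(ws[-1] - eta * (ws[-1] - lam))
    p, g = ws[-1] - c, 0.0         # p = dE/dw_t, initialized at dE/dw_T
    for _ in range(T - 1, -1, -1):
        g += p * eta               # partial of w_{t+1} w.r.t. lam is eta
        p *= 1.0 - eta             # partial of w_{t+1} w.r.t. w_t is 1 - eta
    return g

fwd = hypergrad_forward(0.3, 1.0, 0.1, 50, 0.0)
rev = hypergrad_reverse(0.3, 1.0, 0.1, 50, 0.0)
# fwd and rev agree up to floating-point error
```

FAR-HO performs the analogous computations on TensorFlow graphs, where the second derivatives of the training error appear in the products above.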
Clone the repository and run setup script.
git clone https://github.com/lucfra/FAR-HO.git
cd FAR-HO
python setup.py install
Besides "usual" packages (numpy), FAR-HO is built upon tensorflow. Some examples depend on the package experiment_manager.
Please note that required packages will not be installed automatically.
The aim of this package is to implement and develop gradient-based hyperparameter optimization (HO) techniques in TensorFlow, thus making them readily applicable to deep learning systems. These optimization techniques also find natural applications in the field of learning-to-learn. Feel free to raise issues with comments, suggestions and feedback! You can email me at luca.franceschi@iit.it .
- Self-contained example on MNIST with `ReverseHG` for the optimization of the initial starting point (initial weights), the weight of each example and the learning rate.
- Coming soon: examples of application of the online HO algorithms.
- Coming soon: what you can and cannot do with this package.
- Hyper-representation and related notebook: an example in the context of learning-to-learn. In this case the hyperparameters are some of the weights of a convolutional neural network (plus the learning rate!). The idea is to learn a cross-episode shared representation by explicitly minimizing the mean generalization error over meta-training tasks. See A Bridge Between Hyperparameter Optimization and Learning-to-learn, presented at the Workshop on Meta-learning. Note: for the moment, to run the code for this experiment you need to install the package https://github.com/lucfra/ExperimentManager for data management and statistics recording.
- See also these experiment packages.
- Create a model as you prefer [1] with TensorFlow.
- Create the hyperparameters you wish to optimize [2] with the function `get_hyperparameter` (these could also be variables of your model).
- Define an inner objective (e.g. a training error) and an outer objective (e.g. a validation error) as scalar `tensorflow.Tensor`s.
- Create an instance of `HyperOptimizer` after choosing a hyper-gradient computation algorithm among `ForwardHG` and `ReverseHG` (see next section).
- Call the method `HyperOptimizer.minimize`, passing the outer and inner objectives, as well as an optimizer for the outer problem (which can be any optimizer from `tensorflow`) and an optimizer for the inner problem (which must be an optimizer contained in this package; at the moment gradient descent, gradient descent with momentum and Adam are available, but it is quite straightforward to implement other optimizers).
- Execute `HyperOptimizer.run(T, ...)` inside a `tensorflow.Session` to optimize the parameters and perform one step of optimization of the hyperparameters.
import far_ho as far
import tensorflow as tf

model = create_model(...)

lambda1 = far.get_hyperparameter('lambda1', ...)
lambda2 = far.get_hyperparameter('lambda2', ...)
io, oo = create_objective(...)  # inner and outer objectives (scalar tensors)

# the inner optimizer must come from far_ho; the learning rate
# can itself be a hyperparameter
inner_problem_optimizer = far.GradientDescentOptimizer(lr=far.get_hyperparameter('lr', 0.1))
# the outer optimizer can be any TensorFlow optimizer
outer_problem_optimizer = tf.train.AdamOptimizer()

farho = far.HyperOptimizer()
ho_step = farho.minimize(oo, outer_problem_optimizer,
                         io, inner_problem_optimizer)

T = 100  # number of iterations of the inner optimization dynamics
with tf.Session().as_default():
    for _ in range(100):
        ho_step(T)
[1] This is gradient-based optimization, and second-order derivatives of the training error show up in the computation of the hyper-gradients (even though no Hessian matrix is explicitly computed at any time); therefore, all the ops used in the model should have a second-order derivative registered in `tensorflow`.
[2] For the hyper-gradients to make sense, hyperparameters should be real-valued. Moreover, while `ReverseHG` should handle generic rank-r tensor hyperparameters, `ForwardHG` requires scalar hyperparameters. Use the keyword argument `scalar=True` in `get_hyperparameter` to obtain a scalar splitting of a general tensor.
Forward-HG and Reverse-HG compute the same hypergradient, so the choice is a matter of time versus memory!
The online versions of the algorithms can dramatically speed up the optimization.
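Roughly (a sketch of the trade-off analyzed in the paper, with $d$ the number of model parameters, $n$ the number of hyperparameters and $T$ the number of inner iterations; constants and per-step costs omitted):

```latex
% Reverse-HG: must store the whole optimization trajectory,
% so memory grows with the number of inner iterations T.
\text{Reverse-HG:} \quad \text{time} \approx O(d\,T), \qquad \text{memory} \approx O(d\,T)

% Forward-HG: memory is independent of T, but one tangent system is
% propagated per (scalar) hyperparameter, so time grows with n.
\text{Forward-HG:} \quad \text{time} \approx O(d\,n\,T), \qquad \text{memory} \approx O(d\,n)
```

This is why Reverse-HG suits many hyperparameters with few iterations, while Forward-HG (and its online variant RTHO) suits few hyperparameters with long training runs.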
The objective is to minimize some validation function E with respect to
a vector of hyperparameters lambda. The validation error depends on the model output and thus
on the model parameters w.
w should be a minimizer of the training error and the hyperparameter optimization
problem can be naturally formulated as a bilevel optimization problem.
Since these problems are rather hard to tackle, we
explicitly take into account the learning dynamics used to obtain the model
parameters (e.g. you can think about stochastic gradient descent with momentum),
and we formulate
HO as a constrained optimization problem. See the paper for details.
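Schematically (a sketch following the paper's formulation; $E$ is the validation error, $L$ the training error and $\Phi_t$ the update map of the learning dynamics):

```latex
% Bilevel formulation: the outer objective E is minimized over the
% hyperparameters lambda, with w constrained to minimize the training error.
\min_{\lambda}\; E(w_{\lambda})
\quad \text{s.t.} \quad
w_{\lambda} \in \operatorname*{arg\,min}_{w} L(w, \lambda)

% Constrained reformulation: the inner arg min is replaced by the
% learning dynamics (e.g. T steps of gradient descent with momentum).
\min_{\lambda}\; E(w_{T})
\quad \text{s.t.} \quad
w_{t} = \Phi_{t}(w_{t-1}, \lambda), \qquad t = 1, \dots, T
```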
- Simplified interface: optimize parameters and hyperparameters with "just" a call of `far.HyperOptimizer.minimize`; create variables designed as hyperparameters with `far.get_hyperparameter`; no more need to vectorize the model weights; `far.optimizers` only need to specify the update as a list of pairs (v, v_{k+1}).
- Additional capabilities: set an initialization dynamics and optimize the (distribution of) initial weights; allowed explicit dependence of the outer objective on the hyperparameters; support for multiple outer objectives and multiple inner problems (episode batching, averaging the sampling from distributions, ...).
- Tighter integration: collections for hyperparameters and hypergradients (use `far.GraphKeys`); use out-of-the-box models (no need to vectorize the model); use any TensorFlow optimizer for the outer objective (validation error).
- Lighter package: only code for implementing the algorithms and running the examples.
- Forward hypergradient methods have been reimplemented with a double reverse mode trick, thanks to Jamie Townsend.
@InProceedings{pmlr-v70-franceschi17a,
title = {Forward and Reverse Gradient-Based Hyperparameter Optimization},
author = {Luca Franceschi and Michele Donini and Paolo Frasconi and Massimiliano Pontil},
booktitle = {Proceedings of the 34th International Conference on Machine Learning},
pages = {1165--1173},
year = {2017},
volume = {70},
series = {Proceedings of Machine Learning Research},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v70/franceschi17a/franceschi17a.pdf},
}
For the work on learning-to-learn:
@article{franceschi2017bridge,
  title={A Bridge Between Hyperparameter Optimization and Learning-to-learn},
author={Franceschi, Luca and Frasconi, Paolo and Donini, Michele and Pontil, Massimiliano},
journal={arXiv preprint arXiv:1712.06283},
year={2017}
}