flowersteam/explauto

Tutorial on using a lower-level stochastic policy optimizer in goal exploration processes


In the general formulation of the goal exploration process (as explained, for example, in Baranes and Oudeyer, 2013, section 2 and algorithm 1), once a parameterized task is selected, a budget of time is allocated to a stochastic policy optimizer (e.g. L-BFGS, CMA-ES, PI-BB, a simple genetic algorithm, etc.) to improve the best currently known parameterized policy for this task. The optimization is initialized with the best current solution as computed by the inverse model, which allows transfer learning across parameterized tasks; this inverse model can be as simple as a nearest-neighbor lookup, or can consist in launching another stochastic optimization on the surrogate forward model (e.g. CMA-ES+LWLR). During this budget of time, each evaluation of parameters requested by the stochastic policy optimizer is executed by the robot (a rollout), and the resulting data is used to update the forward and inverse models.
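For concreteness, here is a minimal, self-contained sketch of such a loop (not explauto's API), using CMA-ES from the `cma` package as the lower-level policy optimizer. The environment, the nearest-neighbor inverse model and the random goal sampling are toy stand-ins; dimensions and budgets are arbitrary:

```python
# Hypothetical sketch of a goal exploration process with a lower-level
# CMA-ES optimizer (via the `cma` package). Environment, dataset and
# inverse model are toy stand-ins, not explauto's actual API.
import numpy as np
import cma

rng = np.random.default_rng(0)

M_DIM, S_DIM = 4, 2              # policy-parameter and outcome dimensions
N_GOALS, ROLLOUT_BUDGET = 20, 50

def environment(m):
    """Toy forward mapping: policy parameters m -> sensory outcome s."""
    return np.tanh(m[:S_DIM] + 0.5 * m[S_DIM:])

# Dataset of (m, s) pairs; doubles as a nearest-neighbor forward/inverse model.
dataset_m, dataset_s = [], []

def rollout(m):
    """Execute m on the robot/simulator and record the observed outcome."""
    s = environment(np.asarray(m))
    dataset_m.append(np.asarray(m))
    dataset_s.append(s)
    return s

def inverse_nn(goal):
    """Simplest inverse model: parameters of the outcome closest to the goal."""
    dists = [np.linalg.norm(s - goal) for s in dataset_s]
    return dataset_m[int(np.argmin(dists))]

# Bootstrap the models with a few random motor babbling rollouts.
for _ in range(5):
    rollout(rng.uniform(-1, 1, M_DIM))

for _ in range(N_GOALS):
    # 1. Sample a goal (random here; an interest model would make it active).
    goal = rng.uniform(-1, 1, S_DIM)
    # 2. Initialize the lower-level optimizer with the best known solution.
    m0 = inverse_nn(goal)
    es = cma.CMAEvolutionStrategy(m0, 0.3, {'maxfevals': ROLLOUT_BUDGET,
                                            'verbose': -9})
    # 3. Spend the rollout budget improving the policy for this goal;
    #    every evaluation is a rollout that also feeds the dataset/models.
    while not es.stop():
        candidates = es.ask()
        costs = [np.linalg.norm(rollout(m) - goal) for m in candidates]
        es.tell(candidates, costs)

print("total rollouts:", len(dataset_m))
```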

However, the current tutorials only show and explain specific forms of this general goal exploration process (most often the basic default method, which consists in adding random noise to the best current solution). It would be very useful to have a tutorial on (random and active) goal exploration processes that include the use of a stochastic policy optimizer, with the particular algorithm (e.g. L-BFGS, CMA-ES or PI-BB) provided as a parameter.
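As a rough idea of what "provided as a parameter" could look like in such a tutorial, here is a hypothetical sketch where the lower-level optimizer is selected by name; `scipy_lbfgs` wraps `scipy.optimize.minimize`, and a `'cma-es'` entry could wrap the ask/tell loop from the sketch above. All names, signatures and the toy cost are illustrative only, not part of explauto:

```python
# Hypothetical pluggable lower-level optimizer, selected by name.
import numpy as np
from scipy.optimize import minimize

def scipy_lbfgs(cost, m0, budget):
    """Run L-BFGS-B for at most `budget` cost evaluations (i.e. rollouts)."""
    res = minimize(cost, m0, method='L-BFGS-B', options={'maxfun': budget})
    return res.x

OPTIMIZERS = {'l-bfgs': scipy_lbfgs}   # 'cma-es', 'pi-bb', ... could be registered too

def reach_goal(goal, m0, budget, optimizer='l-bfgs'):
    """Improve the policy parameters m0 for `goal` with the chosen optimizer."""
    # Toy cost: distance between a made-up outcome mapping and the goal.
    cost = lambda m: float(np.linalg.norm(np.tanh(np.asarray(m)[:goal.size]) - goal))
    return OPTIMIZERS[optimizer](cost, np.asarray(m0), budget)

# Example call: 2-D goal, 4-D initial policy parameters, 30-rollout budget.
print(reach_goal(np.array([0.2, -0.1]), np.zeros(4), budget=30))
```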

Baranes, A., Oudeyer, P.-Y. (2013). Active Learning of Inverse Models with Intrinsically Motivated Goal Exploration in Robots. Robotics and Autonomous Systems, 61(1), pp. 49-73.
http://www.pyoudeyer.com/ActiveGoalExploration-RAS-2013.pdf