tristandeleu/pytorch-maml-rl

Questions about the output files


This question may be a bit silly, but I cannot figure out what the output files mean and how to draw the figures in your paper. The results consist of three files: tasks, train_returns, and valid_returns. To get the average returns, should I calculate the mean of "valid_returns"? What about the returns before the update? Are they calculated by averaging "train_returns"? Thank you so much if you can provide any help.

The array valid_returns contains the returns for the episodes sampled after adaptation. This array has size (meta_batch_size, fast_batch_size). Taking the average and standard error over the whole valid_returns array will give you numbers that correspond to the value for 1 gradient step (assuming you are doing 1 gradient step adaptation) from Figure 5 in the MAML paper. If you take the average and standard error over the second axis only (valid_returns.mean(1)), you can get the orange curve in Figure 1 in the Negative adaptation in MAML paper (and use the tasks list to know which task the values correspond to).

The array train_returns has the same shape and contains the returns before any adaptation step. With the same statistics as above, it corresponds to 0 gradient steps in Figure 5 of the MAML paper and the blue curve in Figure 1 of the Negative adaptation in MAML paper.
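If it helps, here is a minimal sketch of how you could compute these statistics. This is not code from the repo: it assumes the outputs were saved together with np.savez into a single results.npz with keys tasks, train_returns and valid_returns, so adjust the loading to match however you saved your files.

```python
import numpy as np

# Hypothetical loading step: assumes the outputs were saved with
# np.savez('results.npz', tasks=..., train_returns=..., valid_returns=...).
results = np.load('results.npz')
train_returns = results['train_returns']  # (meta_batch_size, fast_batch_size), before adaptation
valid_returns = results['valid_returns']  # same shape, after 1 gradient step

# Average return and standard error over all tasks and episodes:
# 0 gradient steps vs. 1 gradient step, as in Figure 5 of the MAML paper.
for label, returns in [('0 steps', train_returns), ('1 step', valid_returns)]:
    mean = returns.mean()
    stderr = returns.std(ddof=1) / np.sqrt(returns.size)
    print(f'{label}: {mean:.2f} +/- {stderr:.2f}')

# Per-task average after adaptation (one value per task in the meta-batch),
# to be matched against the tasks list, as in Figure 1 of the negative
# adaptation paper.
per_task_valid = valid_returns.mean(axis=1)
```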

Thanks a lot for your kind reply! That's really helpful. I still have one more question, though. In the Python file maml_trpo, what is the role of the function "async def adapt(self, train_futures, first_order=None)"? I thought the adaptation process happens in the "multi_task_sampler" file. Thanks again for your time and patience.

There is indeed adaptation in two different places:

  • In MultiTaskSampler there is adaptation to sample adapted ("validation") episodes. This adaptation step is only done to collect data used later for optimization in MAMLTRPO (that's why first_order=True is passed to the update_params function). We're not backpropagating through the gradient update here.
  • In MAMLTRPO there is the adapt function, which does the exact same adaptation step (based on the "training" episodes), but this time it is for optimization only. Using this adapt function, we can then backpropagate through the gradient update. Unlike in MultiTaskSampler, we don't use these adapted parameters to sample "validation" episodes, because we already have them available thanks to MultiTaskSampler.

So it might look wasteful to do the adaptation twice (once for sampling episodes, once for optimization), but this allows complete decoupling of sampling and optimization, and makes the overall process significantly faster. An earlier version of this repo, which had sampling and optimization entangled, was 10x slower.
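To make the difference between the two adaptation steps concrete, here is a toy, self-contained sketch (not the repo's actual code; the losses are placeholders for the losses on the training and validation episodes) of first-order vs. second-order adaptation with torch.autograd.grad, which is essentially what distinguishes the adaptation in MultiTaskSampler from MAMLTRPO.adapt:

```python
import torch

theta = torch.tensor([1.0, -2.0], requires_grad=True)  # stand-in for the meta-parameters
lr = 0.1  # inner-loop step size

def inner_loss(params):
    # Placeholder for the loss on the "training" episodes.
    return (params ** 2).sum()

def outer_loss(params):
    # Placeholder for the surrogate loss on the "validation" episodes.
    return ((params - 1.0) ** 2).sum()

# First-order adaptation (what MultiTaskSampler effectively does with
# first_order=True): the inner gradient is detached, which is enough to
# sample the "validation" episodes with the adapted parameters.
grads = torch.autograd.grad(inner_loss(theta), theta)
adapted_first_order = theta - lr * grads[0].detach()

# Second-order adaptation (MAMLTRPO.adapt): create_graph=True keeps the inner
# gradient update in the graph, so the outer loss can backpropagate through it.
grads = torch.autograd.grad(inner_loss(theta), theta, create_graph=True)
adapted = theta - lr * grads[0]
meta_grad = torch.autograd.grad(outer_loss(adapted), theta)[0]
print(meta_grad)
```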

Got it, thank you so much!!!