automl/CARL

Performance Deviations in Brax

Closed this issue · 2 comments

Comparing HalfCheetah in Brax (via gym.make and then wrapped as here: https://github.com/google/brax/blob/main/notebooks/training_torch.ipynb) vs in CARL makes a big difference in return, even when the context is kept static. Do we do any unexpected reward normalization? Does the way we reset the env make a difference compared to theirs (since we actually update the simulation)?
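One way to rule out hidden reward normalization would be to step each layer of the wrapper stack and compare the rewards they return. Below is a minimal, hedged sketch of that idea using stdlib-only dummy classes (`DummyEnv`, `RewardScaler`, etc. are illustrative stand-ins, not actual CARL or Brax classes; real envs are stateful, so the per-layer comparison only works like this on a stateless dummy):

```python
# Hypothetical sketch: walk a gym-style wrapper chain (each layer exposes
# its inner env via `.env`) and record the reward each layer returns,
# to spot a wrapper that silently rescales rewards.
# All classes here are dummies; the dummy env is stateless, so stepping
# every layer with the same action yields comparable rewards.

class DummyEnv:
    def step(self, action):
        return [0.0], 1.0, False, {}

class Wrapper:
    def __init__(self, env):
        self.env = env
    def step(self, action):
        return self.env.step(action)

class RewardScaler(Wrapper):
    """Example of a layer that silently rescales rewards."""
    def step(self, action):
        obs, rew, done, info = self.env.step(action)
        return obs, rew * 0.1, done, info

def reward_chain(env, action):
    """Step every layer of the wrapper stack, outermost first."""
    rewards = []
    layer = env
    while True:
        rewards.append((type(layer).__name__, layer.step(action)[1]))
        if not hasattr(layer, "env"):
            break
        layer = layer.env
    return rewards

env = RewardScaler(Wrapper(DummyEnv()))
for name, rew in reward_chain(env, None):
    print(f"{name}: {rew}")
# A jump between adjacent layers points at the wrapper doing the rescaling.
```

Running this against the actual CARL/Brax wrapper stacks (swapping the dummies for the real envs and resetting between probes) would show whether any layer touches the reward.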

Update: CARL itself (without the context update) modifies the performance, but only minimally. This is with the init switched to the gym version from the notebook for now. So this is likely not a CARL issue, but possibly an init or obs issue. It's probably less a bug and more a case of this env version not matching gym Brax 100%.

Seems like a lot of the performance difference was introduced later in my pipeline: it turns out that casting the actions/obs through many wrappers is probably not optimal for keeping everything consistent. So I guess we don't really need to do anything; we should just be aware that different inits + CARL wrapping + x will produce different learning curves (not necessarily worse, just slightly different shapes).
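For reference, a minimal stdlib-only sketch of why those casts matter: a single float64 → float32 round-trip already perturbs the value the simulator sees, so a pipeline that casts and one that doesn't feed slightly different actions, and a chaotic physics sim amplifies that into visibly different learning curves. (This is an illustration of float-precision rounding in general, not of any specific CARL wrapper.)

```python
# Demonstrate that one float32 cast changes a float64 value,
# while repeated casts after that are lossless (rounding is idempotent).
import struct

def to_float32_and_back(x: float) -> float:
    """Round a Python float (float64) through float32 precision."""
    return struct.unpack("f", struct.pack("f", x))[0]

action = 0.1  # a typical action component
cast_once = to_float32_and_back(action)
cast_twice = to_float32_and_back(cast_once)

print(action, cast_once)        # the first cast perturbs the value
print(cast_once == cast_twice)  # later casts reproduce it exactly
```

So the harm is not cumulative drift through many wrappers but the initial perturbation plus any mixing of cast and uncast paths, which is consistent with the curves being "not necessarily worse, just slightly different shapes".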