valthom/pytorch-trpo

PyTorch implementation of Trust Region Policy Optimization

Language: Python. License: MIT.

PyTorch implementation of TRPO

This is a PyTorch implementation of "Trust Region Policy Optimization (TRPO)".

The code is mostly ported from the original implementation by John Schulman. Unlike another PyTorch implementation of TRPO, this one computes exact Hessian-vector products instead of approximating them with finite differences.
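To make the distinction concrete, here is a minimal sketch, not taken from this repository, of the two ways to compute a Hessian-vector product in PyTorch. The function names `exact_hvp` and `finite_difference_hvp` and the toy objective `f` are hypothetical and chosen only for illustration. In TRPO the differentiated function would be the mean KL divergence between the old and new policies rather than this toy objective.

```python
import torch


def exact_hvp(f, x, v):
    """Exact Hessian-vector product H @ v via double backpropagation."""
    # First backward pass: keep the graph so the gradient is differentiable.
    (g,) = torch.autograd.grad(f(x), x, create_graph=True)
    # Second backward pass: d/dx (g . v) = H @ v, without ever forming H.
    (hv,) = torch.autograd.grad((g * v).sum(), x)
    return hv


def finite_difference_hvp(f, x, v, eps=1e-4):
    """Central finite-difference approximation of H @ v (for comparison)."""

    def grad_at(point):
        point = point.detach().requires_grad_(True)
        (g,) = torch.autograd.grad(f(point), point)
        return g

    return (grad_at(x + eps * v) - grad_at(x - eps * v)) / (2 * eps)


# Toy objective (an assumption for illustration): f(x) = 0.5 x^T A x + sum(x^4) / 4,
# whose Hessian is A + diag(3 x^2).
A = torch.tensor([[2.0, 1.0], [1.0, 3.0]], dtype=torch.float64)


def f(x):
    return 0.5 * x @ A @ x + (x ** 4).sum() / 4


x = torch.tensor([1.0, 2.0], dtype=torch.float64, requires_grad=True)
v = torch.tensor([1.0, -1.0], dtype=torch.float64)

exact = exact_hvp(f, x, v)                # analytically, H @ v = [4, -14]
approx = finite_difference_hvp(f, x, v)   # close to exact, but depends on eps
```

The exact version costs roughly two backward passes and has no step size to tune, whereas the finite-difference version introduces an error that depends on the choice of `eps`.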

Contributions

Contributions are very welcome. If you know how to make this code better, don't hesitate to send a pull request.

Usage

python main.py --env-name "Reacher-v1"

Results

Results are roughly comparable to those of the original code. Detailed results are coming soon.

Todo

Plots.
Collect data in multiple threads.