/relax_frwr_example

Example FRWR (PDDM) implementation with ReLAx

Example Filtering & Reward Weighted Refinement implementation with ReLAx

This repository contains an implementation of filtering and reward-weighted refinement (FRWR, the planner used in PDDM) with ReLAx.
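For readers unfamiliar with the planner, the following is a minimal NumPy sketch of one FRWR planning step as it is commonly described for PDDM: candidate action sequences are drawn around a running mean using time-correlated (filtered) noise, each candidate is scored by rolling it out through a dynamics model, and the mean is refined as an exponentially reward-weighted average of the candidates. It does not use the ReLAx API. The names `frwr_plan`, `toy_dynamics`, `toy_reward`, and `run_demo`, the hyperparameter values, and the toy point-mass task are all assumptions made for illustration and are not taken from this repository.

```python
import numpy as np


def frwr_plan(state, mean, dynamics_fn, reward_fn, rng,
              n_candidates=200, beta=0.6, gamma=10.0, sigma=0.3):
    """One FRWR planning step (illustrative sketch, not the ReLAx API).

    mean: (horizon, act_dim) current mean action sequence.
    Returns (action to execute now, refined mean sequence).
    """
    horizon, act_dim = mean.shape

    # Filtering: time-correlated noise n_t = beta * u_t + (1 - beta) * n_{t-1}.
    u = rng.normal(0.0, sigma, size=(n_candidates, horizon, act_dim))
    noise = np.zeros_like(u)
    prev = np.zeros((n_candidates, act_dim))
    for t in range(horizon):
        prev = beta * u[:, t] + (1.0 - beta) * prev
        noise[:, t] = prev
    actions = np.clip(mean[None] + noise, -1.0, 1.0)

    # Score every candidate by rolling it out through the (learned) model.
    states = np.repeat(state[None], n_candidates, axis=0)
    returns = np.zeros(n_candidates)
    for t in range(horizon):
        states = dynamics_fn(states, actions[:, t])
        returns += reward_fn(states)

    # Reward-weighted refinement: softmax over returns with temperature gamma.
    weights = np.exp(gamma * (returns - returns.max()))
    weights /= weights.sum()
    new_mean = np.einsum("k,kta->ta", weights, actions)
    return new_mean[0], new_mean


# Hypothetical toy task: a 2-D point mass that should reach GOAL.
GOAL = np.array([1.0, -1.0])


def toy_dynamics(states, actions):
    # Stand-in for a learned model: position moves by 0.1 * action.
    return states + 0.1 * actions


def toy_reward(states):
    # Negative squared distance to the goal, one value per candidate.
    return -np.sum((states - GOAL) ** 2, axis=-1)


def run_demo(seed=0, steps=50, horizon=10):
    rng = np.random.default_rng(seed)
    state = np.zeros(2)
    mean = np.zeros((horizon, 2))
    initial_dist = np.linalg.norm(state - GOAL)
    for _ in range(steps):
        action, mean = frwr_plan(state, mean, toy_dynamics, toy_reward, rng)
        state = toy_dynamics(state[None], action[None])[0]
        # Shift the plan by one step, repeating the last action as padding.
        mean = np.vstack([mean[1:], mean[-1:]])
    return initial_dist, np.linalg.norm(state - GOAL)


if __name__ == "__main__":
    d0, d1 = run_demo()
    print(f"distance to goal: start={d0:.3f}, end={d1:.3f}")
```

In this sketch `gamma` controls how greedy the refinement is (large values approach picking the single best candidate), and `beta` controls how smooth the sampled action sequences are. In a real setup the toy dynamics would be replaced by a learned environment model.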

The FRWR actor was trained on the HalfCheetah-v2 MuJoCo Gym environment for 50k environment steps.

The graph of average return vs training step is shown below (batch_size=5000):

frwr_training

The graph below compares the actual rewards with the rewards predicted by the fitted environment model:

frwr_model_rews

Resulting policy:

frwr_run.mp4