EDELWAC²E

Reinforcement Learning agents for analog circuit sizing: GAC²E environments driven through Hym, implemented in Haskell with HaskTorch.

Setup

LibTorch is required, as per the HaskTorch documentation, and must be symlinked into this directory. Afterwards, source setenv in your shell.
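For example, assuming LibTorch was unpacked to /path/to/libtorch (the path here is purely illustrative):

$ ln -s /path/to/libtorch ./libtorch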

For training, Hym must be up and running.

For tracking, mlflow and mlflow-hs must be installed.

$ source setenv
$ stack build

Usage

With default options:

$ stack run

Otherwise, with explicit options:

$ stack exec -- edelwace-exe [options]
Usage: edelwace-exe [-l|--algorithm ALGORITHM] [-H|--host HOST] [-P|--port PORT]
                    [-i|--ace ID] [-p|--pdk PDK] [-v|--var VARIANT]
                    [-a|--act ACTIONS] [-o|--obs OBSERVATIONS] [-f|--path FILE]
                    [-T|--tracking-host HOST] [-R|--tracking-port PORT]
  GACE RL Trainer

Available options:
  -l,--algorithm ALGORITHM DRL Algorithm, one of sac, td3, ppo (default: "sac")
  -H,--host HOST           Hym server host address (default: "localhost")
  -P,--port PORT           Hym server port (default: "7009")
  -i,--ace ID              ACE OP ID (default: "op2")
  -p,--pdk PDK             ACE Backend (default: "xh035")
  -v,--var VARIANT         GACE Environment Variant (default: "0")
  -a,--act ACTIONS         Dimensions of Action Space (default: 10)
  -o,--obs OBSERVATIONS    Dimensions of Observation Space (default: 39)
  -f,--path FILE           Checkpoint File Path (default: "./models")
  -T,--tracking-host HOST  MLFlow tracking server host address
                           (default: "localhost")
  -R,--tracking-port PORT  MLFlow tracking server port (default: "5000")
  -h,--help                Show this help text
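For example, to train TD3 on the continuous variant of the op2 environment with the default xh035 backend:

$ stack exec -- edelwace-exe -l td3 -v 0 -i op2 -p xh035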

Dependencies

  • hasktorch
  • libtorch-ffi
  • mtl
  • wreq
  • aeson
  • optparse-applicative
  • mlflow-hs

Algorithms

Haddock documentation is available.
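It should also build locally via stack:

$ stack haddock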

Caution: Excessive use of Unicode and Strictness.

Soft Actor Critic (SAC)

Arxiv

Soft Actor Critic (SAC) Agent for continuous action space. Start with -l sac and -v 0 for continuous electrical design space.

State scaling / standardization appears to make things worse for SAC: the loss steadily increases and no learning occurs.
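For reference, a minimal sketch of the kind of whole-batch standardization that was tried, using plain HaskTorch tensor ops (the function name and details are illustrative, not the actual agent code):

import           Torch.Tensor     (Tensor, asTensor)
import qualified Torch.Functional as F

-- x' = (x - mean x) / (std x + eps), over the whole observation batch.
-- Illustrative sketch only; the agent's actual scaling code may differ.
standardizeObs :: Tensor -> Tensor
standardizeObs x = (x - F.mean x) / (F.std x + eps)
  where
    eps = asTensor (1.0e-6 :: Float)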

Proximal Policy Optimization (PPO)

Arxiv

Proximal Policy Optimization (PPO) Agent for discrete and continuous action spaces. Start with -l ppo and -v 2 for discrete electrical design space.

Discrete PPO needs roughly 4k steps before plateauing around an average reward of ~0.4. The resulting area is far smaller than the target, while the offset target is not quite reached.
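For orientation, a sketch of the clipped surrogate objective PPO optimizes; the names, shapes and the element-wise minimum helper are assumptions, not the repository's actual code:

import           Torch.Tensor     (Tensor)
import qualified Torch.Functional as F

-- L = -E[ min(r * A, clip(r, 1 - eps, 1 + eps) * A) ]
-- r: probability ratio pi_new / pi_old, A: advantage estimates.
ppoLoss :: Float -> Tensor -> Tensor -> Tensor
ppoLoss eps ratio adv = negate . F.mean $ minElem (ratio * adv) (clipped * adv)
  where
    clipped     = F.clamp (1.0 - eps) (1.0 + eps) ratio
    minElem a b = F.where' (a `F.lt` b) a b  -- element-wise minimum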

Twin Delayed Deep Deterministic Policy Gradient (TD3)

Arxiv

Twin Delayed Deep Deterministic Policy Gradient (TD3) Agent for continuous action space. Start with -l td3 and -v 0 for continuous electrical design space.
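As a rough illustration of the soft (Polyak) target-network update TD3 relies on; the function name and the flat parameter lists are assumptions, not the actual implementation:

import Torch.Tensor (Tensor, asTensor)

-- theta' <- tau * theta + (1 - tau) * theta', applied parameter-wise.
-- Illustrative sketch only.
softSync :: Float -> [Tensor] -> [Tensor] -> [Tensor]
softSync tau online target = zipWith blend online target
  where
    tau'      = asTensor tau
    blend p q = tau' * p + (1 - tau') * q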

Prioritized Experience Replay (PER)

Arxiv

PER is only implemented for SAC and is deactivated for the moment. To quote the ERE paper:

We show that SAC+PER can marginally improve the sample efficiency performance of SAC, but much less so than SAC+ERE.
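For context, PER draws transition i with probability P(i) = p_i^alpha / sum_j p_j^alpha, where the priority p_i is derived from the TD error. A plain-Haskell sketch of that weighting (illustrative only, not the deactivated SAC+PER code):

-- p_i = |delta_i| + eps, normalized into sampling probabilities.
perProbabilities :: Float -> [Float] -> [Float]
perProbabilities alpha tdErrors = map (/ total) priorities
  where
    priorities = map (\d -> (abs d + 1.0e-6) ** alpha) tdErrors
    total      = sum priorities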

Emphasizing Recent Experience (ERE)

Arxiv

...

Hindsight Experience Replay (HER)

Arxiv

...

Results

...

TODO

  • Implement SAC
  • Implement TD3
  • Implement PPO
  • Implement PER
  • Implement ERE
  • Implement SAC+PER
  • Implement SAC+ERE
  • Implement SAC+ERE+PER
  • Implement HER
  • Implement TD3+HER
  • Wait for Normal Distribution in HaskTorch
  • Remove strictness where unnecessary
  • Add agent loading ability
  • Command Line Options
  • MLFlow tracking
  • Visualization (MLFlow?)