A tool for optimizing reinforcement learning (RL) policy modules with random search (https://arxiv.org/abs/1803.07055).
It wraps any PyTorch module that implements a reinforcement learning policy in an optimizer that updates the module's parameters to maximize a reward.
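from torch import nn
# Linear policy with all parameters initialized to zero
policy = nn.Linear(num_inputs, num_outputs, bias=True)
policy.weight.data.fill_(0)
policy.bias.data.fill_(0)
# Wrap the policy in the optimizer
pso = PSO(policy, lr=0.05, std=0.02, b=8, n_directions=8)
# Sample the directions to explore
pso.sample()
# direction is the index of the explored direction;
# side selects the positive or negative perturbation
action = pso.evaluate(state, direction=0, side="left")
# Report the reward obtained for that direction and side
pso.reward(reward, direction=0, side="left")
# Update the policy parameters from the reported rewards
pso.update()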
See example.py for a suggested implementation of augmented random search.
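
For readers who want to see what a sample / evaluate / reward / update cycle can compute, here is a minimal NumPy sketch of one augmented random search step, following the update rule in the paper linked above. It is an illustration under stated assumptions, not this package's code: `ars_step` and `rollout` are hypothetical names, and the mapping of `lr`, `std`, `b` and `n_directions` onto the paper's step size, noise scale, number of kept directions and number of sampled directions is assumed from the parameter names. What `PSO` does internally may differ.

```python
import numpy as np


def ars_step(theta, rollout, lr=0.05, std=0.02, b=8, n_directions=8, rng=None):
    """Return `theta` after one augmented-random-search update.

    `rollout(params)` is a hypothetical callable that returns the total
    reward obtained by running the policy with parameters `params`.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Sample one Gaussian perturbation per direction, shaped like theta.
    deltas = rng.standard_normal((n_directions,) + theta.shape)

    # Evaluate the policy on both sides of every direction.
    r_plus = np.array([rollout(theta + std * d) for d in deltas])
    r_minus = np.array([rollout(theta - std * d) for d in deltas])

    # Keep the b directions whose better side scored highest.
    top = np.argsort(np.maximum(r_plus, r_minus))[::-1][:b]
    k = len(top)

    # Normalize by the standard deviation of the rewards that are used
    # (the small constant avoids a division by zero).
    sigma_r = np.concatenate([r_plus[top], r_minus[top]]).std() + 1e-8

    # Move along each kept direction in proportion to its reward difference.
    step = np.tensordot(r_plus[top] - r_minus[top], deltas[top], axes=1)
    return theta + lr / (k * sigma_r) * step


if __name__ == "__main__":
    # Toy problem (an assumption for illustration): the reward is highest
    # when the parameters equal `target`.
    target = np.array([[1.0, -2.0, 0.5]])

    def toy_rollout(params):
        return -float(np.sum((params - target) ** 2))

    theta = np.zeros((1, 3))
    rng = np.random.default_rng(0)
    for _ in range(200):
        theta = ars_step(theta, toy_rollout, rng=rng)
    print(theta)
```

In a real RL setting, `rollout` would run one or more episodes with the perturbed parameters and return the accumulated reward, which is the role `evaluate` and `reward` appear to play in the snippet above.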