Keep static for the time being - shifting to a survey paper on RL

Table of contents

Deliverables for transfer on PhD

Quote

One should avoid solving more difficult intermediate problems when solving a target problem. Vladimir Vapnik, Statistical Learning Theory, 1998

Problems

Related to stochastic optimal control. These can be tackled model-free from simulations, using RL and deep RL.

  1. Merton Problem (Portfolio and consumption)
  2. Optimal Execution - Liquidation Problem
  3. Optimal Execution - Limit Order Placement
  4. Optimal Stopping and Control
  5. Optimal Execution for statistical arbitrage
  6. Optimal execution targeting volume
  7. Market Making problems
  8. Pairs trading - optimal entry/exit
  9. Multi-period parametric policies (Brandt, multi-period)
  10. Optimal hedging of derivatives with path dependency (JPM: explore the model further; when is it better than Monte Carlo and the Greeks?).
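
In its simplest setting, problem 1 has a well-known closed-form solution, which makes it a convenient sanity check for any model-free method. The sketch below is a minimal illustration under stated assumptions: a single risky asset following geometric Brownian motion, constant relative risk aversion `gamma`, no consumption and no transaction costs. All parameter values are hypothetical. The optimal constant fraction of wealth in the risky asset is (mu - r) / (gamma * sigma^2), and a brute-force search over the certainty-equivalent growth rate should recover it.

```python
def merton_fraction(mu, r, sigma, gamma):
    """Closed-form optimal risky-asset fraction for CRRA utility under GBM."""
    return (mu - r) / (gamma * sigma ** 2)


def ce_growth_rate(pi, mu, r, sigma, gamma):
    """Certainty-equivalent growth rate of wealth for a constant fraction pi."""
    return r + pi * (mu - r) - 0.5 * gamma * (pi * sigma) ** 2


def grid_search_fraction(mu, r, sigma, gamma, steps=2001, lo=0.0, hi=2.0):
    """Brute-force maximiser of ce_growth_rate over an evenly spaced grid."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda pi: ce_growth_rate(pi, mu, r, sigma, gamma))


# Illustrative (hypothetical) parameters, not taken from these notes.
MU, R, SIGMA, GAMMA = 0.08, 0.02, 0.20, 3.0
closed_form = merton_fraction(MU, R, SIGMA, GAMMA)
numerical = grid_search_fraction(MU, R, SIGMA, GAMMA)
```

An RL agent trained on simulated paths of the same model could be compared against `closed_form` before moving to settings where no closed form exists.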

Simulation

Stochastic optimal control assumes a model, and simulations assume some knowledge of the world (say, Monte Carlo). Are there alternative, more robust simulation methods (for example, GANs for time series)?
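
For reference, the model-based baseline that such alternatives would replace can be sketched in a few lines. This is a minimal Monte Carlo simulator assuming geometric Brownian motion; the function name and all parameter values are illustrative assumptions. A learned generator (for example a GAN) would stand in for `simulate_gbm_paths` while leaving the downstream agent unchanged.

```python
import math
import random


def simulate_gbm_paths(s0, mu, sigma, horizon, n_steps, n_paths, seed=0):
    """Simulate price paths under geometric Brownian motion (exact scheme)."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    drift = (mu - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    paths = []
    for _ in range(n_paths):
        path = [s0]
        for _ in range(n_steps):
            # Multiply by the exponential of a Gaussian log-return.
            path.append(path[-1] * math.exp(drift + vol * rng.gauss(0.0, 1.0)))
        paths.append(path)
    return paths


# Hypothetical settings: weekly steps over one year.
paths = simulate_gbm_paths(s0=100.0, mu=0.05, sigma=0.2,
                           horizon=1.0, n_steps=52, n_paths=1000)
mean_terminal = sum(p[-1] for p in paths) / len(paths)
```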

Key Ideas

We are looking to allocate to assets or strategies in a manner that improves on the current state of the art, and to get RL working in real-world finance. Reinforcement learning is a method for solving MDPs in a model-free fashion. There are many MDP problems in finance, along with a whole mathematical methodology for them in stochastic optimal control. Applications are myriad and include investment/consumption decisions, derivative hedging, algorithmic trading and inventory management. Solutions may have particular value when an agent's decisions have path-dependent consequences into the future.

In derivative hedging, problems are usually solved in a step-wise fashion, often by calculating or adjusting the Greeks or, in more awkward cases, by Monte Carlo methods. Recent papers hint at the ability to learn a hedging strategy directly, in a Greek-free fashion, from a simulation of the environment. In other words, rather than a two-step process (model the environment, then solve the model), we can go straight from simulation to hedging. This includes difficult real-life features such as transaction costs and path dependency, and indeed complex risk-adjusted functions of the final distribution of returns that we wish to maximise.
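
The idea of going straight from simulation to a hedging rule can be illustrated with a deliberately small sketch. It is not the method of any particular paper. It assumes a short European call, driftless GBM paths, zero interest rates, proportional transaction costs, and a hypothetical two-parameter policy that maps moneyness to a holding in [0, 1]. The parameters are chosen by grid search on a mean-minus-standard-deviation criterion of simulated terminal P&L, where a deep RL approach would use a neural network and gradient-based training instead.

```python
import math
import random
import statistics

# All numbers below are illustrative assumptions, not calibrated values.
STRIKE, COST, RISK_AVERSION = 100.0, 0.001, 1.0


def simulate_paths(s0, sigma, horizon, n_steps, n_paths, seed=0):
    """Driftless GBM paths; stands in for any simulator of the environment."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    paths = []
    for _ in range(n_paths):
        path = [s0]
        for _ in range(n_steps):
            z = rng.gauss(0.0, 1.0)
            step = -0.5 * sigma ** 2 * dt + sigma * math.sqrt(dt) * z
            path.append(path[-1] * math.exp(step))
        paths.append(path)
    return paths


def policy(spot, a, b):
    """Hypothetical hedge rule: holding in [0, 1] as a function of moneyness."""
    return 1.0 / (1.0 + math.exp(-(a * math.log(spot / STRIKE) + b)))


def hedged_pnl(path, a, b):
    """Terminal P&L of a short call plus hedge, net of proportional costs.
    The premium received and the final unwind cost are ignored for brevity."""
    pnl, holding = 0.0, 0.0
    for t in range(len(path) - 1):
        target = policy(path[t], a, b)
        pnl -= COST * abs(target - holding) * path[t]  # pay to rebalance
        holding = target
        pnl += holding * (path[t + 1] - path[t])       # gain on the hedge
    return pnl - max(path[-1] - STRIKE, 0.0)           # settle the option


def criterion(pnls):
    """Risk-adjusted objective: mean minus a multiple of standard deviation."""
    return statistics.mean(pnls) - RISK_AVERSION * statistics.pstdev(pnls)


paths = simulate_paths(s0=100.0, sigma=0.2, horizon=0.25, n_steps=20, n_paths=500)
candidates = [(a, b) for a in (0.0, 2.0, 5.0, 10.0, 20.0)
              for b in (-2.0, -1.0, 0.0, 1.0, 2.0)]
scores = {ab: criterion([hedged_pnl(p, *ab) for p in paths]) for ab in candidates}
best_a, best_b = max(scores, key=scores.get)
best_score = scores[(best_a, best_b)]
unhedged_score = criterion([-max(p[-1] - STRIKE, 0.0) for p in paths])
```

No Greeks are computed at any point: the rule is selected purely by how its simulated P&L distribution scores, so transaction costs and the choice of risk criterion enter the optimisation directly.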

Allocation decisions in finance are made in a most difficult environment. It is partially observed, noisy and non-stationary, and there may be outliers and regimes. Time also plays a key role, and decisions may have long-term consequences. In contradistinction to standard methods, we are not looking to make single-period predictions and then combine them using an optimiser. That approach is akin to supervised learning, but in the real world our actions may have long-term effects, and the environment may react against the actions our agents take.

Most standard methods are single-period and represent a two-stage process. This involves two sets of parameters, and because forecast error is not utility, we may even be optimising the wrong target. Other works give up some of our ability to predict and are thus more heuristic but more practical methods for allocation decisions, albeit pessimistic.

An information bottleneck is created between the supervised minimisation of forecast error and the resulting forecasts that are then fed to an optimiser (some argue that the optimiser also serves as an error maximiser, and it has its own parameters to be found). Given the noise inherent in finance, and the fact that predictions are either very weak or exist only for small windows of time, the two-stage process becomes even more problematic.

Research has been done to address the two-stage parameter estimation (Brandt), and this enables a better-aligned target. However, most current academic work applying RL to allocation decisions is either on a very small scale or ignores basic practical realities of markets (such as transaction costs). Moody et al. appear to have been the earliest to understand these issues and to attempt to have one set of parameters and a single utility, include transaction costs, and map directly from inputs to actions (rather than predict then optimise).
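
The structure described here (one set of parameters, one utility, costs included, inputs mapped directly to actions) can be sketched as follows. This is a minimal illustration in the spirit of that approach, not a reproduction of it. The tanh position rule, the timing convention, the Sharpe-ratio utility and every name and number are assumptions made for the example. Training is omitted: the parameters would be tuned by gradient ascent, or any other optimiser, on `sharpe_ratio(rewards)`.

```python
import math


def positions(features, weights, bias, recurrent_weight):
    """Map inputs directly to positions in [-1, 1], feeding back the last position."""
    out, prev = [], 0.0
    for x in features:
        signal = sum(w * xi for w, xi in zip(weights, x))
        prev = math.tanh(signal + recurrent_weight * prev + bias)
        out.append(prev)
    return out


def trading_returns(pos, asset_returns, cost):
    """Per-period reward: the position held over the period earns the asset
    return, and each change of position pays a proportional cost."""
    rewards, prev = [], 0.0
    for f, r in zip(pos, asset_returns):
        rewards.append(f * r - cost * abs(f - prev))
        prev = f
    return rewards


def sharpe_ratio(rewards):
    """Single utility to be maximised: mean reward over its standard deviation."""
    mean = sum(rewards) / len(rewards)
    var = sum((x - mean) ** 2 for x in rewards) / len(rewards)
    return mean / math.sqrt(var) if var > 0 else 0.0


# Toy usage with made-up numbers: one feature, the previous period's return.
# Assumption: features[t] is observable before asset_returns[t] is realised.
asset_returns = [0.01, -0.02, 0.03, 0.01]
features = [[0.0], [0.01], [-0.02], [0.03]]
pos = positions(features, weights=[5.0], bias=0.0, recurrent_weight=0.5)
rewards = trading_returns(pos, asset_returns, cost=0.001)
utility = sharpe_ratio(rewards)
```

Because the cost term depends on the previous position, the utility is path-dependent, which is what distinguishes this from a predict-then-optimise pipeline.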

The Moody work was nearly 20 years ago, and indeed he left academia in 2003 to set up a hedge fund (which continues to be successful).

My goal is to advance this work using the latest in deep reinforcement learning (and potentially deep learning). The aim is to examine the state of the art and advance it, particularly with a view to practicality: it is the author’s view that the current state of the art in academic RL, as applied within finance, remains impractical. In both parts of the research I am examining multi-period, path-dependent decision making in difficult environments in a direct rather than two-stage, indirect fashion.

It should also be noted that explainability and sensitivity analysis are important in finance: black boxes are not widely trusted, and there may be cases where explainability is legally required. I also propose to examine RL methods in this domain that enable sensitivity analysis and explainability.

A further question arises if we seek to move directly from inputs to actions, which in this case are allocation weights, with the objective of maximising, say, some long-run utility. There are then practical questions regarding transaction costs, sparsity and the inclusion of practical constraints such as a drawdown constraint. There are also questions about how best to feed noisy time series into an RL agent: for example, if we go ‘deep’, do autoencoders have a part to play, and should we be seeking to induce sparsity in our agent’s allocations?
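
One possible way to bring transaction costs and a drawdown constraint into the agent's objective is to fold them into the per-period reward. The sketch below is only an assumption about how this might be done. It uses a soft penalty rather than a hard constraint, and the function names, the penalty form and the numbers are all hypothetical.

```python
def max_drawdown(wealth):
    """Largest peak-to-trough fall, as a fraction of the running peak.
    Assumes a non-empty sequence of strictly positive wealth values."""
    peak, worst = wealth[0], 0.0
    for w in wealth:
        peak = max(peak, w)
        worst = max(worst, (peak - w) / peak)
    return worst


def turnover_cost(old_weights, new_weights, cost_rate):
    """Proportional cost of moving from one weight vector to another."""
    return cost_rate * sum(abs(n - o) for o, n in zip(old_weights, new_weights))


def penalised_reward(period_return, wealth, old_weights, new_weights,
                     cost_rate, dd_limit, dd_penalty):
    """Period return net of trading costs, minus a penalty that applies
    only when the drawdown so far exceeds the permitted limit."""
    breach = max(0.0, max_drawdown(wealth) - dd_limit)
    cost = turnover_cost(old_weights, new_weights, cost_rate)
    return period_return - cost - dd_penalty * breach


# Illustrative wealth path: peak of 120 followed by a trough of 90.
example_drawdown = max_drawdown([100.0, 120.0, 90.0, 110.0])
```

The turnover term also pushes the agent towards sparse changes in its weights, since every adjustment is charged in proportion to its size.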

Note that allocation problems may in some cases be reduced to single-state bandit problems. Note also that sometimes a poor model of the environment may be known, and the agent may be able to bootstrap from it. Allocations may be to experts, assets or indeed strategies.
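
The single-state bandit view can be made concrete with a short sketch. It assumes stationary reward distributions, an epsilon-greedy rule and three hypothetical strategies with made-up return distributions. None of these choices comes from the notes, and real return series would violate the stationarity assumption.

```python
import random


def epsilon_greedy_allocation(reward_fns, n_rounds, epsilon=0.1, seed=0):
    """Allocate each round to one arm, tracking a running mean reward per arm."""
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [0] * n_arms
    values = [0.0] * n_arms
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda i: values[i])  # exploit
        reward = reward_fns[arm](rng)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean
    return values, counts


# Three hypothetical strategies with different mean per-period returns.
strategies = [
    lambda rng: rng.gauss(0.00, 0.10),
    lambda rng: rng.gauss(0.05, 0.10),
    lambda rng: rng.gauss(0.20, 0.10),
]
values, counts = epsilon_greedy_allocation(strategies, n_rounds=2000)
best = max(range(len(values)), key=lambda i: values[i])
```

The arms could equally be experts or assets. A known but poor model of the environment could be used to initialise `values` rather than starting from zero.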

Books

Deep Hedging

RL

Machine-Learning-Asset-Allocation

DL Online and Bandits

Classic Portfolio Selection Materials

Machine Learning based Portfolio Selection

Canonical Correlation Analysis

Sample Efficient