We solve a heavily simplified version of the classic portfolio optimization problem, so that it falls within the scope of reinforcement learning (Q-learning).
- A stock portfolio consists of exactly 5 stocks (A, B, C, D, E) at any given time.
- Only one exchange transaction (selling one stock and buying another) is allowed per day.
- Only 5% of one stock can be exchanged for another stock on any given day.
- To keep the otherwise infinite state space manageable, we discretize the state as follows:
  - The state of the portfolio is represented as a string of 5 letters, one per stock, each drawn from A, B, C, ..., S, T, U. (These bucket letters are unrelated to the stock names A to E.)
  - Each letter is assigned according to the percentage of the portfolio currently held in the corresponding stock.
  - The letter 'A' corresponds to a stock with a 0% share of the portfolio, 'B' to a share in (0, 5]%, 'C' to (5, 10]%, and so on up to 'U', which corresponds to (95, 100]%.
  - For example, a portfolio with percentage shares [9, 24, 44, 20, 3] is represented by the 5-character string CFJEB. Similarly, [19, 24, 44, 7, 6] is EFJCC and [95, 1, 1, 2, 1] is TBBBB.
  - Thus, a state transition is just a change of letters in the 5-character string.
- The possible actions from the current state are obtained by enumerating every 5% exchange that can be made on the stock portfolio.
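
The state encoding and the action enumeration described above can be sketched in Python as follows. This is a minimal sketch, not the project's actual code: the names `encode_stock`, `encode_state`, and `possible_actions` are hypothetical. It also assumes that a "5% exchange" moves 5 percentage points of the portfolio from one stock to another, and that a stock can be sold only if it currently holds at least 5%.

```python
import math
import string
from itertools import permutations

# 21 bucket letters: 'A' (0%), 'B' (0, 5]%, ..., 'U' (95, 100]%.
LETTERS = string.ascii_uppercase[:21]


def encode_stock(pct):
    """Map one stock's percentage share to its bucket letter."""
    if not 0 <= pct <= 100:
        raise ValueError(f"percentage out of range: {pct}")
    # ceil(pct / 5) is 0 for 0%, 1 for (0, 5], ..., 20 for (95, 100].
    return LETTERS[math.ceil(pct / 5)]


def encode_state(weights):
    """Encode a list of percentage shares as a string of bucket letters."""
    return "".join(encode_stock(w) for w in weights)


def possible_actions(weights, step=5):
    """List (sell_index, buy_index, next_state) for every feasible exchange.

    Assumption: an exchange moves `step` percentage points of the
    portfolio from the sold stock to the bought stock.
    """
    actions = []
    for sell, buy in permutations(range(len(weights)), 2):
        if weights[sell] >= step:
            new_weights = list(weights)
            new_weights[sell] -= step
            new_weights[buy] += step
            actions.append((sell, buy, encode_state(new_weights)))
    return actions


print(encode_state([9, 24, 44, 20, 3]))   # CFJEB
print(encode_state([95, 1, 1, 2, 1]))     # TBBBB
print(len(possible_actions([20] * 5)))    # 20
```

With equal weights every stock can be sold into any of the other four, which gives 5 × 4 = 20 candidate actions. A concentrated portfolio such as [95, 1, 1, 2, 1] has only 4, because only the first stock holds enough to sell.
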
The goal of the system is to maximize the value of the stock portfolio over a period of 5 years (4 years of exploration + 1 year of exploitation).
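
For reference, the standard tabular Q-learning update and an epsilon-greedy action choice can be sketched as below. This is only an illustration of the general technique, not the project's implementation: the function names, the learning rate `alpha`, the discount `gamma`, and the idea of using the daily change in portfolio value as the reward are all assumptions. During the exploration years one would use a high `epsilon`; during the exploitation year, `epsilon = 0` (purely greedy).

```python
import random
from collections import defaultdict


def choose_action(q, state, actions, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else the best known."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """Standard Q-learning update:
    Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    alpha and gamma are hypothetical values chosen for illustration.
    """
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


# Q-table keyed by (state string, action); unseen entries default to 0.0.
q = defaultdict(float)

# One illustrative step: from EEEEE, sell stock 0 and buy stock 1 (-> DFEEE),
# with a made-up reward of 1.0.
q_update(q, "EEEEE", (0, 1), 1.0, "DFEEE", [(1, 0)])
print(choose_action(q, "EEEEE", [(0, 1), (1, 0)], epsilon=0.0))
```
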
The do-nothing benchmark
- If the stock portfolio is simply left untouched for the exploitation period, its value follows the price movement of its stocks; the system should outperform that movement.
- The do-nothing benchmark is calculated by giving equal weight to all stocks (i.e. 20% each, state EEEEE) and then letting the stock values move over the period without any trades.
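
The do-nothing benchmark can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `do_nothing_benchmark` is hypothetical, the benchmark is computed only from each stock's price at the start and end of the exploitation period, and the prices in the usage example are made up for illustration.

```python
def do_nothing_benchmark(start_prices, end_prices, initial_value=1.0):
    """Final value of an equally weighted portfolio that is never traded.

    Each stock receives an equal share of `initial_value` (20% each for
    5 stocks), and that share scales with the stock's price ratio.
    """
    if len(start_prices) != len(end_prices):
        raise ValueError("start_prices and end_prices must have equal length")
    per_stock = initial_value / len(start_prices)
    return sum(per_stock * end / start
               for start, end in zip(start_prices, end_prices))


# Made-up prices: every stock rises by 10% over the period,
# so the benchmark portfolio also rises by about 10%.
start = [10, 20, 30, 40, 50]
end = [11, 22, 33, 44, 55]
print(do_nothing_benchmark(start, end))
```

The learned policy's final portfolio value over the exploitation year would then be compared against this number, starting from the same initial value.
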
- Aditya Masurkar
- Sachin Haldavanekar