miroblog/deep_rl_trader

I think there is a look-ahead bias

Opened this issue · 1 comment

Hi there, nice work.
However, I think there is a look-ahead bias.
At every timestep you get the state, and this state includes the current close price.
Then, in the step() method, you calculate the profit as:

self.exit_price = self.closingPrice
# net return of the trade: positive when exit_price < entry_price (the short side),
# with the fee charged once on entry and once on exit
self.reward += ((self.entry_price - self.exit_price)/self.exit_price + 1)*(1-self.fee)**2 - 1 # calculate reward
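For concreteness, here is a worked instance of that formula as a standalone sketch; the prices and fee are made-up numbers, and short_trade_reward is a hypothetical helper, not code from the repo:

def short_trade_reward(entry_price, exit_price, fee):
    # (entry - exit) / exit is the gross return of a short trade; shift it to a
    # growth factor, charge the fee once on entry and once on exit, then shift back
    return ((entry_price - exit_price) / exit_price + 1) * (1 - fee) ** 2 - 1

print(short_trade_reward(105.0, 100.0, 0.0005))  # ~0.0490, i.e. about +4.9%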

In this case the reward is computed from the same close price that you already used to predict the action.
What do you think about it?

state_n <- updateState()
action_n <- network(state_n)
reward_n <- compute_reward(action_n, state_n)

state_n_plus_1 <- updateState()
action_n_plus_1 <- network(state_n_plus_1)
reward_n_plus_1 <- compute_reward(action_n_plus_1, state_n_plus_1)

The reward is computed from the current state, so there is no look-ahead bias.
Put simply: one decides to sell the stock based on past and current prices, and
if one does sell, then the earnings are calculated at the current price.
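Here is a minimal runnable sketch of that ordering; the toy price series, the stand-in policy, and the compute_reward helper are illustrative names, not the repo's actual API:

prices = [105.0, 104.2, 101.7, 100.0, 102.5]  # toy close prices
fee = 0.0005

def compute_reward(action, close, entry):
    # the reward is realized at the current close; no future price is touched
    if action == "exit" and entry is not None:
        # same short-side formula as the step() code quoted above
        return ((entry - close) / close + 1) * (1 - fee) ** 2 - 1
    return 0.0

entry_price = None
for t, close_t in enumerate(prices):
    # the state at step t holds past and current prices only
    action = "enter" if t == 0 else ("exit" if t == 3 else "hold")  # stand-in policy
    reward = compute_reward(action, close_t, entry_price)
    if action == "enter":
        entry_price = close_t
    print(t, action, round(reward, 4))

Only at step n+1 does the environment reveal the next close, so the reward for action_n never peeks ahead.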