kzl/decision-transformer

State and Return preds input

backpropper opened this issue · 1 comment

The comments on the following line and the line after it say that the return and state predictions are made from both the state and the action. However, the expression only seems to use the action information (index 2). Am I missing something, or is the comment ambiguous? I know this won't affect learning, since only the action predictions are used.

return_preds = self.predict_return(x[:,2]) # predict next return given state and action

kzl commented

See #5: the output at index 2 uses all the information up to and including the latest action token.
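To make the point above concrete, here is a minimal pure-Python sketch of which tokens a causally masked position can attend to. It assumes the tokens are interleaved per timestep as (return, state, action), matching the index 2 = action convention in the question. The names `TOKEN_KINDS`, `flat_position`, and `visible_tokens` are hypothetical helpers for illustration and are not part of the repository.

```python
# Assumed token order within each timestep (index 0, 1, 2), inferred from
# the question's statement that index 2 holds the action information.
TOKEN_KINDS = ("return", "state", "action")


def flat_position(timestep, kind_index):
    """Position of a token in the interleaved sequence R_0, s_0, a_0, R_1, ..."""
    return timestep * len(TOKEN_KINDS) + kind_index


def visible_tokens(timestep, kind_index):
    """Tokens a causal (lower-triangular) mask lets this position attend to."""
    last = flat_position(timestep, kind_index)
    n = len(TOKEN_KINDS)
    return [(pos // n, TOKEN_KINDS[pos % n]) for pos in range(last + 1)]


# Context behind the output at the action token of timestep 1,
# i.e. the kind of position that x[:, 2] selects.
action_context = visible_tokens(timestep=1, kind_index=2)

# For comparison: the state token of the same timestep cannot see that action.
state_context = visible_tokens(timestep=1, kind_index=1)

print(action_context)
print(state_context)
```

Under these assumptions, the action position's context contains the current state as well as the current action, so a head reading from index 2 is conditioned on both even though only one index appears in the code.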