State and Return preds input

The comments on the following line and the one after it say that the return and state predictions are computed from both the state and the action. However, the code appears to use only the action token (index 2). Am I missing something, or is the comment ambiguous? I know that it won't affect learning, since we are only using the action predictions.

decision-transformer/gym/decision_transformer/models/decision_transformer.py

Line 97 in f04280e

    
           return_preds = self.predict_return(x[:,2])  # predict next return given state and action

See #5: the output at index 2 uses all the information up to and including the latest action token, so it is conditioned on the state as well as the action.