question about `position_ids`
Hello, I noticed that `position_ids` is always `None` in the released code, but the model still works well. Why is that? Thanks.
Hey! Thank you for your question. Positional embeddings depend on the architecture:
- For OPT, we override the absolute positional embeddings so that the summary embeddings/vectors receive no positional embedding (see here https://github.com/princeton-nlp/AutoCompressors/blob/main/auto_compressor.py#L281).
- For Llama, we found that the relative RoPE embeddings work well for a small number of additional summary vectors, so we don't add any special treatment for their position ids (as mentioned in the paper). Unfortunately, this hurts extrapolation to sequence lengths much longer than those seen in training. (A rough sketch of both strategies is below.)
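To make the difference concrete, here is an illustrative sketch of the two strategies, not the repository's actual code; the helper names and the choice of a zero sentinel position are assumptions for illustration only:

```python
import torch

def opt_style_position_ids(seq_len: int, num_summary: int) -> torch.Tensor:
    # Hypothetical illustration: regular tokens get increasing positions,
    # while the summary vectors appended at the end are mapped to a sentinel
    # position whose absolute embedding the model can then override/zero out.
    regular = torch.arange(seq_len - num_summary)
    sentinel = torch.zeros(num_summary, dtype=torch.long)
    return torch.cat([regular, sentinel]).unsqueeze(0)

def llama_style_position_ids(seq_len: int) -> torch.Tensor:
    # With rotary (RoPE) embeddings the summary vectors get no special
    # treatment: positions simply continue sequentially to the end.
    return torch.arange(seq_len).unsqueeze(0)

print(opt_style_position_ids(8, 2))   # tensor([[0, 1, 2, 3, 4, 5, 0, 0]])
print(llama_style_position_ids(8))    # tensor([[0, 1, 2, 3, 4, 5, 6, 7]])
```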
Closing this due to inactivity. Feel free to re-open!
The overridden `AutoCompressorMixin.forward()` seems to drop `position_ids`, and there is no default initialization of `position_ids` before decoding. Will RoPE still be computed somewhere, or is it simply skipped for all inputs?
The code doesn't support an explicit `position_ids` argument, but the underlying model code can infer the position ids from the tensor shape of the inputs, so RoPE is still calculated.
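For reference, this is roughly what the fallback inside Hugging Face's `LlamaModel.forward` does when `position_ids` is `None` (a paraphrased sketch; the exact code depends on the transformers version, and the helper name here is mine):

```python
import torch

def default_llama_position_ids(input_ids: torch.Tensor, past_length: int = 0) -> torch.Tensor:
    # If no position_ids are passed, build sequential positions from the
    # input shape (offset by any cached past key/values) so RoPE can still
    # be applied to every token.
    batch_size, seq_length = input_ids.shape
    position_ids = torch.arange(
        past_length, past_length + seq_length,
        dtype=torch.long, device=input_ids.device,
    )
    return position_ids.unsqueeze(0).expand(batch_size, -1)
```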
OK, I looked at the Llama model source code and found that `self.model()` will initialize the position ids (and hence the rotary embedding) itself when `position_ids` is `None`. Thanks for your reply.