question about `position_ids`
Hello, I noticed that `position_ids` is always `None` in the released code, but the model still works well. Why is that? Thanks.
Hey! Thank you for your question. Positional embeddings depend on the architecture:
- For OPT, we override the absolute positional embeddings so that the summary embeddings/vectors receive no positional embedding (see here https://github.com/princeton-nlp/AutoCompressors/blob/main/auto_compressor.py#L281).
- For Llama, we found that the relative RoPE embeddings work well for a small number of additional summary vectors, so we don't add any special treatment for their position ids (as mentioned in the paper). Unfortunately, this hurts extrapolation to sequence lengths much longer than those seen in training. (A rough sketch of both strategies is below.)
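To make the difference concrete, here is an illustrative sketch of the two strategies, not the repository's actual code; the helper names and the choice of a zero sentinel position are assumptions for illustration only:

```python
import torch

def opt_style_position_ids(seq_len: int, num_summary: int) -> torch.Tensor:
    # Hypothetical illustration: regular tokens get increasing positions,
    # while the summary vectors appended at the end are mapped to a sentinel
    # position whose absolute embedding the model can then override/zero out.
    regular = torch.arange(seq_len - num_summary)
    sentinel = torch.zeros(num_summary, dtype=torch.long)
    return torch.cat([regular, sentinel]).unsqueeze(0)

def llama_style_position_ids(seq_len: int) -> torch.Tensor:
    # With rotary (RoPE) embeddings the summary vectors get no special
    # treatment: positions simply continue sequentially to the end.
    return torch.arange(seq_len).unsqueeze(0)

print(opt_style_position_ids(8, 2))   # tensor([[0, 1, 2, 3, 4, 5, 0, 0]])
print(llama_style_position_ids(8))    # tensor([[0, 1, 2, 3, 4, 5, 6, 7]])
```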
Closing this due to inactivity. Feel free to re-open!
The overridden `AutoCompressorMixin.forward()` seems to drop `position_ids`, and there is no default initialization of `position_ids` before decoding. Will RoPE still be computed somewhere, or is it simply skipped for all inputs?
The code doesn't support an explicit `position_ids` argument, but the underlying model code can infer the position ids from the tensor shape of the inputs, so RoPE is still calculated.
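For reference, this is roughly what the fallback inside Hugging Face's `LlamaModel.forward` does when `position_ids` is `None` (a paraphrased sketch; the exact code depends on the transformers version, and the helper name here is mine):

```python
import torch

def default_llama_position_ids(input_ids: torch.Tensor, past_length: int = 0) -> torch.Tensor:
    # If no position_ids are passed, build sequential positions from the
    # input shape (offset by any cached past key/values) so RoPE can still
    # be applied to every token.
    batch_size, seq_length = input_ids.shape
    position_ids = torch.arange(
        past_length, past_length + seq_length,
        dtype=torch.long, device=input_ids.device,
    )
    return position_ids.unsqueeze(0).expand(batch_size, -1)
```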
OK, I looked at the Llama model source code and found that `self.model()` will initialize the position ids (and hence the rotary embedding) itself when `position_ids` is `None`. Thanks for your reply.