aws-neuron/transformers-neuronx

Very long compilation times for llama2 with batch size 4


With AWS Neuron SDK 2.14.1, I am experiencing very long compilation times for batch_size = 4 with the llama2 7B model.

I am using the following configurations:

| Parameter   | inf2.8xlarge | inf2.48xlarge |
|-------------|--------------|---------------|
| tp_degree   | 2            | 24            |
| n_positions | 2048         | 2048          |
| amp         | f16          | f16           |

With batch_size = 1 or 2, the model compiles in minutes with the -O1 option, but with batch_size = 4 compilation takes more than three hours.
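For reference, here is a minimal sketch of the setup (assuming the standard transformers-neuronx sampling API; the checkpoint path is a placeholder, and the -O1 option is assumed to be passed to the compiler via the NEURON_CC_FLAGS environment variable):

```python
# Minimal sketch of the compilation setup (inf2.8xlarge values shown).
# Assumptions: -O1 is forwarded to neuronx-cc via NEURON_CC_FLAGS, and
# "./llama-2-7b" is a placeholder for a local llama2 7B checkpoint.
import os
import time

from transformers_neuronx.llama.model import LlamaForSampling

os.environ["NEURON_CC_FLAGS"] = "-O1"

model = LlamaForSampling.from_pretrained(
    "./llama-2-7b",   # placeholder checkpoint directory
    batch_size=4,     # 1 and 2 compile in minutes; 4 takes 3+ hours
    tp_degree=2,      # 2 on inf2.8xlarge, 24 on inf2.48xlarge
    n_positions=2048,
    amp="f16",
)

start = time.time()
model.to_neuron()     # triggers neuronx-cc compilation
print(f"compilation took {time.time() - start:.0f}s")
```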

Ack, asking the compiler team to take a look.

Hi dacorvo - We have root-caused this issue with the llama2 batch-size-4 config. The fix will be in the next Neuron SDK release.

This is excellent news! Thanks for the update.

This issue was fixed in Neuron SDK release 2.15.