Very long compilation times for llama2 with batch size 4
Closed this issue · 4 comments
dacorvo commented
With AWS Neuron SDK 2.14.1, I am experiencing very long compilation times for batch_size = 4 with the llama2 7B model.
I am using the following configurations:
| Parameter | inf2.8xlarge | inf2.48xlarge |
|-------------|--------------|---------------|
| tp_degree | 2 | 24 |
| n_positions | 2048 | 2048 |
| amp | f16 | f16 |
With batch_size = 1 or 2, the model compiles in minutes with the -O1 option, but with batch_size = 4 compilation takes more than three hours. A sketch of my setup is below.
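For reference, here is a minimal sketch of how I load and compile the model with transformers-neuronx. The checkpoint path is hypothetical, and passing -O1 through the NEURON_CC_FLAGS environment variable is an assumption about how the flag reaches neuronx-cc:

```python
import os

from transformers_neuronx.llama.model import LlamaForSampling

# Assumption: -O1 is forwarded to neuronx-cc via NEURON_CC_FLAGS.
os.environ["NEURON_CC_FLAGS"] = "-O1"

# Hypothetical local path to the split llama2 7B checkpoint.
model = LlamaForSampling.from_pretrained(
    "./llama-2-7b-split",
    batch_size=4,      # batch_size = 1 or 2 compiles in minutes
    tp_degree=2,       # 2 on inf2.8xlarge, 24 on inf2.48xlarge
    n_positions=2048,
    amp="f16",
)

# Compilation to Neuron happens here; this is the step that
# takes more than three hours at batch_size = 4.
model.to_neuron()
```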
awsilya commented
Ack, asking the compiler team to take a look.
aws-donkrets commented
Hi dacorvo - We have root-caused this issue with the llama2 batch-size-4 config. The fix will be in the next Neuron SDK release.
dacorvo commented
This is excellent news! Thanks for the update.
jeffhataws commented
This issue was fixed in Neuron SDK release 2.15.