Running Llama3 Returns Tensor Allocate Status 2
pedrohernandezgeladocma opened this issue · 3 comments
pedrohernandezgeladocma commented
When I run the inference notebook with Llama 3, using the code below:
import time
import torch
from transformers import AutoTokenizer, LlamaForCausalLM, LlamaTokenizer, PreTrainedTokenizerFast
from transformers_neuronx import LlamaForSampling, NeuronConfig, GQA, QuantizationConfig
from transformers_neuronx.config import GenerationConfig
# Set this to the Hugging Face model ID
model_id = "meta-llama/Meta-Llama-3-8B"
neuron_config = NeuronConfig(
    on_device_embedding=False,
    attention_layout='BSH',
    fuse_qkv=True,
    group_query_attention=GQA.REPLICATED_HEADS,
    quant=QuantizationConfig(),
    on_device_generation=GenerationConfig(do_sample=True),
)
# Load meta-llama/Meta-Llama-3-8B onto the NeuronCores with 24-way tensor parallelism and run compilation
neuron_model = LlamaForSampling.from_pretrained(
    model_id,
    neuron_config=neuron_config,
    batch_size=1,
    tp_degree=24,
    amp='f16',
    n_positions=4096,
)
neuron_model.to_neuron()
This fails with the following error:
nrt_tensor_allocate status=2 message="Invalid"
Edit: the instance type is inf2.8xlarge, running the Ubuntu 22 AMI.
As far as I can tell there are no dependency issues, but I cannot trace the error beyond this function, and the Troubleshooting documentation has no reference to it.
aws-taylor commented
Thanks @pedrohernandezgeladocma, we're taking a look.
pedrohernandezgeladocma commented
@aws-taylor I think this issue may be related to #749.
aws-taylor commented
Hello @pedrohernandezgeladocma, we suspect the issue is related to your instance type. Can you try again with a larger instance type? In particular, tp_degree=24 is too large for an inf2.8xlarge.
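For anyone hitting the same error, here is a minimal sketch of the sanity check implied by the comment above: compare the requested tp_degree against the number of NeuronCores on the instance. The helper name `tp_degree_fits` and the lookup table are illustrative assumptions and are not part of transformers_neuronx. The core counts assume the published Inf2 specifications (2 NeuronCores per Inferentia2 chip).

```python
# Assumed NeuronCore counts per Inf2 instance size
# (2 NeuronCores per Inferentia2 chip).
INF2_NEURON_CORES = {
    "inf2.xlarge": 2,
    "inf2.8xlarge": 2,
    "inf2.24xlarge": 12,
    "inf2.48xlarge": 24,
}


def tp_degree_fits(instance_type: str, tp_degree: int) -> bool:
    """Hypothetical helper: True if tp_degree does not exceed the
    number of NeuronCores available on the given instance type."""
    cores = INF2_NEURON_CORES[instance_type]
    return 1 <= tp_degree <= cores


for instance_type in ("inf2.8xlarge", "inf2.48xlarge"):
    ok = tp_degree_fits(instance_type, 24)
    print(f"{instance_type}: tp_degree=24 {'fits' if ok else 'does not fit'}")
```

Under these assumptions, tp_degree=24 only fits on an inf2.48xlarge, while an inf2.8xlarge allows at most tp_degree=2. Whether the model then fits in device memory at that degree is a separate question that this check does not cover.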