aws-neuron/aws-neuron-sdk

Running Llama3 Returns Tensor Allocate Status 2

pedrohernandezgeladocma opened this issue

When running the notebook for inference using Llama3:

import time
import torch
from transformers import AutoTokenizer
from transformers_neuronx import LlamaForSampling, NeuronConfig, GQA, QuantizationConfig
from transformers_neuronx.config import GenerationConfig

# Set this to the Hugging Face model ID
model_id = "meta-llama/Meta-Llama-3-8B"

neuron_config = NeuronConfig(
    on_device_embedding=False,
    attention_layout='BSH',
    fuse_qkv=True,
    group_query_attention=GQA.REPLICATED_HEADS,
    quant=QuantizationConfig(),
    on_device_generation=GenerationConfig(do_sample=True),
)

# Load meta-llama/Meta-Llama-3-8B onto the NeuronCores with 24-way tensor parallelism and run compilation
neuron_model = LlamaForSampling.from_pretrained(model_id, neuron_config=neuron_config, batch_size=1, tp_degree=24, amp='f16', n_positions=4096)
neuron_model.to_neuron()

This fails with the error:

nrt_tensor_allocate status=2 message="Invalid"

Edit: the instance type is inf2.8xlarge, running the Ubuntu 22 AMI.

There are no dependency issues as far as I can tell, but I cannot trace the error beyond the function, and there are no references to this error in the Troubleshooting guide.

Thanks @pedrohernandezgeladocma, we're taking a look.

@aws-taylor I think this issue may be related to #749.

Hello @pedrohernandezgeladocma, we suspect the issue is related to your instance type. In particular, tp_degree=24 requires 24 NeuronCores, but an inf2.8xlarge exposes only 2, so the runtime cannot allocate the tensors across the requested cores. Can you try again with a larger instance type (an inf2.48xlarge has 24 NeuronCores), or reduce tp_degree to match the cores available on your instance?
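
For reference, here is a minimal sketch of the same setup sized for an inf2.8xlarge, assuming fp16 weights for the 8B model fit within that instance's accelerator memory; tp_degree=2 matches its 2 NeuronCores:

from transformers_neuronx import LlamaForSampling, NeuronConfig, GQA, QuantizationConfig
from transformers_neuronx.config import GenerationConfig

model_id = "meta-llama/Meta-Llama-3-8B"

neuron_config = NeuronConfig(
    on_device_embedding=False,
    attention_layout='BSH',
    fuse_qkv=True,
    group_query_attention=GQA.REPLICATED_HEADS,
    quant=QuantizationConfig(),
    on_device_generation=GenerationConfig(do_sample=True),
)

# tp_degree must not exceed the number of NeuronCores on the instance:
# inf2.xlarge/inf2.8xlarge expose 2 NeuronCores, inf2.48xlarge exposes 24.
neuron_model = LlamaForSampling.from_pretrained(
    model_id,
    neuron_config=neuron_config,
    batch_size=1,
    tp_degree=2,  # 2 NeuronCores on inf2.8xlarge
    amp='f16',
    n_positions=4096,
)
neuron_model.to_neuron()

Alternatively, the original tp_degree=24 configuration should map cleanly onto an inf2.48xlarge, which exposes 24 NeuronCores.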